predict-idlab / tsflex

Flexible time series feature extraction & processing
https://predict-idlab.github.io/tsflex/
MIT License

Stuck on calculating features #129

Status: Closed (cameron-hobbs closed this issue 1 week ago)

cameron-hobbs commented 1 month ago

    cam@DESKTOP-41QNH42:/mnt/c/Users/ahobb/PycharmProjects/cammm$ poetry run python cammm/feature_eng/extract.py
    99%|████████████████████████████████████████████████████████████████████████████████ | 273/276 [00:19<00:00, 98.37it/s]

Hi, feature calculation gets completely stuck (I think while acquiring a lock). It was running fine before with the exact same data and the exact same code, but after I keyboard-interrupted a run, it now gets stuck whenever I try to run it again. Here is some of the code:

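    # Imports inferred from the names used below (assumed, not part of the original post)
    import pandas as pd
    from pycatch22 import catch22_all
    from tsflex.features import FeatureCollection, MultipleFeatureDescriptors
    from tsflex.features.integrations import catch22_wrapper, tsfresh_settings_wrapper
    from tsfresh.feature_extraction import MinimalFCParameters

    # path, feature_cols and last_value are defined elsewhere in the script (not shown)
    df = pd.read_csv(path)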
    df["receipt_timestamp"] = pd.to_datetime(df["receipt_timestamp"], unit="s")
    df = df.set_index("receipt_timestamp")
    df = df[feature_cols]

    fc = FeatureCollection(
        [
            MultipleFeatureDescriptors(
                functions=catch22_wrapper(catch22_all),
                series_names=feature_cols,
                windows=["3s", "5s", "10s", "30s", "60s"],
                strides="1s"
            ),
            MultipleFeatureDescriptors(
                functions=tsfresh_settings_wrapper(MinimalFCParameters()),
                series_names=feature_cols,
                windows=["3s", "5s", "10s", "30s", "60s"],
                strides="1s"
            ),
            MultipleFeatureDescriptors(
                functions=[last_value],
                series_names=["mid"],
                windows=["3s"],
                strides="1s"
            )
        ]
    )

    feature_data = fc.calculate(df, return_df=True, approve_sparsity=True, show_progress=True)

cameron-hobbs commented 1 month ago

I tried restarting my PC, among other things, but the issue persists.
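When a run hangs like this, one way to see where it is blocked is Python's standard-library `faulthandler` module, which can dump the stack of every thread once a timeout expires. The sketch below is only an illustration: `run_with_hang_dump` is a hypothetical helper (not part of tsflex), and the dump covers the main process only, not any worker processes it spawned.

```python
import faulthandler


def run_with_hang_dump(func, timeout_s=60.0):
    """Run func(); if it is still running after timeout_s seconds,
    print the stack of every thread to stderr (the run is not killed)."""
    # Arm a one-shot watchdog that dumps tracebacks after the timeout.
    faulthandler.dump_traceback_later(timeout_s, repeat=False, exit=False)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


# Usage with a stand-in workload; in the script above, the callable would
# wrap the fc.calculate(...) call instead.
total = run_with_hang_dump(lambda: sum(range(1_000)), timeout_s=5.0)
print(total)
```

If the traceback ends inside a lock acquisition, that would support the suspicion that the hang is lock-related.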

jvdd commented 1 month ago

Hey @cameron-hobbs,

Thanks for submitting this issue!

I have experienced this as well when computing features with multiprocessing, i.e. n_jobs > 1 (the default, None, uses the number of logical cores).
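For context, my reading of the tsflex documentation is that `n_jobs=None` falls back to `os.cpu_count()`, while `n_jobs` of 0 or 1 runs sequentially without a process pool. The helper below is a hypothetical illustration of that rule only; it is not tsflex's actual implementation.

```python
import os


def resolve_n_jobs(n_jobs=None):
    """Hypothetical helper: return (number of processes, whether a pool is used)."""
    if n_jobs is None:
        # Assumed default: one process per logical core.
        n_jobs = os.cpu_count() or 1
    if n_jobs in (0, 1):
        # Sequential execution, no process pool.
        return 1, False
    return n_jobs, True


print(resolve_n_jobs())   # depends on the machine
print(resolve_n_jobs(1))  # sequential
```

Passing `n_jobs=1` to `fc.calculate(...)` is therefore a quick way to rule multiprocessing in or out when debugging a hang.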

I tried investigating this issue in the past, but did not manage to get a consistent (minimal) reproducible example. If you can share one with me, I'll gladly look further into this.

cameron-hobbs commented 1 month ago

Hi @jvdd, thanks for your reply. Do you know how you got it working again on your machine?

I also tried n_jobs=1 but then I get a segmentation fault, unsure if related or not

cameron-hobbs commented 1 month ago

Hmm, I tried some more tonight and it's still not working.

Is there a Discord or a community where I can ask more questions?

I am also curious how I could include an independent target variable with the calculated features.

jvdd commented 3 weeks ago

Hi @cameron-hobbs,

I've updated the dill and multiprocess dependencies, as the error you're encountering likely originates from issues with these libraries. The problem might be resolved in the newer versions of these packages.
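To check which versions are actually installed in the environment, a small sketch using the standard-library `importlib.metadata` could look like this. The package names come from this thread; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("tsflex", "dill", "multiprocess")):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same environment as the script (for example via `poetry run python`) shows whether the updated dependencies were picked up.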

I'll continue to investigate and work on creating a reproducible example to better understand the issue.

Regarding your other question: We didn't have a Discord server before, but I believe it could be a valuable way to connect with the community. I've set one up, and you're more than welcome to join. Feel free to share your findings, ideas, or any feedback there. It could also be a great space to showcase what people are achieving with our toolkits!

Discord: https://discord.gg/4WcUHFNe

Cheers, Jeroen

cameron-hobbs commented 2 weeks ago

Just confirming this is resolved on my side with the latest dependency updates (as discussed on Discord). Thanks!

jvdd commented 2 weeks ago

Great! Can you check whether the following is still an issue? If so, I'll gladly look into this as well :)

"I also tried n_jobs=1 but then I get a segmentation fault, unsure if related or not"

cameron-hobbs commented 2 weeks ago

Just tried it now; n_jobs=1 is also working.

jonasvdd commented 1 week ago

@cameron-hobbs, this should now be officially fixed in the tsflex==0.4.1 release.