laszukdawid / PyEMD

Python implementation of Empirical Mode Decomposition (EMD) method
https://pyemd.readthedocs.io/
Apache License 2.0

Parallel Function not working in EEMD #112

Closed: Saul-the-engineer closed this issue 2 years ago

Saul-the-engineer commented 2 years ago

I'm trying to run the EEMD function on environmental data. The function works when `parallel` is set to `False` and `processes` is set to `None`. However, when I set `parallel` to `True` and `processes` to any integer, I get an error. I'd really like to get this feature working because I need to speed up the processing.

```python
import numpy as np
import pandas as pd
from PyEMD import EEMD  # pip install EMD-signal

# data_temp is a pandas DataFrame loaded earlier; var is one of its columns
data_temp

if __name__ == "__main__":
    eemd = EEMD(trials=100,
                noise_width=0.05,
                ext_EMD=None,
                separate_trends=True,
                DTYPE=np.float16,
                spline_kind='akima',
                parallel=True,
                processes=1)

    eemd.noise_seed(42)
    eIMF = eemd(data_temp[var].values).T  # Transpose to match the index
eIMF = pd.DataFrame(eIMF, index=data_temp.index)
```

When I run this code, it seems like the decomposition gets skipped and the script goes straight to putting the data into a DataFrame, yet the decomposition result it needs hasn't been defined. I've attached a screenshot of the error.


I'm running emd-signal version 0.2.15 with Anaconda, on Windows using a Ryzen 7 CPU.

I'd appreciate any help.

laszukdawid commented 2 years ago

Hey,

I'm honestly struggling a bit to understand your code and what you did. The error you're sharing says that `eIMF` is not defined, which means your code never executes the line that assigns `eIMF`. This suggests a Python workflow error rather than anything to do with PyEMD.

Could you please check the code you shared and make sure it is formatted exactly as what you're running? If you can't for whatever reason, maybe share an image?

Saul-the-engineer commented 2 years ago

Sure, thanks for getting back to me. I'll share a longer piece of code. It loads a pickle file containing a dictionary of pandas DataFrames, one per cell. Each cell has 28 time-series variables, which I'd like to process with EEMD and append as new columns to that cell's original DataFrame, hence the two for loops. When I run the following code, I get an error. However, when I change `parallel` to `False` and `processes` to `None`, the script works.
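
The workflow described here (for each cell's DataFrame, decompose every column and append the resulting IMF columns to that same frame) could be written with `pd.concat`, as a minimal sketch. It assumes the decomposer is any callable that returns an `(n_imfs, n_samples)` array, which is how `eemd(values)` is used in this thread. `append_imfs` and `placeholder_decompose` are hypothetical names, and the placeholder is not an EMD; it only exists so the sketch runs without PyEMD installed.

```python
import numpy as np
import pandas as pd


def append_imfs(df, decompose):
    """Decompose every column of `df` and append the components as new columns.

    `decompose` is any callable mapping a 1-D float array to an array of
    shape (n_imfs, n_samples), which is how eemd(values) is used in this thread.
    """
    pieces = [df]
    for var in df.columns:
        imfs = decompose(df[var].to_numpy(dtype=float))
        out = pd.DataFrame(imfs.T, index=df.index)  # transpose: one column per IMF
        out.columns = [f"{var}_imf_{k + 1}" for k in range(out.shape[1])]
        pieces.append(out)
    # One concat at the end instead of growing the frame column by column.
    return pd.concat(pieces, axis=1)


def placeholder_decompose(values):
    """Hypothetical stand-in for an EEMD instance; NOT an EMD.

    Splits the series into (deviation from mean, mean) so the sketch runs
    without PyEMD installed.
    """
    mean = np.full_like(values, values.mean())
    return np.vstack([values - mean, mean])


if __name__ == "__main__":
    # Hypothetical dict of per-cell DataFrames, mirroring the pickled data.
    index = pd.date_range("2000-01-01", periods=4, freq="D")
    cells = {"cell_1": pd.DataFrame({"temp": [1.0, 2.0, 3.0, 4.0]}, index=index)}
    augmented = {name: append_imfs(df, placeholder_decompose) for name, df in cells.items()}
    print(augmented["cell_1"].columns.tolist())
```

In a real run, the `EEMD` instance itself would be passed as `decompose`, since it is called the same way (`eemd(values)`).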

```python
import numpy as np
import pandas as pd
import utils_data_augmentation
from PyEMD import EEMD  # pip install EMD-signal

data_root = "./Datasets/"
figures_root = './Figures EEMD'

DA = utils_data_augmentation.Data_Augmentation(data_root, figures_root)

Data = DA.read_pickle('GLDAS_Data', data_root)
cell_names = list(Data.keys())
cell_names.remove('Location')

for i, cell in enumerate(cell_names):
    data_temp = Data[cell]

    for j, var in enumerate(data_temp.columns):
        if __name__ == "__main__":
            eemd = EEMD(trials=100,
                        noise_width=0.05,
                        ext_EMD=None,
                        separate_trends=True,
                        DTYPE=np.float16,
                        spline_kind='akima',
                        parallel=True,
                        processes=1)
            eemd.noise_seed(42)
            eIMF = eemd(data_temp[var].values).T

        out = pd.DataFrame(eIMF, index=data_temp.index)
        label = [var + '_imf_' + str(k+1) for k in range(len(out.columns))]
        out.columns = label
```

Here is part of the error before it starts repeating itself.

```
  File "c:\users\saulg\onedrive\dissertation\well imputation\master code\05_eemd_feature_creation - gldas - copy - copy.py", line 48, in <module>
    eIMF = eemd(data_temp[var].values).T #Transverse to match with index
NameError: name 'eemd' is not defined
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\saulg\Anaconda3\envs\Deep_Learning_Env\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\saulg\Anaconda3\envs\Deep_Learning_Env\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\saulg\Anaconda3\envs\Deep_Learning_Env\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\saulg\Anaconda3\envs\Deep_Learning_Env\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\saulg\Anaconda3\envs\Deep_Learning_Env\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\saulg\Anaconda3\envs\Deep_Learning_Env\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\saulg\Anaconda3\envs\Deep_Learning_Env\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
```
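
The `spawn.py` and `runpy.py` frames in this traceback show the mechanism: on Windows, `multiprocessing` starts each worker as a fresh interpreter that re-executes the main script. In CPython's `spawn.py` that re-execution is done with `runpy.run_path(main_path, run_name="__mp_main__")`, so inside a worker `__name__ == "__main__"` is false. The sketch below reproduces the effect on a hypothetical toy script shaped like the one in this issue (a name assigned only inside the guard, then used outside it), without starting any processes; `TOY_SCRIPT` and `run_script_as` are illustrative names.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical toy script with the same shape as the one in this issue: a name
# is assigned only inside the guard, then used by a line outside it.
TOY_SCRIPT = textwrap.dedent("""
    outcome = []
    if __name__ == "__main__":
        eIMF = [[0.1, 0.2], [0.3, 0.4]]  # stands in for eemd(...).T
    try:
        outcome.append(len(eIMF))        # stands in for pd.DataFrame(eIMF, ...)
    except NameError as exc:
        outcome.append("NameError: " + str(exc))
""")


def run_script_as(run_name):
    """Execute TOY_SCRIPT from a file with __name__ set to run_name."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as fh:
        fh.write(TOY_SCRIPT)
        path = fh.name
    try:
        module_globals = runpy.run_path(path, run_name=run_name)
    finally:
        os.remove(path)
    return module_globals["outcome"]


if __name__ == "__main__":
    print("parent:", run_script_as("__main__"))     # how the script runs normally
    print("worker:", run_script_as("__mp_main__"))  # how a spawned worker re-runs it
```

Run as the parent, the toy script returns `[2]`; run the way a spawned worker runs it, the guarded assignment is skipped and the module-level line fails with a `NameError` for `eIMF`, which matches the error reported here.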

laszukdawid commented 2 years ago

Hey Saul,

Unfortunately, I need to ask you again to properly format the code in your answer so that we can actually find the cause of the issue. Please check this guide: https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code.

One thing that stands out is that you have `if __name__ == "__main__":` inside a for loop. That `if` statement is normally used at the top level (outside any `for`). It would also explain why you don't see `eIMF`: whenever the condition is false, the code never reaches `eemd(...)`, so `eIMF` is never assigned.

Try

```python
for j, var in enumerate(data_temp.columns):
    eemd = EEMD(trials=100,
                noise_width=0.05,
                ext_EMD=None,
                separate_trends=True,
                DTYPE=np.float16,
                spline_kind='akima',
                parallel=True,
                processes=1)
    eemd.noise_seed(42)
    eIMF = eemd(data_temp[var].values).T

    out = pd.DataFrame(eIMF, index=data_temp.index)
    label = [var + '_imf_' + str(k+1) for k in range(len(out.columns))]
    out.columns = label
```

Saul-the-engineer commented 2 years ago

It looks like I figured it out. When running EEMD in parallel inside a for loop on Windows, all the subsequent steps need to be inside the `if` statement as well.

Thank you so much for the help, this shortened the runtime from 118 seconds to 14 seconds.

```python
for j, var in enumerate(data_temp.columns):
    if __name__ == '__main__':
        eemd = EEMD(trials=1000,
                    noise_width=0.05,
                    ext_EMD=None,
                    separate_trends=True,
                    DTYPE=np.float16,
                    spline_kind='akima',
                    parallel=True,
                    processes=2)
        eemd.noise_seed(42)
        eIMF = eemd(data_temp[var].values).T
        out = pd.DataFrame(eIMF, index=data_temp.index)
        label = [var + '_imf_' + str(k+1) for k in range(len(out.columns))]
        out.columns = label
```

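A common alternative to repeating the guard inside the loop is the layout that the `multiprocessing` documentation's "safe importing of main module" guidance points toward: keep all the work in functions and call a single `main()` from one top-level `if __name__ == "__main__":` block. The sketch below shows only that layout; `placeholder_decompose`, `process_cells`, and the `cells` data are hypothetical, and the placeholder is not an EMD. With PyEMD installed, `main()` would instead build the `EEMD(..., parallel=True, processes=2)` instance used above and pass it as `decompose`.

```python
import numpy as np


def placeholder_decompose(values):
    """Hypothetical stand-in for an EEMD instance called as eemd(values); NOT an EMD."""
    trend = np.full_like(values, values.mean())
    return np.vstack([values - trend, trend])  # shape (n_components, n_samples)


def process_cells(cells, decompose):
    """Run `decompose` on every variable of every cell; both loops live here."""
    results = {}
    for cell, variables in cells.items():
        results[cell] = {
            # Transpose so each component becomes a column, as with .T in the thread.
            var: decompose(np.asarray(values, dtype=float)).T
            for var, values in variables.items()
        }
    return results


def main():
    # Placeholder keeps the sketch runnable without PyEMD; swap in the EEMD instance.
    decompose = placeholder_decompose
    # Hypothetical input standing in for the pickled dict of DataFrames.
    cells = {"cell_1": {"temp": [1.0, 2.0, 3.0], "rain": [0.0, 0.0, 3.0]}}
    for cell, per_var in process_cells(cells, decompose).items():
        print(cell, {var: arr.shape for var, arr in per_var.items()})


if __name__ == "__main__":
    # Sole module-level entry point. A spawned worker re-imports this file under
    # the name "__mp_main__": it defines the functions above and skips main(),
    # so it never re-enters the loops or tries to start a pool of its own.
    main()
```

Because nothing but function definitions runs at import time, the workers can re-import the script safely, and every step after the decomposition stays inside `main()` without a guard in each loop iteration.
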
laszukdawid commented 2 years ago

Great to hear that :)