Closed Saul-the-engineer closed 2 years ago
Hey,
I'm honestly struggling a bit to understand your code and what you did. The error you're sharing says that `eIMF` is not defined, which means that your code doesn't go through the lines with the `eIMF` assignment. This suggests a Python workflow error rather than anything with PyEMD.
Could you please check the code you shared and make sure it is formatted exactly as you're running it? If you can't for whatever reason, maybe share an image?
Sure, thanks for getting back to me. I'll share a longer piece of code. What this is trying to do is load a pickle file, which is a dictionary containing many pandas dataframes, or cells. Within each cell, I have 28 time series variables which I'd like to process with EEMD, appending the resulting columns to the cell's original dataframe, hence the two for loops. When I attempt to run the following code, I get an error. However, when I change `parallel` to `False` and `processes` to `None`, the script works.
```python
import numpy as np
import pandas as pd
import utils_data_augmentation
from PyEMD import EEMD  # pip install EMD-signal

data_root = "./Datasets/"
figures_root = './Figures EEMD'

DA = utils_data_augmentation.Data_Augmentation(data_root, figures_root)

Data = DA.read_pickle('GLDAS_Data', data_root)
cell_names = list(Data.keys())
cell_names.remove('Location')

for i, cell in enumerate(cell_names):
    data_temp = Data[cell]
    for j, var in enumerate(data_temp.columns):
        if __name__ == "__main__":
            eemd = EEMD(trials=100,
                        noise_width=0.05,
                        ext_EMD=None,
                        separate_trends=True,
                        DTYPE=np.float16,
                        spline_kind='akima',
                        parallel=True,
                        processes=1)
            eemd.noise_seed(42)
            eIMF = eemd(data_temp[var].values).T
        out = pd.DataFrame(eIMF, index=data_temp.index)
        label = [var + '_imf_' + str(k+1) for k in range(len(out.columns))]
        out.columns = label
```
Here is part of the error before it starts repeating itself.
File "c:\users\saulg\onedrive\dissertation\well imputation\master code\05_eemd_feature_creation - gldas - copy - copy.py", line 48, in
Hey Saul,
Unfortunately I need to ask you again to properly format the code in your answer so that we can actually find the reason for the issue. Please check this guide: https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code.
One thing that stands out is that you have `if __name__ == "__main__":` in a `for` loop. That `if` statement is mainly used at the top level (outside `for`); it would also make sense that you don't see `eIMF`, because it never reaches `eemd(...)`.
Try
```python
for j, var in enumerate(data_temp.columns):
    eemd = EEMD(trials=100,
                noise_width=0.05,
                ext_EMD=None,
                separate_trends=True,
                DTYPE=np.float16,
                spline_kind='akima',
                parallel=True,
                processes=1)
    eemd.noise_seed(42)
    eIMF = eemd(data_temp[var].values).T
    out = pd.DataFrame(eIMF, index=data_temp.index)
    label = [var + '_imf_' + str(k+1) for k in range(len(out.columns))]
    out.columns = label
```
It looks like I figured it out. If I'm trying to run the EEMD script in parallel within a for loop on Windows, all the subsequent actions need to be inside the `if` statement.
Thank you so much for the help; this shortened the runtime from 118 seconds to 14 seconds.
```python
for j, var in enumerate(data_temp.columns):
    if __name__ == '__main__':
        eemd = EEMD(trials=1000,
                    noise_width=0.05,
                    ext_EMD=None,
                    separate_trends=True,
                    DTYPE=np.float16,
                    spline_kind='akima',
                    parallel=True,
                    processes=2)
        eemd.noise_seed(42)
        eIMF = eemd(data_temp[var].values).T
        out = pd.DataFrame(eIMF, index=data_temp.index)
        label = [var + '_imf_' + str(k+1) for k in range(len(out.columns))]
        out.columns = label
```
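For anyone landing here later, a rough sketch of why the placement of the guard matters. On Windows, `multiprocessing` starts workers with the spawn method, and each worker re-imports the main script under the name `__mp_main__`, so anything outside the `if __name__ == "__main__":` guard runs again in the worker while the guarded lines are skipped. The snippet below only approximates that re-import with `runpy.run_path`; the two toy scripts, the `run_as` helper and the `result`/`out` names are illustrative stand-ins, not PyEMD code.

```python
import os
import runpy
import tempfile

# Toy script: the assignment is guarded, but the line that uses it is not.
PARTIAL_GUARD = '''\
for var in ["a", "b"]:
    if __name__ == "__main__":
        result = var.upper()  # stands in for: eIMF = eemd(...)
    out = [result]            # outside the guard: a worker would run this too
'''

# Toy script: everything that depends on the guarded assignment is guarded too.
FULL_GUARD = '''\
for var in ["a", "b"]:
    if __name__ == "__main__":
        result = var.upper()
        out = [result]        # inside the guard: a worker skips it
'''


def run_as(source, run_name):
    """Run source as a script under the given __name__; return 'ok' or the NameError text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as fh:
            fh.write(source)
        try:
            runpy.run_path(path, run_name=run_name)
        except NameError as exc:
            return f"NameError: {exc}"
    return "ok"


# "__main__" mimics the parent process, "__mp_main__" a spawned worker's re-import.
outcomes = {
    (label, name): run_as(source, name)
    for label, source in (("partial", PARTIAL_GUARD), ("full", FULL_GUARD))
    for name in ("__main__", "__mp_main__")
}

for key, value in outcomes.items():
    print(key, "->", value)
```

Only the partially guarded script run as `__mp_main__` fails, which matches the "not defined" error reported above. Putting the whole loop in a function and calling it from a single top-level guard is the more common layout and has the same effect.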
Great to hear that :)
I'm trying to run the EEMD function on environmental data. The function works when `parallel` is set to `False` and `processes` is set to `None`. However, when I set `parallel` to `True` and `processes` to any integer, I get an error. I'd really like to be able to get this feature working because I need to speed up the processing.
When I run this code, it seems like the decomposition gets skipped and the script tries to put the data into a dataframe, yet the dataframe hasn't been defined. I've attached a screenshot of the error.
I'm running emd-signal version 0.2.15 with Anaconda, on Windows using a Ryzen 7 CPU.
I'd appreciate any help.
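In case it helps with similar reports, here is a small sketch for collecting the environment details mentioned above in one go. It uses only the standard library; the `environment_report` helper is a made-up name, and the distribution name `EMD-signal` is taken from the pip command quoted in this thread. The multiprocessing start method is included because it differs between Windows (spawn) and other platforms.

```python
import multiprocessing as mp
import platform
from importlib import metadata


def environment_report(package="EMD-signal"):
    """Collect OS, Python version, multiprocessing start method and package version."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "os": platform.system(),
        "python": platform.python_version(),
        "start_method": mp.get_start_method(),
        package: version,
    }


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```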