Closed henrychow94 closed 4 years ago
Hey @henrychow94. In short, PyEMD hasn't been tested with pandas, so there's no guarantee about its results on a DataFrame or Series. I'm not sure how exactly the iteration would go through the dataframe/series.
If the rest of the IMFs look similar, the issue might be in the type casting. Since EMD performs a large number of iterative subtractions, any small difference gets inflated quickly. Differences on the order of machine epsilon are already problematic.
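To get a feel for the size of a pure type-casting difference, here is a minimal sketch using NumPy only. The signal is a synthetic stand-in (an assumption, since the original data isn't attached), and the names `x64`, `x32` and `cast_error` are just illustrative.

```python
import numpy as np

# Synthetic stand-in for the real series (assumption: the original data is not available).
t = np.linspace(0.0, 1.0, 1000)
x64 = np.sin(2 * np.pi * 5 * t) + 0.5 * t   # float64 signal
x32 = x64.astype(np.float32)                # same signal cast to float32

# Machine epsilon for both floating-point types.
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps

# Largest absolute difference introduced by the cast alone,
# before any sifting iteration has been applied.
cast_error = np.max(np.abs(x64 - x32.astype(np.float64)))

print("float32 eps:", eps32)
print("float64 eps:", eps64)
print("max cast error:", cast_error)
```

It is also worth printing `vai['sa'].dtype` and `vai['sa'].values.dtype` to confirm both inputs start from the same type.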
To help understand whether there is an issue please:
1) Describe the rest of IMFs. Are they different?
2) Add the same offset (or signal) to the numpy array and the pandas series and see if the issue persists.
3) Force PyEMD to use a specific type by setting
emd.DTYPE = np.float32
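For point 1, something along these lines could quantify how far apart two sets of IMFs are. This is only a sketch: `imf_differences` is a hypothetical helper (not part of PyEMD), and it assumes both results have the same shape, i.e. the same number of IMFs and samples.

```python
import numpy as np

def imf_differences(imfs_a, imfs_b):
    """Return the maximum absolute difference per IMF (one value per row).

    Hypothetical helper, not part of PyEMD. Both inputs are expected to be
    2-D, shaped (number of IMFs, number of samples).
    """
    a = np.asarray(imfs_a, dtype=np.float64)
    b = np.asarray(imfs_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return np.max(np.abs(a - b), axis=1)

# Tiny made-up example: two "decompositions" that differ only in the second row.
imfs_np = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
imfs_pd = imfs_np.copy()
imfs_pd[1, 2] += 0.5

diffs = imf_differences(imfs_np, imfs_pd)
print(diffs)
```

Values close to machine epsilon would point at type casting; large values would suggest the two inputs are being processed differently.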
Thank you @laszukdawid
Here are the answers to your questions:
Here's my code:
import numpy as np
import pandas as pd
from PyEMD import EMD
emd = EMD()
emd.DTYPE = np.float32
IMF = emd(vai['sa'].values, max_imf=3)
IMF1 = emd(vai['sa'], max_imf=3)
for i in [-1, -2, -3, -4]:
    data = np.concatenate((IMF[i].reshape(-1, 1), IMF1[i].reshape(-1, 1)), axis=1)
    pd.DataFrame(index=vai.index, data=data, columns=['np', 'pd']).plot()
The plots compare the results given by the two methods (np and pd) for IMF[i]:
[Plots for i = -1, i = -2, i = -3 and i = -4]
I think the results are even better when I pass the data in DataFrame form, because I'm trying to extract the trend from a time series. Below is a comparison of the trend (the last IMF) against the original series:
using dataframe(better)
using numpy array(worse)
Thanks for highlighting this. The ticket was closed too soon. I'll investigate this in the coming weeks.
Hey,
Sorry that it took me so long to get to this. I've been checking pandas with PyEMD and, in all honesty, I'm not sure how you managed to get the results you have. Looking at PyEMD's code and knowing how pandas DataFrames/Series behave, I'm actually surprised you got any results, as it shouldn't work at all. For example, take a look at this line:
indzer = np.nonzero(S[1:]*S[:-1]<0)[0]
which checks for sign changes. With a pandas Series, the arithmetic is aligned on index labels rather than positions, which means that S[1:]*S[:-1] is roughly S[1:-1]**2 (with NaN at both ends), so everything else fails.
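A small sketch of that effect, using made-up values rather than the original data. `.iloc` is used here only to make the positional slicing explicit; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up signal that changes sign between every pair of neighbours.
values = np.array([1.0, -2.0, 3.0, -4.0])
series = pd.Series(values)  # default integer index 0..3

# NumPy: purely positional, so each sample is multiplied by its neighbour.
prod_np = values[1:] * values[:-1]
zero_np = np.nonzero(prod_np < 0)[0]

# pandas: the two slices carry labels 1..3 and 0..2, and the product is
# aligned on those labels, so each sample is multiplied by itself and the
# labels present in only one slice become NaN.
prod_pd = series.iloc[1:] * series.iloc[:-1]
zero_pd = np.nonzero((prod_pd < 0).to_numpy())[0]

print("numpy product: ", prod_np)
print("pandas product:", prod_pd.to_numpy())
print("sign changes found (numpy): ", zero_np)
print("sign changes found (pandas):", zero_pd)
```

With the Series, the product is never negative, so no sign changes are detected even though the signal crosses zero between every sample.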
I'm curious how you obtained what you're presenting. Any chance you could send over a Jupyter notebook with some data attached, and list your NumPy and pandas versions?
I'd really appreciate it, thanks.
Thank you Dawid @laszukdawid
I checked what you said, and I think it depends on the version of pandas.
When I run emd(vai['sa'],max_imf=3) on pandas version 0.25.1, it works, though it gives a warning.
But when I run the same code on pandas version 1.0.1, it fails.
What's even stranger is that the precision of the results from the two methods (run with pd.Series or np.array) is quite different, with the latter being worse. I wonder whether this happens because the default maximum number of iterations differs between the two cases.
Again, thank you for your effort on this issue!
Thanks for letting me know and great that it works :)
The variable 'vai' is a DataFrame with one column, 'sa'. I just found that the same input gives different results depending on whether it is passed as a DataFrame column or as a NumPy array. Here's the code:
IMF = EMD().emd(vai['sa'].values, max_imf=3)
IMF1 = EMD().emd(vai['sa'], max_imf=3)
and IMF[-1] is quite different from IMF1[-1].
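One way to rule out pandas index handling is to strip the index explicitly before the decomposition. A minimal sketch follows; `as_emd_input` is a hypothetical helper name (not a PyEMD function), and the series below is made up for illustration.

```python
import numpy as np
import pandas as pd

def as_emd_input(signal):
    """Return a 1-D float64 NumPy array, dropping any pandas index.

    Hypothetical helper, not part of PyEMD.
    """
    arr = np.asarray(signal, dtype=np.float64)
    if arr.ndim != 1:
        raise ValueError(f"expected a 1-D signal, got shape {arr.shape}")
    return np.ascontiguousarray(arr)

# Made-up time series with a datetime index, standing in for vai['sa'].
series = pd.Series(
    [1.0, -2.0, 3.0, -4.0],
    index=pd.date_range("2020-01-01", periods=4),
)

x = as_emd_input(series)

# With a plain array, shifted products are positional again.
shifted = x[1:] * x[:-1]
print(type(x).__name__, x.dtype, shifted)
```

The result could then be passed as EMD().emd(as_emd_input(vai['sa']), max_imf=3), which is similar in spirit to using vai['sa'].values but also pins the dtype.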