Different results given by same input in dataframe and numpy array form

laszukdawid / PyEMD

Python implementation of Empirical Mode Decomposition (EMD) method

https://pyemd.readthedocs.io/

Apache License 2.0


Different results given by same input in dataframe and numpy array form #62

Closed henrychow94 closed 4 years ago

henrychow94 commented 4 years ago

The variable 'vai' is a dataframe with one column, 'sa'. I just found that the same input gives different results depending on whether it is passed as a dataframe column or as a numpy array. Here's the code:

IMF = EMD().emd(vai['sa'].values, max_imf=3)
IMF1 = EMD().emd(vai['sa'], max_imf=3)

IMF[-1] is quite different from IMF1[-1].

laszukdawid commented 4 years ago

Hey @henrychow94 In short, PyEMD wasn't tested on pandas and there's no guarantee regarding its results. I'm not sure how exactly iteration would go through the dataframe/series.

If the rest of the IMFs look similar, the issue might be in the type casting. Since EMD performs a large number of iterative subtractions, any small difference is quickly inflated. Differences on the order of machine epsilon are already problematic.
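As a rough illustration of how rounding differences can grow over many subtractions, the sketch below repeats the same subtraction in float32 and float64. It is not PyEMD's sifting loop; the helper name `repeated_subtract` and the chosen numbers are assumptions made only for this example.

```python
import numpy as np

def repeated_subtract(start, step, n_steps, dtype):
    """Subtract `step` from `start` n_steps times, rounding to `dtype` each time."""
    value = dtype(start)
    step = dtype(step)
    for _ in range(n_steps):
        value = dtype(value - step)
    return float(value)

# Exact arithmetic would give 1000 - 10000 * 0.1 = 0 in both cases.
residual_32 = repeated_subtract(1000.0, 0.1, 10_000, np.float32)
residual_64 = repeated_subtract(1000.0, 0.1, 10_000, np.float64)

print("float32 residual:", residual_32)
print("float64 residual:", residual_64)
print("float32 eps:", np.finfo(np.float32).eps)
print("float64 eps:", np.finfo(np.float64).eps)
```

Both residuals should be zero in exact arithmetic; the float32 one ends up much further from zero, which is the kind of drift that a change in input type could introduce.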

laszukdawid commented 4 years ago

To help understand whether there is an issue, please:

1) Describe the rest of the IMFs. Are they different?
2) Add the same offset (or signal) to the numpy array and the pandas series and see if the issue persists.
3) Force PyEMD to use a specific type by setting emd.DTYPE = np.float32
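For the first check, a small comparison helper can put a number on "different". The sketch below assumes both decompositions are 2-D arrays of shape (number of IMFs, number of samples); `compare_imfs` is a hypothetical helper, not part of PyEMD, and the arrays here are synthetic stand-ins rather than real EMD output.

```python
import numpy as np

def compare_imfs(imfs_a, imfs_b):
    """Return the largest absolute difference for each pair of IMFs."""
    imfs_a = np.asarray(imfs_a, dtype=float)
    imfs_b = np.asarray(imfs_b, dtype=float)
    if imfs_a.shape != imfs_b.shape:
        # A different number of IMFs is itself a sign the runs diverged.
        raise ValueError(f"shape mismatch: {imfs_a.shape} vs {imfs_b.shape}")
    return np.max(np.abs(imfs_a - imfs_b), axis=1)

# Synthetic stand-ins for two decompositions of the same signal.
t = np.linspace(0, 1, 200)
imfs_np = np.vstack([np.sin(40 * t), np.sin(8 * t), t])
imfs_pd = imfs_np.copy()
imfs_pd[2] += 0.5  # pretend only the last component (the trend) differs

diffs = compare_imfs(imfs_np, imfs_pd)
for i, diff in enumerate(diffs):
    print(f"IMF {i}: max abs difference = {diff:.3g}")
```

With real data, the two arguments would be the IMF and IMF1 arrays from the original report.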

henrychow94 commented 4 years ago

Thank you @laszukdawid

Here are my answers to your questions:

1) All the IMFs are quite different. (I'll illustrate this with plots.)
2) I'm not sure what you mean by 'Add same offset to np and pd'.
3) With emd.DTYPE = np.float32 the results are the same; the problem persists.

Here's my code:

import numpy as np
import pandas as pd
from PyEMD import EMD

emd = EMD()
emd.DTYPE = np.float32

IMF = emd(vai['sa'].values, max_imf=3)
IMF1 = emd(vai['sa'], max_imf=3)

# Plot each of the last four IMFs from both inputs side by side
for i in [-1, -2, -3, -4]:
    pd.DataFrame(
        index=vai.index,
        data=np.concatenate((IMF[i].reshape(-1, 1), IMF1[i].reshape(-1, 1)), axis=1),
        columns=['np', 'pd'],
    ).plot()

The plots compare the results given by the two methods (np and pd) for each IMF[i]:

[Plots for i = -1, i = -2, i = -3 and i = -4]

I think the results are even better when I pass the data in dataframe form, because I'm trying to extract the trend from a time series. Below is a comparison of the trend (the last IMF) against the original series:

using dataframe(better)
using numpy array(worse)

laszukdawid commented 4 years ago

Thanks for highlighting this. The ticket was closed too soon. I'll investigate this in the coming weeks.

laszukdawid commented 4 years ago

Hey,

Sorry that it took me so long to get to this. I've been checking pandas with PyEMD and, in all honesty, I'm not sure how you managed to get the results you have. Looking at PyEMD's code and knowing how pandas dataframes/series behave, I'm actually surprised you got any results, as it shouldn't work at all. For example, take a look at this line:

indzer = np.nonzero(S[1:]*S[:-1]<0)[0]

which checks for sign changes. For a pandas series the multiplication is aligned on the index rather than on position, which means that S[1:]*S[:-1] is roughly S[1:-1]**2. A square is never negative, so no sign change is ever found and everything after that fails.
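The difference can be reproduced on a toy signal. This is a minimal sketch with made-up values; it uses `.iloc` for the pandas slices to make the positional slicing explicit, which is an assumption about how to mirror `S[1:]` and `S[:-1]` and not a quote from PyEMD's code.

```python
import numpy as np
import pandas as pd

# Alternating signs: every neighbouring pair is a sign change.
values = np.array([1.0, -2.0, 3.0, -4.0, 5.0])

# NumPy multiplies by position: element i+1 times element i.
arr_product = values[1:] * values[:-1]            # -2, -6, -12, -20
arr_crossings = np.nonzero(arr_product < 0)[0]    # positions 0, 1, 2, 3

# pandas multiplies by index label: label i is paired with label i,
# and labels present on only one side become NaN.
series = pd.Series(values)
ser_product = series.iloc[1:] * series.iloc[:-1]  # NaN, 4, 9, 16, NaN
ser_crossings = np.nonzero((ser_product < 0).to_numpy())[0]  # empty

print("numpy sign changes: ", arr_crossings)
print("pandas sign changes:", ser_crossings)
```

The NumPy version reports four sign changes, while the pandas version reports none, because each value is multiplied by itself and the two unmatched end labels turn into NaN.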

I'm curious how you obtained what you're presenting. Any chance you could send over a Jupyter notebook with some data attached, and list your NumPy and pandas versions?

Really appreciate it, thanks.

henrychow94 commented 4 years ago

Thank you Dawid @laszukdawid

I checked what you said, and I think it depends on the version of pandas.

When I run emd(vai['sa'],max_imf=3) on pandas version 0.25.1, it works, though it gives a warning like this: [screenshot 1584436147(1)]

But when I run the same code on pandas version 1.0.1 it fails.
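One way to avoid depending on the pandas version is to strip the index before the data reaches PyEMD, which is equivalent in effect to the `.values` call used earlier in the thread. The sketch below is only an illustration; `to_signal_array` is a hypothetical helper and the series is made-up data.

```python
import numpy as np
import pandas as pd

def to_signal_array(signal, dtype=np.float64):
    """Return a 1-D NumPy array of `signal`'s values, dropping any pandas index."""
    array = np.asarray(signal, dtype=dtype)
    if array.ndim != 1:
        raise ValueError(f"expected a 1-D signal, got shape {array.shape}")
    return array

# Made-up series with a date index, standing in for vai['sa'].
index = pd.date_range("2020-01-01", periods=5, freq="D")
series = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0], index=index)
signal = to_signal_array(series)

# With PyEMD installed, the call from the thread would then become:
# IMF = EMD().emd(to_signal_array(vai['sa']), max_imf=3)
print(type(signal).__name__, signal.dtype, signal)
```

This gives the same behaviour on any pandas version, at the cost of losing the index, which can be reattached to the IMFs afterwards if needed.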

What's even stranger is that the precision of the results given by the two methods (running with pd.Series or np.array) is quite different, with the latter being less precise. I wonder whether this happens because the default maximum number of iterations differs between the two cases.

Again, thank you for your effort on this issue!

laszukdawid commented 4 years ago

Thanks for letting me know and great that it works :)