laszukdawid / PyEMD

Python implementation of Empirical Mode Decomposition (EMD) method
https://pyemd.readthedocs.io/
Apache License 2.0
864 stars, 224 forks

OSError: [Errno 12] Cannot allocate memory #57

Closed wunderbarr closed 5 years ago

wunderbarr commented 5 years ago

Hello, I recently encountered the following OSError:

/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in greater
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1
OMP: Warning #190: Forking a process while a parallel region is active is potentially unsafe.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/lib/python3.7/multiprocessing/pool.py", line 412, in _handle_workers
    pool._maintain_pool()
  File "/root/anaconda3/lib/python3.7/multiprocessing/pool.py", line 248, in _maintain_pool
    self._repopulate_pool()
  File "/root/anaconda3/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/root/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/root/anaconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/root/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/root/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

And for some signals that are steady, I always get the following warnings:

/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in greater
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:670: RuntimeWarning: invalid value encountered in subtract
  tmp = S - np.sum(IMF, axis=0)
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:851: RuntimeWarning: invalid value encountered in subtract
  self.residue = residue = S - np.sum(IMF,axis=0)
/root/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py:1273: RuntimeWarning: invalid value encountered in subtract
  a = op(a[slice1], a[slice2])
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:599: RuntimeWarning: invalid value encountered in less
  indmin = np.nonzero(np.r_[d1*d2<0] & np.r_[d1<0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in less
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in greater
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:670: RuntimeWarning: invalid value encountered in subtract
  tmp = S - np.sum(IMF, axis=0)
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:851: RuntimeWarning: invalid value encountered in subtract
  self.residue = residue = S - np.sum(IMF,axis=0)
/root/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py:1273: RuntimeWarning: invalid value encountered in subtract
  a = op(a[slice1], a[slice2])
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:599: RuntimeWarning: invalid value encountered in less
  indmin = np.nonzero(np.r_[d1*d2<0] & np.r_[d1<0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in less
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in greater
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:670: RuntimeWarning: invalid value encountered in subtract
  tmp = S - np.sum(IMF, axis=0)
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:851: RuntimeWarning: invalid value encountered in subtract
  self.residue = residue = S - np.sum(IMF,axis=0)
/root/anaconda3/lib/python3.7/site-packages/PyEMD/CEEMDAN.py:192: RuntimeWarning: invalid value encountered in subtract
  prev_res = S - last_imf
/root/anaconda3/lib/python3.7/site-packages/PyEMD/CEEMDAN.py:259: RuntimeWarning: invalid value encountered in subtract
  R = S - np.sum(cIMFs, axis=0)
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:582: RuntimeWarning: invalid value encountered in less
  indzer = np.nonzero(S1*S2<0)[0]
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:599: RuntimeWarning: invalid value encountered in less
  indmin = np.nonzero(np.r_[d1*d2<0] & np.r_[d1<0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in less
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1
/root/anaconda3/lib/python3.7/site-packages/PyEMD/EMD.py:600: RuntimeWarning: invalid value encountered in greater
  indmax = np.nonzero(np.r_[d1*d2<0] & np.r_[d1>0])[0]+1

EMD also seems to run endlessly on these signals. Are there any suggestions or best practices for processing them? Thank you!

laszukdawid commented 5 years ago

Interesting. Could you share what kind of object you are passing? A NumPy array? Also, any chance you could share the exact time series? If not through GitHub, then maybe mail it directly to me.


wunderbarr commented 5 years ago

The original file is a CSV file containing time series data. Because it is big, I read it in chunks with pandas, and each column is then packed into a NumPy array. I think it will be difficult to send you the original data files, because they are internal company data. I will discuss it with my supervisor. Maybe I can generate similar fake data or show you plots of the data.

wunderbarr commented 5 years ago

Hello, we investigated the data that triggers these warnings and the error above. We are doing anomaly detection. Normally this variable is a sinusoidal wave, but when a fault occurs it turns into a square wave and changes very slowly. Hence, I think it would be better to add a detection step that judges whether a signal is worth running EMD on. Because I am not very familiar with EMD theory: is there a general way to implement such a detection step? In other words, is there any formal theory describing which signals are not suitable for EMD/CEEMD? Thank you!

laszukdawid commented 5 years ago

Regarding theory:

No, EMD is as subjective as can be; it's defined by an algorithm rather than by a theory. There have been many attempts to put some theory behind it (including my own work), but I have yet to find anything worthwhile. Having spent a while working on EMD, I'm seriously sceptical whether it even makes sense to use it anywhere.
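Since there is no formal criterion, a pre-check can only be a heuristic. The sketch below is one possible filter, not part of PyEMD: `precheck_signal` and its `min_extrema` threshold are hypothetical names chosen for illustration. It rejects inputs containing NaN or inf (NumPy typically emits "invalid value encountered" warnings when comparing such values) and signals with too few extrema for envelope fitting to be meaningful.

```python
import numpy as np


def precheck_signal(x, min_extrema=3):
    """Hypothetical helper: return (ok, reason) for a candidate EMD input."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        return False, "need a 1-D signal with at least 3 samples"
    if not np.all(np.isfinite(x)):
        return False, "signal contains NaN or inf"

    # Count local extrema as sign changes of the first difference.
    # Zero differences are dropped so flat plateaus (e.g. a square wave)
    # do not hide the sign change between a rise and a fall.
    d = np.diff(x)
    d = d[d != 0]
    n_extrema = int(np.count_nonzero(d[:-1] * d[1:] < 0)) if d.size > 1 else 0
    if n_extrema < min_extrema:
        return False, f"only {n_extrema} extrema found"
    return True, "ok"


if __name__ == "__main__":
    t = np.linspace(0, 20, 500)
    print(precheck_signal(np.sin(t)))       # oscillating signal
    print(precheck_signal(np.ones(100)))    # constant signal
```

The threshold is an assumption to tune per dataset; a slowly changing square wave may pass or fail depending on how many level changes fall inside the analysed segment.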

Regarding the issue:

Since the error says Cannot allocate memory, its cause is most likely that you're holding too much data in memory. The way you handle the CSV with pandas (mostly) confirms this. I wanted the file just to make sure that's the exact problem. It's fine if you don't want to share the time series, but please at least provide its shape and size in megabytes.
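A minimal sketch for collecting that information from a NumPy array, assuming the signal is already loaded; `describe_array` is a hypothetical helper and the zero-filled array is only a placeholder for the real data.

```python
import numpy as np


def describe_array(ts):
    """Hypothetical helper: return (shape, dtype name, size in MiB)."""
    ts = np.asarray(ts)
    size_mb = ts.nbytes / 1024**2  # nbytes counts the raw data buffer only
    return ts.shape, str(ts.dtype), size_mb


if __name__ == "__main__":
    ts = np.zeros(1_000_000, dtype=np.float64)  # placeholder signal
    shape, dtype, size_mb = describe_array(ts)
    print(f"shape={shape} dtype={dtype} size={size_mb:.2f} MiB")

    # The same signal stored as float32 takes half the space.
    print(f"float32 size={describe_array(ts.astype(np.float32))[2]:.2f} MiB")
```

This reports only the input array; intermediate arrays created during decomposition come on top of it.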

Pandas handles large files very nicely when you read them in chunks: it doesn't load the whole file into memory but reads data only when you need it. EMD, however, needs the whole signal in memory. Since all splines are global, you need at least 2x extra memory for the envelopes, some memory for the extrema, and more memory for computing the splines (O(N^2) in the cubic case). The crash happens during the boolean comparison of two diffs, which requires ~4x the memory of the original signal (on top of the memory already allocated in the previous steps). This isn't the most optimal code but, to be fair, EMD isn't good for a "large" number of extrema, and a PyEMD slowdown has typically meant that there's no point in analysing further.

EEMD and CEEMDAN will only be worse. They run EMD hundreds of times, in parallel, each time on a slightly modified (noise-augmented) copy of the signal.

Potential solutions

  1. Make sure you're analysing one signal at a time.
  2. Break signal into smaller signals. (I'm yet to see 'windowed' version of EMD.)
  3. Reduce floating point accuracy by using emd.DTYPE = np.float32 (or np.float16).
  4. Don't use cubic spline but, e.g., Akima instead.

Try something like this:

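import numpy as np
import pandas as pd
from PyEMD import EMD

# Akima splines instead of the default cubic ones
emd = EMD(spline_kind='akima')
# Lower precision to reduce memory use
emd.DTYPE = np.float32

column_name = 'signal'  # placeholder: replace with your column's name

df = pd.read_csv('data.csv')
ts = df[column_name].values  # slice if too large

imfs = emd(ts)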
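The "break the signal into smaller signals" option can be sketched as a plain NumPy generator. `iter_windows` is a hypothetical helper, not a PyEMD function, and the window length and overlap are assumptions to tune. Each segment would be decomposed independently, so IMFs from neighbouring windows are not guaranteed to line up at the boundaries.

```python
import numpy as np


def iter_windows(ts, window, overlap=0):
    """Hypothetical helper: yield (start_index, segment) pairs covering ts."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    for start in range(0, len(ts), step):
        yield start, ts[start:start + window]
        if start + window >= len(ts):
            break  # the last segment already reaches the end of the signal


if __name__ == "__main__":
    ts = np.arange(10, dtype=np.float32)  # placeholder signal
    for start, segment in iter_windows(ts, window=4):
        # With PyEMD installed, each segment could be decomposed here,
        # e.g. imfs = emd(segment) using the emd object configured above.
        print(start, segment)
```

Slices of a NumPy array are views, so the generator itself does not copy the signal; only the per-segment decomposition allocates new memory.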

Remember

The source code is out and anyone is invited to propose improvements. Feel free to make the code more optimal :-)

wunderbarr commented 5 years ago

Thank you for your suggestions! After applying these modifications, the algorithm runs much faster, and I think the output is sufficient for our analysis. By the way, a steady waveform without any change yields a single IMF, and the Visualizer then throws the error 'AxesSubplot' object is not iterable. I fixed this in plot_imfs() as follows:

        if num_rows == 1:
            # axes = list(axes)
            axes.set_title("Time series")
            for num, imf in enumerate(imfs):
                # ax = axes[num]
                axes.plot(t, imf)
                axes.set_ylabel("IMF " + str(num+1))

        else:
            axes[0].set_title("Time series")

            for num, imf in enumerate(imfs):
                ax = axes[num]
                ax.plot(t, imf)
                ax.set_ylabel("IMF " + str(num+1))

            if include_residue:
                ax = axes[-1]
                ax.plot(t, residue)
                ax.set_ylabel("Res")

Same for plot_instant_freq() function:

        if num_rows == 1:
            axes.set_title("Instantaneous frequency")

            mean_inst_period = []

            for num, imf_inst_freq in enumerate(imfs_inst_freqs): 
                mean_freqs = np.mean(np.abs(imf_inst_freq))
                # print (mean_freqs, 1/mean_freqs)
                mean_inst_period.append(1/mean_freqs)
                ax = axes
                ax.plot(t[:-1], imf_inst_freq)
                # print (np.full((1, len(imf_inst_freq)), mean_freqs))
                ax.plot(t[:-1], np.full(len(imf_inst_freq), mean_freqs))
                ax.set_ylabel("IMF {} [Hz]".format(num+1))
        else:
            axes[0].set_title("Instantaneous frequency")

            mean_inst_period = []

            for num, imf_inst_freq in enumerate(imfs_inst_freqs): 
                mean_freqs = np.mean(np.abs(imf_inst_freq))
                # print (mean_freqs, 1/mean_freqs)
                mean_inst_period.append(1/mean_freqs)
                ax = axes[num]
                ax.plot(t[:-1], imf_inst_freq)
                # print (np.full((1, len(imf_inst_freq)), mean_freqs))
                ax.plot(t[:-1], np.full(len(imf_inst_freq), mean_freqs))
                ax.set_ylabel("IMF {} [Hz]".format(num+1))

And once again, thank you for your suggestions!

laszukdawid commented 5 years ago

Great that it worked and helped! I'm resolving this ticket :)

As for the fix, I don't think this is a bug. These plots are not meant for a single result (although maybe they should be?). Even in that case, I think it'd be better to normalise once with axes = [axes] if num_rows == 1 else list(axes) and leave the rest as it is. Great work though :) Good luck with your research!
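A minimal sketch of normalising axes once so the plotting loops need no special case. It assumes the axes come from matplotlib's plt.subplots, which returns a bare Axes object for a single subplot and a NumPy array of Axes otherwise. `as_axes_list` is a hypothetical helper, not part of PyEMD; it wraps a single object rather than calling list() on it, since a bare Axes is not iterable.

```python
import numpy as np


def as_axes_list(axes):
    """Hypothetical helper: return a flat list of axes objects.

    Accepts a single Axes, a list or tuple of Axes, or the NumPy array
    that plt.subplots returns for more than one subplot.
    """
    if isinstance(axes, np.ndarray):
        return list(axes.ravel())
    if isinstance(axes, (list, tuple)):
        return list(axes)
    return [axes]  # a single, non-iterable Axes object


# Intended use inside a plotting function (requires matplotlib):
#     fig, axes = plt.subplots(num_rows, 1)
#     axes = as_axes_list(axes)
#     axes[0].set_title("Time series")
#     for num, imf in enumerate(imfs):
#         axes[num].plot(t, imf)
```

With this in place, indexing such as axes[num] and axes[-1] works the same way for one row or many.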