laszukdawid / PyEMD

Python implementation of Empirical Mode Decomposition (EMD) method
https://pyemd.readthedocs.io/
Apache License 2.0

OSError: [Errno 24] Too many open files #54

Closed SpireGiorgioSavastano closed 5 years ago

SpireGiorgioSavastano commented 5 years ago

Hello,

It seems that Python raises an error when calling EEMD() or CEEMDAN() inside a for loop over several files:

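import numpy as np
import pandas as pd
from PyEMD import CEEMDAN

# lista: the CSV files to process
for fln in lista:
    df = pd.read_csv(fln, header=0)
    # Define signal
    t = np.asarray(df.A)
    s = np.asarray(df.B)

    # Execute CEEMDAN on signal
    ceemdan = CEEMDAN()
    cIMFs = ceemdan(s, t)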

For example:

Traceback (most recent call last):
  File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-18-38e7113ec9c7>", line 10, in <module>
    ceemdan = CEEMDAN()
  File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/EMD_signal-0.2.7-py3.7.egg/PyEMD/CEEMDAN.py", line 115, in __init__
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 119, in Pool
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 158, in __init__
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 252, in _setup_queues
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 112, in SimpleQueue
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/queues.py", line 331, in __init__
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 517, in Pipe
OSError: [Errno 24] Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2033, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'OSError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1095, in get_records
  File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/IPython/core/ultratb.py", line 313, in wrapped
  File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/IPython/core/ultratb.py", line 347, in _fixed_getinnerframes
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 1502, in getinnerframes
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 1460, in getframeinfo
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 696, in getsourcefile
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 725, in getmodule
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 709, in getabsfile
  File "/Users/sysadmin/Develop/venvp3/bin/../lib/python3.7/posixpath.py", line 383, in abspath
OSError: [Errno 24] Too many open files

Do you know what may be the problem here?

Thanks in advance

laszukdawid commented 5 years ago

Hey,

To be fair, I don't see anything that would indicate that any of these modules is the culprit here. If anything, the traceback suggests that there are too many files open with pandas. I'm not sure whether pandas can or should close these files by default.

If you only see the problem with these two modules, and/or are on Windows, let me suggest putting everything under "if __name__ == '__main__':" (check the README, as it was updated yesterday).
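As an illustration of that guard, here is a minimal sketch that uses only the standard library. The function name `run` and its `start_method` parameter are hypothetical, and `math.sqrt` stands in for the real per-signal work; this is not PyEMD code.

```python
import math
import multiprocessing


def run(values, start_method=None):
    """Map a stand-in workload over `values` in a small worker pool."""
    # get_context(None) returns the platform's default start method.
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # Only the process that was started as the script runs this block.
    # Workers that re-import the module skip it, so they do not try to
    # create pools of their own.
    print(run([1.0, 4.0, 9.0]))
```

The guard matters most where workers are started by re-importing the main module, which is the case on Windows.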

Dawid

SpireGiorgioSavastano commented 5 years ago

Hi,

Thanks a lot for your reply.

Pandas should not be an issue since read_csv closes files by default.

I am running the code on a Mac using a Jupyter notebook. I added the main guard and got the same error, but with a more meaningful description.

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-4-93a89dae0ad8> in <module>
     10 
     11         # Assign EEMD to `eemd` variable
---> 12         eemd = EEMD()
     13 
     14         # Execute EEMD on S

~/Develop/venvp3/lib/python3.7/site-packages/EMD_signal-0.2.7-py3.7.egg/PyEMD/EEMD.py in __init__(self, trials, noise_width, ext_EMD, **config)
     86         # By default (None) Pool spawns #processes = #CPU
     87         processes = None if "processes" not in config else config["processes"]
---> 88         self.pool = Pool(processes=processes)
     89 
     90         # Update based on options

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py in Pool(self, processes, initializer, initargs, maxtasksperchild)
    117         from .pool import Pool
    118         return Pool(processes, initializer, initargs, maxtasksperchild,
--> 119                     context=self.get_context())
    120 
    121     def RawValue(self, typecode_or_type, *args):

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py in __init__(self, processes, initializer, initargs, maxtasksperchild, context)
    174         self._processes = processes
    175         self._pool = []
--> 176         self._repopulate_pool()
    177 
    178         self._worker_handler = threading.Thread(

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py in _repopulate_pool(self)
    239             w.name = w.name.replace('Process', 'PoolWorker')
    240             w.daemon = True
--> 241             w.start()
    242             util.debug('added worker')
    243 

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py in start(self)
    110                'daemonic processes are not allowed to have children'
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel
    114         # Avoid a refcycle if the target function holds an indirect

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py in _Popen(process_obj)
    275         def _Popen(process_obj):
    276             from .popen_fork import Popen
--> 277             return Popen(process_obj)
    278 
    279     class SpawnProcess(process.BaseProcess):

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py in __init__(self, process_obj)
     18         self.returncode = None
     19         self.finalizer = None
---> 20         self._launch(process_obj)
     21 
     22     def duplicate_for_child(self, fd):

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py in _launch(self, process_obj)
     67     def _launch(self, process_obj):
     68         code = 1
---> 69         parent_r, child_w = os.pipe()
     70         self.pid = os.fork()
     71         if self.pid == 0:

OSError: [Errno 24] Too many open files

Could it be a problem connected to the processes? EMD doesn't give me any problems.

Thanks

laszukdawid commented 5 years ago

I see. It seems that Jupyter is trying to parallelize code in the same way as I am, and we're taking each other's processes. I'm currently away from my laptop so I can't check the code exactly, but try passing something like EEMD(processes=1).

SpireGiorgioSavastano commented 5 years ago

Hi,

I tried ceemdan = CEEMDAN(processes=1). The script loops for more files than before, but at some point it will still crash:

Traceback (most recent call last):
  File "CEEMDAN.py", line 22, in <module>
  File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/EMD_signal-0.2.7-py3.7.egg/PyEMD/CEEMDAN.py", line 115, in __init__
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 119, in Pool
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 112, in start
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py", line 69, in _launch
OSError: [Errno 24] Too many open files

This time I am running a script from terminal.

laszukdawid commented 5 years ago

OK, then this needs solving. Since EEMD and CEEMDAN are ensemble-based algorithms, they are also embarrassingly parallelizable. I tried to exploit this by using as many processes as possible to run everything in parallel. However, it seems that the implementation always requests at least one process, even when none is available.

The solution would be to detect whether any processes are available and, if none is, run the module in threads. I'll try to fix this in a couple of days.

As a short-term solution, if it's possible, please run the loop in smaller batches. If that's not possible, I can quickly disable parallelization temporarily.

laszukdawid commented 5 years ago

Update: I've updated the code to allow disabling the spawning of extra processes. Please pass parallel=False to EEMD/CEEMDAN.

eemd = EEMD(parallel=False)
ceemdan = CEEMDAN(parallel=False)

Thinking about the issue, I'm still not convinced that PyEMD is the main source of the trouble. The traceback suggests that you have reached the limit of open files (on Unix that's set with ulimit). To avoid it, you can either increase the limit or make sure that files are closed after reading. I think the exception surfaces in PyEMD/Pool/Popen because on Unix new processes are created with copy-on-write memory, so that's where a newly opened file gets registered. The change should resolve this: if no new processes are spawned, everything stays within a single process, with Python's GIL in control.

If this doesn't work, then it's unlikely to be PyEMD's fault. A solution might be something like:

t_all = []
s_all = []
for fln in lista:
    df = pd.read_csv(fln, header=0)
    # Define signal
    t_all.append(np.asarray(df.A))
    s_all.append(np.asarray(df.B))
    # If fln is a file object, close it here: fln.close()

# Pair each signal with its own time vector
for s, t in zip(s_all, t_all):
    # Execute CEEMDAN on signal
    ceemdan = CEEMDAN()
    cIMFs = ceemdan(s, t)

Please let me know if this helps.
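As a way to check the open-files limit discussed above from inside Python, the following sketch reads the limit with the standard `resource` module (Unix only) and counts the descriptors the process currently holds. The helper `open_fd_count` is hypothetical and assumes that `/dev/fd` is available, as it is on Linux and macOS.

```python
import os
import resource  # Unix only


def open_fd_count():
    """Count this process's open descriptors by listing /dev/fd."""
    return len(os.listdir("/dev/fd"))


# The soft limit is the one that triggers "[Errno 24] Too many open files".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

before = open_fd_count()
read_end, write_end = os.pipe()  # the call that fails in the traceback
during = open_fd_count()
os.close(read_end)
os.close(write_end)
after = open_fd_count()

# Each pipe costs two descriptors until both ends are closed.
print(before, during, after)
```

Calling `open_fd_count()` once per loop iteration would show whether the count keeps growing from file to file, which would point at descriptors that are never released.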

SpireGiorgioSavastano commented 5 years ago

Thanks a lot for your work. I am going to test the new code by the end of this week.

Does PyEMD also allow plotting the Hilbert spectrum?

laszukdawid commented 5 years ago

Can you specify what you mean by Hilbert spectrum? It seems that people have different expectations. If you are referring to a visualisation of the Hilbert-Huang transform (the Hilbert transform applied to IMFs), i.e. something like https://github.com/laszukdawid/PyEMD/blob/master/example/hht_example.png , then you can use the following code: https://github.com/laszukdawid/PyEMD/blob/master/example/hht_example.py .

There's also a Visualisation module as part of PyEMD: https://pyemd.readthedocs.io/en/latest/visualisation.html .

Otherwise, please let me know what you expect.

laszukdawid commented 5 years ago

I'm closing this thread as it's been quite a while since the last update and the latest request isn't related to the subject.

LllC-mmd commented 4 years ago

I encountered a problem similar to SpireGiorgioSavastano's with EEMD() in PyEMD. It seems to happen because PyEMD doesn't promptly release the resources held by the pool's workers after calling a transformation method like eemd() or ceemdan(). For example, I added "self.pool.close()" at the end of the eemd() method of the EEMD class, and it solves the problem that SpireGiorgioSavastano and I encountered. Hope it helps.

laszukdawid commented 4 years ago

Thank you @LllC-mmd for checking this! I'll make the change as suggested.

laszukdawid commented 4 years ago

It's been a while but it's done. I've updated the code to open and close pool connections on execution rather than object creation. Thanks!
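The idea of opening and closing the pool on execution rather than on object creation can be sketched as follows. This is not PyEMD's actual implementation: the class `EnsembleSketch`, its parameters, and the use of `math.sqrt` as a stand-in for the per-trial work are all assumptions made for illustration.

```python
import math
import multiprocessing


class EnsembleSketch:
    """Hypothetical stand-in for an ensemble decomposer; not PyEMD code."""

    def __init__(self, processes=2, parallel=True, start_method=None):
        # Only configuration is stored here: no pool, pipes or workers
        # exist until the object is actually called.
        self.processes = processes
        self.parallel = parallel
        self.start_method = start_method

    def __call__(self, values):
        if not self.parallel:
            # Serial path: no extra processes are spawned at all.
            return [math.sqrt(v) for v in values]

        ctx = multiprocessing.get_context(self.start_method)
        pool = ctx.Pool(processes=self.processes)
        try:
            return pool.map(math.sqrt, values)
        finally:
            # Stop accepting work and wait for the workers to exit
            # before the next call creates a new pool.
            pool.close()
            pool.join()


if __name__ == "__main__":
    sketch = EnsembleSketch()
    for _ in range(3):
        print(sketch([1.0, 4.0, 9.0]))
```

With this layout, creating many objects in a loop costs nothing by itself, and at most one pool is alive at any time.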