Closed SpireGiorgioSavastano closed 5 years ago
Hey,
To be fair, I don't see anything that indicates that any of these modules is the culprit here. If anything, the traceback suggests that there are too many files open with pandas. I'm not sure whether pandas can or should close these files by default.
If you only see the problem with these two modules, and/or are on Windows, let me suggest putting everything in "if __name__ == '__main__':" (check the readme, as it was updated yesterday).
Dawid
Hi,
Thanks a lot for your reply.
Pandas should not be the issue, since read_csv closes files by default.
I am running the code on a Mac using a Jupyter notebook. I added the main part and got the same error, but with a more meaningful description.
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-4-93a89dae0ad8> in <module>
10
11 # Assign EEMD to `eemd` variable
---> 12 eemd = EEMD()
13
14 # Execute EEMD on S
~/Develop/venvp3/lib/python3.7/site-packages/EMD_signal-0.2.7-py3.7.egg/PyEMD/EEMD.py in __init__(self, trials, noise_width, ext_EMD, **config)
86 # By default (None) Pool spawns #processes = #CPU
87 processes = None if "processes" not in config else config["processes"]
---> 88 self.pool = Pool(processes=processes)
89
90 # Update based on options
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py in Pool(self, processes, initializer, initargs, maxtasksperchild)
117 from .pool import Pool
118 return Pool(processes, initializer, initargs, maxtasksperchild,
--> 119 context=self.get_context())
120
121 def RawValue(self, typecode_or_type, *args):
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py in __init__(self, processes, initializer, initargs, maxtasksperchild, context)
174 self._processes = processes
175 self._pool = []
--> 176 self._repopulate_pool()
177
178 self._worker_handler = threading.Thread(
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py in _repopulate_pool(self)
239 w.name = w.name.replace('Process', 'PoolWorker')
240 w.daemon = True
--> 241 w.start()
242 util.debug('added worker')
243
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py in start(self)
110 'daemonic processes are not allowed to have children'
111 _cleanup()
--> 112 self._popen = self._Popen(self)
113 self._sentinel = self._popen.sentinel
114 # Avoid a refcycle if the target function holds an indirect
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py in _Popen(process_obj)
275 def _Popen(process_obj):
276 from .popen_fork import Popen
--> 277 return Popen(process_obj)
278
279 class SpawnProcess(process.BaseProcess):
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py in __init__(self, process_obj)
18 self.returncode = None
19 self.finalizer = None
---> 20 self._launch(process_obj)
21
22 def duplicate_for_child(self, fd):
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py in _launch(self, process_obj)
67 def _launch(self, process_obj):
68 code = 1
---> 69 parent_r, child_w = os.pipe()
70 self.pid = os.fork()
71 if self.pid == 0:
OSError: [Errno 24] Too many open files
Could this be a problem connected to the processes? EMD doesn't give me any problems.
Thanks
I see. It seems that Jupyter is trying to parallelize code in the same way as I am, and we're taking each other's processes. I'm currently away from my laptop so I can't check the exact code, but try passing something like EEMD(processes=1).
Hi,
I tried ceemdan = CEEMDAN(processes=1). The script loops over more files than before, but at some point it still crashes:
Traceback (most recent call last):
File "CEEMDAN.py", line 22, in <module>
File "/Users/sysadmin/Develop/venvp3/lib/python3.7/site-packages/EMD_signal-0.2.7-py3.7.egg/PyEMD/CEEMDAN.py", line 115, in __init__
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 119, in Pool
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 112, in start
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/popen_fork.py", line 69, in _launch
OSError: [Errno 24] Too many open files
This time I am running a script from terminal.
OK, then this needs solving. Since EEMD and CEEMDAN are ensemble-based algorithms, they are also embarrassingly parallelizable. I tried to take advantage of this by using as many processes as possible to run everything in parallel. However, it seems that the implementation always requests at least one process, even when none is available.
The solution would be to detect whether any processes are available and, if none are, run the module in threads. I'll try to fix this in a couple of days.
As a short-term solution, if possible, please run the for loop in smaller batches. If that's not possible, I can quickly disable parallelization temporarily.
Update:
I've updated the code to allow disabling the spawning of extra processes. Please pass parallel=False to EEMD/CEEMDAN.
eemd = EEMD(parallel=False)
ceemdan = CEEMDAN(parallel=False)
Thinking about the issue, I'm still not convinced that PyEMD is the main source of the trouble. The traceback suggests that you have reached the limit of open files (on Unix that's set with ulimit). To avoid it you can either increase the limit or make sure that files are closed after reading. I think the exception is raised from PyEMD/Pool/Popen because on Unix new processes use copy-on-write memory, so that's where a newly opened file is being registered. The change should resolve this: if no new processes are spawned, then everything is maintained within a single process, with Python's GIL in control.
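The open-file limit mentioned above can also be inspected from inside Python. The following is a minimal sketch, assuming a Unix-like system (the standard-library resource module is not available on Windows); open_file_limits and raise_soft_limit are hypothetical helper names, not part of PyEMD. RLIMIT_NOFILE is the limit that "ulimit -n" reports.

```python
import resource  # standard library, Unix only


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Try to raise the soft limit to `target`; return the resulting soft limit."""
    soft, hard = open_file_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit cannot exceed the hard one
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # the OS refused (e.g. a system-wide cap); keep the old limit
    return open_file_limits()[0]


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)
```

Raising the limit only postpones the error if descriptors are being leaked, so it is probably best treated as a diagnostic step rather than a fix.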
If this doesn't work, then it's unlikely to be PyEMD's fault. A solution might be something like:
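import numpy as np
import pandas as pd
from PyEMD import CEEMDAN

s_all = []
t_all = []
# First pass: read every file (lista is the list of input files from the original script)
for fln in lista:
    df = pd.read_csv(fln, header=0)
    # Column A is passed to CEEMDAN as the signal and column B as the time
    # axis, as in the original call; swap them if your data is the other way round
    s_all.append(np.asarray(df.A))
    t_all.append(np.asarray(df.B))
    # If fln is an open file object rather than a path: fln.close()

# Second pass: decompose each signal once all files have been read
for s, t in zip(s_all, t_all):
    # Execute CEEMDAN on the signal
    ceemdan = CEEMDAN()
    cIMFs = ceemdan(s, t)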
Please let me know if this helps.
Thanks a lot for your work. I am going to test the new code by the end of this week.
Does PyEMD also support plotting the Hilbert spectrum?
Can you specify what you mean by Hilbert spectrum? It seems that many people have different expectations. If you are referring to a visualisation of the Hilbert-Huang transform (the Hilbert transform applied to IMFs), i.e. something like https://github.com/laszukdawid/PyEMD/blob/master/example/hht_example.png , then you can use the following code: https://github.com/laszukdawid/PyEMD/blob/master/example/hht_example.py .
There's also a Visualisation module as part of PyEMD: https://pyemd.readthedocs.io/en/latest/visualisation.html .
Otherwise, please let me know what you expect.
I'm closing this thread as it's been quite a while since the last update and the latest request isn't related to the subject.
I encountered a problem similar to SpireGiorgioSavastano's with EEMD() in PyEMD. It seems that the problem happens because PyEMD doesn't release the resources occupied by the pool's workers in time after calling a transformation method such as eemd() or ceemdan(). For example, I added "self.pool.close()" at the end of the eemd() method of the EEMD class, and it solved the problem that SpireGiorgioSavastano and I encountered. Hope it helps.
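One way to check whether descriptors really are left open between calls is to count the process's open file descriptors before and after the suspect code. This is a minimal sketch, assuming a Unix-like system that exposes them under /proc/self/fd (Linux) or /dev/fd (macOS); count_open_fds is a hypothetical helper, not part of PyEMD, and os.pipe() stands in for the call that failed in the tracebacks above.

```python
import os


def count_open_fds():
    """Count file descriptors currently open in this process (Unix only)."""
    # Assumption: the OS exposes per-process descriptors as a directory.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # Listing the directory typically uses one descriptor itself, so
            # the count carries a constant offset; differences between two
            # calls are what matter.
            return len(os.listdir(fd_dir))
    raise OSError("no per-process file descriptor directory found")


before = count_open_fds()
read_end, write_end = os.pipe()  # each pipe uses two descriptors
during = count_open_fds()
os.close(read_end)
os.close(write_end)
after = count_open_fds()
print(before, during, after)
```

If the count keeps growing across loop iterations that create EEMD or CEEMDAN objects, that would be consistent with the leak described here.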
Thank you @LllC-mmd for checking this! I'll make the change as suggested.
It's been a while but it's done. I've updated the code to open and close pool connections on execution rather than object creation. Thanks!
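The pattern of opening and closing the pool on execution rather than on object creation can be sketched as follows. This is an illustration under stated assumptions, not PyEMD's actual code: EnsembleSketch and its use_threads parameter are hypothetical, and math.sqrt merely stands in for the per-trial work.

```python
import math
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


class EnsembleSketch:
    """Hypothetical stand-in for an ensemble method; not PyEMD's actual code."""

    def __init__(self, processes=None, use_threads=False):
        # Only settings are stored here; no workers are started at creation.
        self.processes = processes
        # Assumption: a thread pool is offered as an alternative to processes.
        self.pool_factory = ThreadPool if use_threads else Pool

    def __call__(self, values):
        # The pool lives only for this call. Leaving the `with` block calls
        # terminate() on the pool, which stops its workers.
        with self.pool_factory(processes=self.processes) as pool:
            # math.sqrt stands in for the per-trial work of a real ensemble.
            return pool.map(math.sqrt, values)
```

With this shape, creating many objects inside a loop starts no workers at all, and each call holds a pool only for as long as it runs.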
Hello,
It seems that Python raises an error when calling EEMD() or CEEMDAN() inside a for loop over several files:
e.g.
Do you know what may be the problem here?
Thanks in advance