Closed daeh closed 1 year ago
@daeh: In running your initial code, I receive the same error as you do, so that's verified. You also make the comment about running this code (I believe):
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>>
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> dataset_serial = map(model_handler, range(20))
>>> datasets = Pool(20).map(model_handler, range(20))
>>> list(dataset_serial)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>>
As you can see, I don't receive an error in the above. Is that what you are reporting?
As to why there's an error when the serial version runs first, compared to when the parallel version runs first... I expect that is due to lazy evaluation of the model_handler function. When a function is first executed, it produces new python objects, and potentially changes (1) what's pointed to by globals, as well as (2) what needs to be passed to the other processors to be run in parallel. So, I expect it's some issue with regard to that. What exactly is going on, I can't say yet, but I assume it's a serialization issue.
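One detail that is independent of pathos: in Python 3, map returns a lazy iterator, so in the session above model_handler is not actually called serially until list(dataset_serial) runs, which is after the pool has already finished. A minimal sketch of that ordering, using a hypothetical record function and calls list that are not part of the original code:

```python
# Hypothetical stand-ins for illustration; not from the original report.
calls = []

def record(n):
    # Remember each argument so we can see when the function really runs.
    calls.append(n)
    return n

lazy = map(record, range(3))     # builds an iterator; record is not called yet
calls_before = list(calls)       # snapshot taken before the iterator is consumed

result = list(lazy)              # consuming the iterator triggers the calls
calls_after = list(calls)
```

Here calls_before is empty and calls_after holds every argument, so a serial map that is only consumed after Pool(...).map has returned cannot have touched matplotlib in the parent before the workers were started.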
The traceback that you are seeing with raise self._value is indicative of a failure to exchange a serialized object between the two processes. This really isn't a pathos issue, but more likely either a multiprocess issue, or a dill issue, or a matplotlib issue... depending on what exactly is the problem.
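For context on where raise self._value comes from: in the standard library's multiprocessing.pool (which multiprocess is forked from), the result object's get() re-raises whatever exception the worker raised, so that line by itself may not tell a serialization failure apart from an ordinary exception inside the task. A minimal sketch, assuming the standard library's thread-backed ThreadPool as a stand-in so the example stays self-contained; flaky_task and its message are hypothetical:

```python
from multiprocessing.pool import ThreadPool

def flaky_task(n):
    # Hypothetical task: fail for one input, the way savefig failed in a worker.
    if n == 1:
        raise RuntimeError("In load_char: Could not load charcode (simulated)")
    return n

caught = None
with ThreadPool(2) as pool:
    try:
        # map() waits on the result object, whose get() re-raises the
        # worker's exception in the calling thread.
        pool.map(flaky_task, range(3))
    except RuntimeError as exc:
        caught = exc
```

The exception that reaches the caller is the one raised inside the task, which matches the tracebacks in this thread where the RuntimeError originates in backend_pdf.py and only surfaces through get().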
I'm not seeing an obvious issue with dill serializing your function, straight off...
>>> import dill
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> dill.copy(model_handler)
<function model_handler at 0x105cbb158>
>>>
>>> model_handler(1)
1
>>> dill.copy(model_handler)
<function model_handler at 0x105cbb158>
>>>
>>> datasets_serial = []
>>> for j in range(20):
...     datasets_serial.append(model_handler(j))
...
>>>
>>> dill.copy(model_handler)
<function model_handler at 0x1082adc80>
So, maybe it's not a serialization issue. If I run directly through multiprocess, I get a slightly different error.
>>> from multiprocess import Pool
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> datasets_serial = []
>>> for j in range(20):
...     datasets_serial.append(model_handler(j))
...
>>> datasets = Pool(20).map(model_handler, range(20))
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "<stdin>", line 8, in model_handler
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/figure.py", line 2035, in savefig
self.canvas.print_figure(fname, **kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 2263, in print_figure
**kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 2589, in print_pdf
file.finalize()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 576, in finalize
self.writeFonts()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 721, in writeFonts
fonts[Fx] = self.embedTTF(realpath, chars[1])
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 1195, in embedTTF
return embedTTFType3(font, characters, descriptor)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 950, in embedTTFType3
for charcode in range(firstchar, lastchar+1)]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 950, in <listcomp>
for charcode in range(firstchar, lastchar+1)]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 946, in get_char_width
s, flags=LOAD_NO_SCALE | LOAD_NO_HINTING).horiAdvance
RuntimeError: In load_char: Could not load charcode
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 644, in get
raise self._value
RuntimeError: In load_char: Could not load charcode
>>>
I also noticed it throws a warning, which may be relevant:
/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/pyplot.py:537: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
max_open_warning, RuntimeWarning)
So, investigating the number of plots I can have open at once...
>>> from multiprocess import Pool
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> datasets_serial = []
>>> for j in range(2):
...     datasets_serial.append(model_handler(j))
...
>>> datasets = Pool().map(model_handler, range(2))
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "<stdin>", line 8, in model_handler
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/figure.py", line 2035, in savefig
self.canvas.print_figure(fname, **kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 2263, in print_figure
**kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 2589, in print_pdf
file.finalize()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 576, in finalize
self.writeFonts()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 721, in writeFonts
fonts[Fx] = self.embedTTF(realpath, chars[1])
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 1195, in embedTTF
return embedTTFType3(font, characters, descriptor)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 950, in embedTTFType3
for charcode in range(firstchar, lastchar+1)]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 950, in <listcomp>
for charcode in range(firstchar, lastchar+1)]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 946, in get_char_width
s, flags=LOAD_NO_SCALE | LOAD_NO_HINTING).horiAdvance
RuntimeError: In load_char: Could not load charcode
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 644, in get
raise self._value
RuntimeError: In load_char: Could not load charcode
>>>
and
>>> from multiprocess import Pool
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> datasets_serial = []
>>> for j in range(1):
...     datasets_serial.append(model_handler(j))
...
>>> datasets = Pool().map(model_handler, range(1))
>>>
and
>>> from multiprocess import Pool
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> datasets_serial = []
>>> for j in range(1):
...     datasets_serial.append(model_handler(j))
...
>>> datasets = Pool(20).map(model_handler, range(2))
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "<stdin>", line 8, in model_handler
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/figure.py", line 2035, in savefig
self.canvas.print_figure(fname, **kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 2263, in print_figure
**kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 2589, in print_pdf
file.finalize()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 576, in finalize
self.writeFonts()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 721, in writeFonts
fonts[Fx] = self.embedTTF(realpath, chars[1])
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 1195, in embedTTF
return embedTTFType3(font, characters, descriptor)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 950, in embedTTFType3
for charcode in range(firstchar, lastchar+1)]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 950, in <listcomp>
for charcode in range(firstchar, lastchar+1)]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 946, in get_char_width
s, flags=LOAD_NO_SCALE | LOAD_NO_HINTING).horiAdvance
RuntimeError: In load_char: Could not load charcode
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/mmckerns/lib/python3.6/site-packages/multiprocess-0.70.7.dev0-py3.6-macosx-10.12-x86_64.egg/multiprocess/pool.py", line 644, in get
raise self._value
RuntimeError: In load_char: Could not load charcode
>>>
So, if I try to use threads instead of processes, I don't see a failure, even though I do see a warning...
>>> from multiprocess.dummy import Pool
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> datasets_serial = []
>>> for j in range(20):
...     datasets_serial.append(model_handler(j))
...
>>> datasets = Pool(20).map(model_handler, range(20))
/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/pyplot.py:537: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
max_open_warning, RuntimeWarning)
>>>
So... I'm not sure what's going on yet. You might want to report this to matplotlib and point to this issue. Either way, it needs more investigation.
Try closing the figure right after saving it.
plt.close()
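A minimal sketch of that suggestion applied to the handler from this thread, assuming matplotlib is installed. The outdir parameter and the temporary directory are additions for illustration only, and closing the figure is aimed at the "More than 20 figures have been opened" warning; it is not claimed to fix the load_char error:

```python
import os
import tempfile

import matplotlib
matplotlib.use("agg")  # non-interactive backend, as in the thread
import matplotlib.pyplot as plt

def model_handler(n, outdir):
    # outdir is a hypothetical extra argument so the sketch writes to a temp dir.
    fig = plt.figure(figsize=(1, 1))
    try:
        axes = fig.add_subplot(1, 1, 1)
        axes.scatter(range(1000), range(1000))
        path = os.path.join(outdir, "dummyfig{}.pdf".format(n))
        fig.savefig(path, format="pdf")
    finally:
        # Release the figure even if savefig raises, so pyplot does not
        # keep accumulating open figures across calls.
        plt.close(fig)
    return n

outdir = tempfile.mkdtemp()
results = [model_handler(j, outdir) for j in range(3)]
open_figures = plt.get_fignums()  # figure numbers pyplot still tracks
```

Passing the figure explicitly to plt.close avoids relying on whichever figure pyplot currently considers active.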
I had the same error. I use joblib.Parallel, and the problem was that I was calling a function (let's say plot_figure) only to create the figure instance, and then saving and closing it outside the function.
When running it sequentially there was no problem, only when I ran it multiple times in parallel.
So, I changed the function to also save and close the figure inside it.
Then the error was gone.
Just wanted to let you guys know that I experience the same issue independently of pathos. In fact, I tested the multiprocessing, joblib and concurrent packages, and had problems with each of them.
I opened an issue in matplotlib (I borrowed @daeh's minimal example), here: https://github.com/matplotlib/matplotlib/issues/13723
Hopefully they will be able to help.
By the way: When I use plt.close() the error still happens, but the warning does not appear anymore.
This appears to no longer be an issue, however it does throw a relevant warning...
Python 3.7.15 (default, Oct 12 2022, 04:11:53)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>>
>>> def model_handler(n):
...     import matplotlib
...     matplotlib.use('agg')
...     import matplotlib.pyplot as plt
...     figout = plt.figure(figsize=(1, 1))
...     axes = figout.add_subplot(1,1,1)
...     axes.scatter(range(1000), range(1000))
...     figout.savefig('dummyfig{}.pdf'.format(n), format='pdf')
...     return n
...
>>> datasets_serial = []
>>> for j in range(20):
...     datasets_serial.append(model_handler(j))
...
>>> datasets = Pool(20).map(model_handler, range(20))
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
__main__:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
>>> datasets
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>>
I'm closing this.
I'm running into an issue trying to save matplotlib figures as PDF in parallel. The behavior is a bit odd and I'm not sure if this is a pathos issue, per se, but pathos is the only place I've encountered it. The minimal code below causes a few related errors in set_text and load_char
Will throw
Interestingly, if I replace the dataset_serial loop with dataset_serial = map(model_handler, range(20)), the pathos parallel pool executes with no problem, but if I convert the serial map object to a list (print(list(dataset_serial))), the glyph exception in the parallel pool comes back. Any thoughts as to what's going on here?