levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

ThreadPool in PROXIAggregator not closed #66

Closed bittremieux closed 2 years ago

bittremieux commented 2 years ago

The PROXIAggregator uses a ThreadPool to query multiple PROXI resources simultaneously. This ThreadPool is kept alive between function calls, I assume to avoid the overhead of recreating it. However, it is never properly closed, which results in the following error:

Exception ignored in: <function Pool.__del__ at 0x7fe660d174c0>
Traceback (most recent call last):
  File "/home/wout/.conda/envs/spectrum_utils/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
  File "/home/wout/.conda/envs/spectrum_utils/lib/python3.9/multiprocessing/queues.py", line 372, in put
AttributeError: 'NoneType' object has no attribute 'dumps'

This does not interfere with my code, as it only happens at the end, presumably when all objects are cleaned up. However, it would be nice to avoid having this error at all.

Maybe the ThreadPool should just be created when resolving individual USIs, and then immediately cleaned up? The overhead of repeatedly creating the pool should be minimal, especially compared to the REST API calls.
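
A minimal sketch of that idea, using only the standard library. The function names, backend names, and USI string below are placeholders for illustration, not pyteomics code, and the REST call is replaced by a dummy function.

```python
from multiprocessing.pool import ThreadPool


def query_backend(backend, usi):
    # Hypothetical stand-in for a REST call to a single PROXI backend.
    return backend, "spectrum for {}".format(usi)


def resolve(usi, backends, n_threads=4):
    # The pool exists only for the duration of this call. Leaving the
    # with-block calls pool.terminate(), so nothing is left for the
    # interpreter to clean up at shutdown.
    with ThreadPool(n_threads) as pool:
        # starmap blocks until every backend has answered, so the
        # results are complete before the pool is torn down.
        return pool.starmap(query_backend, [(b, usi) for b in backends])


# Placeholder backend names and a placeholder USI, for illustration only.
results = resolve("mzspec:EXAMPLE:run:scan:1", ["backend_a", "backend_b"])
```

Since each call waits on network responses anyway, the cost of starting a handful of threads per call should be hard to notice.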

mobiusklein commented 2 years ago

I've added PR #67 to implement that behavior. The previous behavior is available if the ephemeral_pool attribute is set to False (now True by default).
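
For illustration, here is a rough sketch of how such a flag could work. This is not the code from the PR: the class, its methods, and the dummy fetch function are hypothetical, and only the ephemeral_pool name is taken from the description above.

```python
from multiprocessing.pool import ThreadPool


class Aggregator:
    """Hypothetical stand-in, not the pyteomics PROXIAggregator."""

    def __init__(self, backends, n_threads=4, ephemeral_pool=True):
        self.backends = backends
        self.n_threads = n_threads
        self.ephemeral_pool = ephemeral_pool
        self._pool = None  # only used when ephemeral_pool is False

    def _fetch(self, backend, usi):
        # Placeholder for the real network request.
        return backend, usi.upper()

    def get(self, usi):
        args = [(backend, usi) for backend in self.backends]
        if self.ephemeral_pool:
            # Create the pool for this call only; it is terminated
            # when the with-block exits.
            with ThreadPool(self.n_threads) as pool:
                return pool.starmap(self._fetch, args)
        # Persistent mode: create the pool lazily and reuse it.
        if self._pool is None:
            self._pool = ThreadPool(self.n_threads)
        return self._pool.starmap(self._fetch, args)

    def close(self):
        # In persistent mode the caller has to shut the pool down explicitly.
        if self._pool is not None:
            self._pool.close()
            self._pool.join()
            self._pool = None


ephemeral = Aggregator(["a", "b"])
out = ephemeral.get("usi")

persistent = Aggregator(["a", "b"], ephemeral_pool=False)
out_persistent = persistent.get("usi")
persistent.close()
```

In this sketch the persistent mode relies on the caller invoking close() before exit, which is the step that was missing in the behavior reported above.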

It's hard to reproduce the message since it seems to be sensitive to the order in which some standard library modules are destroyed during interpreter shutdown.

bittremieux commented 2 years ago

Looks like a good fix. Thanks!