levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0
105 stars 34 forks

Unable to use map() method in PepXML class #110

Closed by freejstone 1 year ago

freejstone commented 1 year ago

Hi there!

I am using a package that has pyteomics as a dependency. Following a chain of errors, I found that the problem comes from calling the map method of the PepXML class. This is the error I get when calling map:

pepxml_instance = PepXML(target_file)
pepxml_instance.map()

Traceback (most recent call last):

  Cell In[6], line 1
    pepxml_instance.map()

  File ~/opt/anaconda3/lib/python3.9/site-packages/pyteomics/auxiliary/file_helpers.py:1105 in map
    in_queue = mp.Queue(self._queue_size)

  File ~/opt/anaconda3/lib/python3.9/multiprocessing/context.py:103 in Queue
    return Queue(maxsize, ctx=self.get_context())

  File ~/opt/anaconda3/lib/python3.9/multiprocessing/queues.py:49 in __init__
    self._sem = ctx.BoundedSemaphore(maxsize)

  File ~/opt/anaconda3/lib/python3.9/multiprocessing/context.py:88 in BoundedSemaphore
    return BoundedSemaphore(value, ctx=self.get_context())

  File ~/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py:145 in __init__
    SemLock.__init__(self, SEMAPHORE, value, value, ctx=ctx)

  File ~/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py:57 in __init__
    sl = self._semlock = _multiprocessing.SemLock(

OSError: [Errno 22] Invalid argument

Checking out the file_helpers.py file, I saw that the value assigned to _QUEUE_SIZE was int(1e7). Apparently reducing this down to int(1e4) resolves the problem. Are you able to reproduce this? I have given my system info below:

platform.machine()
Out[8]: 'x86_64'

platform.version()
Out[9]: 'Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000'

platform.platform()
Out[10]: 'macOS-10.16-x86_64-i386-64bit'

platform.processor()
Out[11]: 'i386'

Thanks a lot,

Jack

levitsky commented 1 year ago

Hi Jack,

This issue was reported a while back on macOS. That was when Pyteomics was still hosted on Bitbucket, but here is a copy of that issue: https://levitsky.github.io/bitbucket_backup/#!/levitsky/pyteomics/issues/44/page/1

Two solutions are listed there: passing a runtime argument to the reader constructor, or setting a module-level variable. My understanding is that you don't control the code that instantiates the reader class, so the first solution would not work for you. You can try the second option, though. Can you try the following?

from pyteomics.auxiliary import file_helpers as fh
fh._QUEUE_SIZE = 32767

This code should be run before the readers are created, preferably at the very beginning of your code.
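
For anyone who would rather not hard-code the number, here is a minimal sketch of how a safe queue size could be derived from the platform's semaphore limit, since the traceback above fails while creating a bounded semaphore. It uses only the standard library; safe_queue_size is a hypothetical helper, not part of Pyteomics, and the 32767 fallback is simply the value suggested above.

```python
def safe_queue_size(requested, fallback=32767):
    """Clamp a requested queue size to the platform's semaphore limit.

    Hypothetical helper for illustration; not a Pyteomics function.
    """
    try:
        # CPython exposes the largest value a semaphore can take here.
        from multiprocessing.synchronize import SEM_VALUE_MAX
    except ImportError:
        # Platforms without a working semaphore implementation:
        # fall back to the conservative value suggested in this thread.
        return min(requested, fallback)
    return min(requested, SEM_VALUE_MAX)


if __name__ == "__main__":
    # int(1e7) is the default _QUEUE_SIZE mentioned in the report.
    print(safe_queue_size(int(1e7)))
```

The result could then be assigned to fh._QUEUE_SIZE in place of the literal 32767, still before any readers are created. On platforms where the limit is large, the requested size is returned unchanged.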

Best regards, Lev

freejstone commented 1 year ago

Wonderful Lev, really appreciate it. In the end I put in a PR to have the fix occur when the reader class is instantiated.

Cheers,

Jack