nanoporetech / pod5-file-format

Pod5: a high performance file format for nanopore reads.
https://pod5-file-format.readthedocs.io/

Semaphore hissy fit at the end of subset run #111

Open mp15 opened 8 months ago

mp15 commented 8 months ago

Issue Description

The process completes successfully, but at shutdown multiprocessing throws a hissy fit about not being able to unlink its semaphores (FileNotFoundError from sem_unlink).

Logs

$ pod5 subset -t 50 PAO27011_pass_7b4991d0_ec3250cb.pod5 --missing-ok --summary sequencing_summary_PAO27011_7b4991d0_ec3250cb.txt --columns channel --output /tmp/tmp.Al3fg28Kg1
Subsetting: 99%|#########9| 2603/2623 [32:00<00:14, 1.36Files/s]
Traceback (most recent call last):
  File "/software/python-3.10.1/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/software/python-3.10.1/lib/python3.10/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/software/python-3.10.1/lib/python3.10/multiprocessing/synchronize.py", line 87, in _cleanup
    sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory

Specifications

HalfPhoton commented 8 months ago

Hi @mp15, would you be able to add POD5_DEBUG=1 next time so we can hopefully capture in more detail what's going wrong here?

The number of "threads" (which are actually processes), -t 50, is quite high, and the number of outputs (splitting by channel) is also quite high. This combination could be causing issues. Please consider lowering -t: the subsetting process is predominantly IO bound, and there are diminishing returns with increasing threads.
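
For reference, a re-run along these lines might look roughly like the sketch below. It reuses the input, summary file and flags from the original report. The worker count of 8 is an arbitrary illustrative value rather than a recommendation, and it assumes POD5_DEBUG=1 is picked up as an environment variable set for that one command. The command -v guard is only there so the snippet does nothing on a machine without pod5 installed.

```shell
# Assumption: 8 is an arbitrary lower worker count chosen for illustration;
# the thread above only suggests lowering it from 50.
threads=8

# Fresh output directory, similar to the /tmp/tmp.* path in the original report.
out_dir=$(mktemp -d)

if command -v pod5 >/dev/null 2>&1; then
    # Assumption: POD5_DEBUG=1 is read from the environment of this one command.
    POD5_DEBUG=1 pod5 subset -t "$threads" \
        PAO27011_pass_7b4991d0_ec3250cb.pod5 \
        --missing-ok \
        --summary sequencing_summary_PAO27011_7b4991d0_ec3250cb.txt \
        --columns channel \
        --output "$out_dir"
else
    echo "pod5 not found on PATH; nothing was run" >&2
fi
```

If the traceback still appears with the lower worker count, the debug output from this run would be the useful thing to attach here.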