Open tomwhite opened 6 months ago
Can you try that with just explode, and with different numbers of workers please?
vcf2zarr convert sample.vcf.gz sample.icf
vcf2zarr explode sample.vcf.gz sample.icf
Do you want to overwrite sample.icf? (use --force to skip this check) [y/N]: y
Scan: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [00:00<00:00, 2.48files/s]
Explode: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00/9.00 [00:00<00:00, 23.1vars/s]
/Users/tom/miniconda3/envs/bio2zarr/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
vcf2zarr explode sample.vcf.gz sample.icf -p 3
Do you want to overwrite sample.icf? (use --force to skip this check) [y/N]: y
Scan: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [00:00<00:00, 2.51files/s]
Explode: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00/9.00 [00:00<00:00, 20.8vars/s]
/Users/tom/miniconda3/envs/bio2zarr/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Hmm, seems to be something to do with tqdm:
I had a go at this in #214 - would you mind trying it out and seeing if it resolves this problem please @tomwhite?
Still seeing the warning with the new code...
Hmm, ok thanks. I might need a bit of help with this one then, it's too obscure to track down without being able to reproduce. Any chance you could take a look? 🙏
I tried removing all the tqdm code (including imports) and I still get the warning. I'll keep looking, but I'm not really sure what is going on here!
Great to know that tqdm isn't causing this! I guess it must be something to do with the locks associated with the multiprocessing.Value. Is there any way we can get more detailed feedback on where these semaphores are being leaked?
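For context on why multiprocessing.Value is a suspect: by default it wraps the shared value in a synchronized proxy that owns its own lock, and on POSIX that lock is backed by a semaphore. A minimal sketch using only the standard library; `make_counters` is a hypothetical helper for illustration, not bio2zarr code:

```python
import multiprocessing


def make_counters():
    """Return a locked and an unlocked shared counter for comparison."""
    # Default lock=True: the value is wrapped in a synchronized proxy
    # that carries its own lock (semaphore-backed on POSIX).
    locked = multiprocessing.Value("Q", 0)
    # lock=False: a raw shared ctypes value, with no lock allocated.
    raw = multiprocessing.Value("Q", 0, lock=False)
    return locked, raw


if __name__ == "__main__":
    locked, raw = make_counters()
    with locked.get_lock():
        locked.value += 1
    raw.value += 1  # not safe for concurrent updates from several processes
    print(hasattr(locked, "get_lock"), hasattr(raw, "get_lock"))
```

Whether these locks are the ones reported as leaked here is not established; the sketch only shows where a semaphore can come from.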
Did you have any luck tracking this down @tomwhite? Some things that would be useful to try:
I'd really like to get rid of this...
Can we also just capture the warning in main, as a workaround?
Looks like this is some Python 3.9 on Mac quirk - I've reproduced it on CI on both ARM and Intel:
I made an attempt to catch the warnings in #226, but it's tricky to do this via CI. I'm sure the warnings are harmless, and as it's only on Python 3.9 I don't think we need to make it a release blocker. It would be nice to just suppress the warning at the same time, though.
https://discuss.pytorch.org/t/issue-with-multiprocessing-semaphore-tracking/22943
Some comments here on how to suppress.
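A rough sketch of the kind of suppression discussed in that thread, adapted to the `resource_tracker` message prefix seen in the output above. It assumes the warning is raised in the separate resource tracker process, so an in-process `warnings.filterwarnings()` call would not reach it, and the filter has to travel via the `PYTHONWARNINGS` environment variable before any multiprocessing objects are created. `suppress_resource_tracker_warning` is a hypothetical helper name, and this has not been verified against the Python 3.9 macOS case in this issue:

```python
import os

# PYTHONWARNINGS filter spec: action:message:category.
# The message field is matched against the start of the warning text.
_FILTER = "ignore:resource_tracker:UserWarning"


def suppress_resource_tracker_warning(environ=os.environ):
    """Append a filter for the leaked-semaphore warning to PYTHONWARNINGS."""
    existing = environ.get("PYTHONWARNINGS", "")
    filters = [f for f in existing.split(",") if f]
    if _FILTER not in filters:
        filters.append(_FILTER)
    environ["PYTHONWARNINGS"] = ",".join(filters)
    return environ["PYTHONWARNINGS"]
```

This would need to be called early in main, before the first lock, Value or process pool is created, so that the tracker process inherits the variable. It hides the message rather than fixing any leak.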
Just to update here that I'm working on tracking this down. It's a doozy...
I've tried lots of different ways to resolve this, and have come to the conclusion that it's something quite specific to Python 3.9 on Macs. Given that we don't leak semaphores on later Python versions it seems likely to me that this is an underlying bug in Python, and (given the substantial effort in trying to find workarounds) there's likely not much we can do about it.
So, the conclusion is to mark this as a known issue and to document the problem, suggesting that users move to a later Python version if they are going to be doing serious work on their Macs.
I was away last week, so thanks for tracking this down and documenting everything!
I agree that it's fine to document the Python 3.9 limitation. Python 3.9 won't be around much longer anyway (https://scientific-python.org/specs/spec-0000/).
From https://github.com/sgkit-dev/bio2zarr/issues/201#issuecomment-2112045205
On a Mac: