google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

Unable to run call_variants in udocker #733

Closed f-ferraro closed 11 months ago

f-ferraro commented 11 months ago

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.6/docs/FAQ.md: yes

Describe the issue:

Hi, I am trying to set up DeepVariant on our server and would like to use udocker. make_examples runs fine, but the pipeline fails at call_variants. I get the same error with both my own data and the quick start data. If I enable intermediate_results_dir, I can see the make_examples files being generated as expected. Could you please help me?

Setup

Steps to reproduce:

udocker run \
  -v ${INPUT_DIR}:"/input" \
  -v ${OUTPUT_DIR}:"/output" \
  DeepVariant \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/"ucsc.hg19.chr20.unittest.fasta" \
  --reads=/input/"NA12878_S1.chr20.10_10p1mb.bam" \
  --regions "chr20:10,000,000-10,010,000" \
  --output_vcf=/output/output.vcf.gz \
  --output_gvcf=/output/output.g.vcf.gz \
  --num_shards=16

***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "/tmp/tmpz5qvn8j2/call_variants_output.tfrecord.gz" --examples "/tmp/tmpz5qvn8j2/make_examples.tfrecord@16.gz" --checkpoint "/opt/models/wgs"

/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).

For more information see: https://github.com/tensorflow/addons/issues/2807

  warnings.warn(
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_3accq8qt/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 633, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_3accq8qt/runfiles/absl_py/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_3accq8qt/runfiles/absl_py/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_3accq8qt/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 618, in main
    call_variants(
  File "/tmp/Bazel.runfiles_3accq8qt/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 430, in call_variants
    output_queue = multiprocessing.Queue()
  File "/usr/lib/python3.8/multiprocessing/context.py", line 103, in Queue
    return Queue(maxsize, ctx=self.get_context())
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 42, in __init__
    self._rlock = ctx.Lock()
  File "/usr/lib/python3.8/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
FileNotFoundError: [Errno 2] No such file or directory

real    0m41.958s
user    0m6.224s
sys     0m3.683s

Does the quick start test work on your system? Please test with https://github.com/google/deepvariant/blob/r0.10/docs/deepvariant-quick-start.md. Is there any way to reproduce the issue by using the quick start?

Yes, the error happens with the quick start.

Any additional context:

Files generated with intermediate_results_dir

gvcf.tfrecord-00000-of-00016.gz  make_examples.tfrecord-00000-of-00016.gz                    make_examples.tfrecord-00008-of-00016.gz
gvcf.tfrecord-00001-of-00016.gz  make_examples.tfrecord-00000-of-00016.gz.example_info.json  make_examples.tfrecord-00008-of-00016.gz.example_info.json
gvcf.tfrecord-00002-of-00016.gz  make_examples.tfrecord-00001-of-00016.gz                    make_examples.tfrecord-00009-of-00016.gz
gvcf.tfrecord-00003-of-00016.gz  make_examples.tfrecord-00001-of-00016.gz.example_info.json  make_examples.tfrecord-00009-of-00016.gz.example_info.json
gvcf.tfrecord-00004-of-00016.gz  make_examples.tfrecord-00002-of-00016.gz                    make_examples.tfrecord-00010-of-00016.gz
gvcf.tfrecord-00005-of-00016.gz  make_examples.tfrecord-00002-of-00016.gz.example_info.json  make_examples.tfrecord-00010-of-00016.gz.example_info.json
gvcf.tfrecord-00006-of-00016.gz  make_examples.tfrecord-00003-of-00016.gz                    make_examples.tfrecord-00011-of-00016.gz
gvcf.tfrecord-00007-of-00016.gz  make_examples.tfrecord-00003-of-00016.gz.example_info.json  make_examples.tfrecord-00011-of-00016.gz.example_info.json
gvcf.tfrecord-00008-of-00016.gz  make_examples.tfrecord-00004-of-00016.gz                    make_examples.tfrecord-00012-of-00016.gz
gvcf.tfrecord-00009-of-00016.gz  make_examples.tfrecord-00004-of-00016.gz.example_info.json  make_examples.tfrecord-00012-of-00016.gz.example_info.json
gvcf.tfrecord-00010-of-00016.gz  make_examples.tfrecord-00005-of-00016.gz                    make_examples.tfrecord-00013-of-00016.gz
gvcf.tfrecord-00011-of-00016.gz  make_examples.tfrecord-00005-of-00016.gz.example_info.json  make_examples.tfrecord-00013-of-00016.gz.example_info.json
gvcf.tfrecord-00012-of-00016.gz  make_examples.tfrecord-00006-of-00016.gz                    make_examples.tfrecord-00014-of-00016.gz
gvcf.tfrecord-00013-of-00016.gz  make_examples.tfrecord-00006-of-00016.gz.example_info.json  make_examples.tfrecord-00014-of-00016.gz.example_info.json
gvcf.tfrecord-00014-of-00016.gz  make_examples.tfrecord-00007-of-00016.gz                    make_examples.tfrecord-00015-of-00016.gz
gvcf.tfrecord-00015-of-00016.gz  make_examples.tfrecord-00007-of-00016.gz.example_info.json  make_examples.tfrecord-00015-of-00016.gz.example_info.json

akolesnikov commented 11 months ago

The error comes from the line output_queue = multiprocessing.Queue(). Could you try a simple test? Run docker in CLI mode: docker run -it <DeepVariant image> bash. Then, inside the container, start Python 3 and execute:

import multiprocessing
q = multiprocessing.Queue()

Please let us know if that works.
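
The same check can also be run as a small script, which may be easier than an interactive session inside the container. This is only a minimal sketch: the function name check_queue is hypothetical and not part of DeepVariant, and the script reports nothing beyond whether multiprocessing.Queue() can be constructed.

```python
import multiprocessing


def check_queue():
    """Try to create a multiprocessing.Queue and report the outcome.

    Returns (True, "ok") on success, or (False, "<error type>: <message>")
    if the underlying lock/semaphore cannot be created.
    check_queue is a hypothetical helper name, not a DeepVariant function.
    """
    try:
        q = multiprocessing.Queue()
    except (OSError, ImportError) as exc:
        # FileNotFoundError (as in the traceback above) is a subclass of OSError.
        return False, f"{type(exc).__name__}: {exc}"
    # Release the queue's resources; nothing was put on it.
    q.close()
    q.join_thread()
    return True, "ok"


if __name__ == "__main__":
    ok, detail = check_queue()
    print("multiprocessing.Queue():", detail)
```

If this prints an error inside the container but "ok" on the host, the problem lies in the container environment rather than in DeepVariant itself.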

kishwarshafin commented 11 months ago

Adding to @akolesnikov's comment: I believe DeepVariant is still not supported in udocker, and I am unsure whether udocker is causing the multiprocessing issue. Can you also try adding:

--postprocess_cpus 0 \

to see if that resolves the issue.

So your command would be:

udocker run \
  -v ${INPUT_DIR}:"/input" \
  -v ${OUTPUT_DIR}:"/output" \
  DeepVariant \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/"ucsc.hg19.chr20.unittest.fasta" \
  --reads=/input/"NA12878_S1.chr20.10_10p1mb.bam" \
  --regions "chr20:10,000,000-10,010,000" \
  --output_vcf=/output/output.vcf.gz \
  --output_gvcf=/output/output.g.vcf.gz \
  --postprocess_cpus 0 \
  --num_shards=16

f-ferraro commented 11 months ago

Hi, thank you both for the answers and suggestions.

> The error comes from the line output_queue = multiprocessing.Queue(). Could you try a simple test? Run docker in CLI mode: docker run -it <DeepVariant image> bash. Then, inside the container, start Python 3 and execute:
>
> import multiprocessing
> q = multiprocessing.Queue()
>
> Please let us know if that works.

No, it doesn't work. I get the following error, which parallels the one above (full disclosure: I ran it with udocker again, not docker):

Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> q = multiprocessing.Queue()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/multiprocessing/context.py", line 103, in Queue
    return Queue(maxsize, ctx=self.get_context())
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 42, in __init__
    self._rlock = ctx.Lock()
  File "/usr/lib/python3.8/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
FileNotFoundError: [Errno 2] No such file or directory
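
The traceback shows that the failure happens while creating the lock inside the queue (ctx.Lock() calling _multiprocessing.SemLock), so the queue itself is not needed to reproduce it. On Linux, POSIX named semaphores are normally backed by files under /dev/shm, so one thing worth checking is whether that directory exists and is writable inside the container. That /dev/shm is the cause here is an assumption, not something confirmed in this thread. A minimal sketch, with hypothetical helper names probe_lock and probe_shm:

```python
import multiprocessing
import os


def probe_lock():
    """Create a bare multiprocessing.Lock, the step that fails in the traceback.

    Returns "ok" or a string describing the exception.
    probe_lock is a hypothetical helper name.
    """
    try:
        lock = multiprocessing.Lock()
    except (OSError, ImportError) as exc:
        return f"failed: {type(exc).__name__}: {exc}"
    # Acquire and release once to confirm the lock is usable.
    with lock:
        pass
    return "ok"


def probe_shm(path="/dev/shm"):
    """Report whether the shared-memory directory exists and is writable.

    Assumption: on Linux, named semaphores live under /dev/shm.
    """
    return {
        "path": path,
        "exists": os.path.isdir(path),
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    print("multiprocessing.Lock():", probe_lock())
    print("shared memory dir:", probe_shm())
```

If probe_lock fails while probe_shm reports a missing or read-only directory, that points at the container setup rather than at Python or DeepVariant.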

Unfortunately, the approach suggested by @kishwarshafin didn't work for me either. I thought udocker could be a viable option given what was said in #669. Maybe I'll try downgrading to 1.5.0, since that is the version mentioned in the original post.

I'm not really familiar with multiprocessing but I will have a look. If you have any additional pointers, I would be really grateful for them :)

Thank you! Federico

EDIT: I tried running DeepVariant v1.5.0 and indeed it works! So I guess it is an issue with the newer release.

akolesnikov commented 11 months ago

@f-ferraro,

multiprocessing is a standard Python module that DeepVariant depends on. Without a properly working Python environment, I'm afraid there is no way to make it work.

danielecook commented 11 months ago

@f-ferraro it sounds like you were able to get DeepVariant running.

If you are still having issues, please reopen this issue or start a new one. You may also consider trying Singularity if you have not already.