Closed marchoeppner closed 2 years ago
Hi @marchoeppner, sorry for missing this issue from a while ago. I'll try to reproduce it on CentOS 7 with Singularity.
Hi, thanks for looking into it. Any news? I'm not sure how easy this will be to reproduce, or how DeepVariant determines whether the Python or the C++ protobuf implementation is used. From the code, I can only see protobuf being installed via pip, but that is probably not the whole story.
In the meantime I had access to another CentOS 7 cluster, and there the Docker image (1.2 and 1.3) works fine with the Singularity installed on it. So this may have something to do with the host system and its pre-installed packages. But I honestly have no way of even guessing what the specific problem could be.
/M
Hi, thank you for checking in again. Work has been really busy so I actually haven't looked at it yet. I will try to find 30min-1hr today to take a look.
But just checking -- you're saying it worked for you on another CentOS7?
Is it possible for you to check (on both machines) whether the installed protobuf versions differ? (I'm also not sure how Singularity ends up picking which version of protobuf is used, but it would be useful to check whether that differs between the two machines.)
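A minimal sketch of such a check, assuming it is run with the same Python interpreter that DeepVariant uses inside the container. The helper name describe_protobuf is made up for illustration; api_implementation.Type() is the same call that appears in the error message from the issue.

```python
import importlib


def describe_protobuf():
    """Report where protobuf is imported from and which backend it uses."""
    try:
        protobuf = importlib.import_module("google.protobuf")
        api = importlib.import_module(
            "google.protobuf.internal.api_implementation"
        )
    except ImportError:
        # protobuf is not importable from this interpreter at all.
        return {"installed": False}
    return {
        "installed": True,
        "version": getattr(protobuf, "__version__", "unknown"),
        # A path under $HOME/.local here would point to a user-level install.
        "location": getattr(protobuf, "__file__", "unknown"),
        # "cpp" is what DeepVariant expects; "python" triggers the error.
        "implementation": api.Type(),
    }


if __name__ == "__main__":
    for key, value in describe_protobuf().items():
        print(f"{key}: {value}")
```

Comparing the output on both machines should show whether the same protobuf copy and the same backend are picked up in each case.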
Protobuf isn't installed on either host machine
yum list installed | grep proto -> only returns "xorg-x11-proto-devel"
(I guess it should be included in the Docker container, no?)
(Sorry, still didn't get to this today. Will try tomorrow)
Eureka, I suppose. So this is the story:
I had previously set up a conda environment on the host system (it has nothing to do with Singularity or DeepVariant), and as part of that environment I also installed ortools via pip (pip install ortools). One of its dependencies is protobuf, which is not installed with the C++ implementation by default in this setup (see: https://github.com/protocolbuffers/protobuf/issues/539).
The issue: The pip command installs the protobuf module in $HOME/.local/lib/python3.8/site-packages and for some reason, Deepvariant INSIDE the container sees this and tries to use it (my guess).
Any idea of how to prevent this from happening? I think binding $HOME into the container is pretty standard Singularity (and Docker?) behavior, but these kinds of library clashes shouldn't happen - maybe the container could be set up to not see these host-system libraries when loading modules in python?
Cheers, Marc
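One way to test this guess is to ask the interpreter whether the per-user site-packages directory is active. The sketch below uses only the standard library; the function name user_site_status is hypothetical, and the assumption is that it is run with the container's Python while $HOME is bound as usual.

```python
import os
import site
import sys


def user_site_status(environ=None):
    """Summarise whether the per-user site-packages directory is in use."""
    environ = os.environ if environ is None else environ
    user_site = site.getusersitepackages()
    return {
        # Typically $HOME/.local/lib/pythonX.Y/site-packages on Linux.
        "user_site": user_site,
        # False or None means Python will not add the directory to sys.path.
        "enabled": bool(site.ENABLE_USER_SITE),
        # True means modules installed there can be picked up by imports.
        "on_sys_path": user_site in sys.path,
        # Any non-empty value disables the user site directory at startup.
        "PYTHONNOUSERSITE": environ.get("PYTHONNOUSERSITE"),
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print(f"{key}: {value}")
```

If on_sys_path is True inside the container and the directory contains a protobuf install, that would be consistent with the container picking up the host-side module.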
I see. Yeah, DeepVariant (specifically Nucleus) has a very specific way it uses protobuf. This is something we hope to improve in the long run. But for now, I don't have a very good general solution here. If you have any workaround for this for now, please just go ahead with it.
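One possible workaround, offered as an untested assumption rather than a confirmed fix for DeepVariant, is Python's standard PYTHONNOUSERSITE environment variable, which tells the interpreter to skip the per-user site-packages directory. The sketch below only demonstrates the effect of the variable on a fresh interpreter; the helper name user_site_enabled is made up for illustration.

```python
import os
import subprocess
import sys


def user_site_enabled(extra_env):
    """Start a fresh interpreter and report site.ENABLE_USER_SITE as text."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The default depends on the interpreter and how it was started.
    print("default:           ", user_site_enabled({}))
    # With the variable set, the user site directory is disabled.
    print("PYTHONNOUSERSITE=1:", user_site_enabled({"PYTHONNOUSERSITE": "1"}))
```

Since Singularity passes host environment variables into the container by default, exporting PYTHONNOUSERSITE=1 before the run might be enough to hide $HOME/.local from the container's Python. Not binding $HOME at all (for example with Singularity's --no-home option) would be another route, but neither has been verified against the DeepVariant image here.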
OK, to give this a try myself, here is what I did:
Get a CentOS7 machine to try:
gcloud compute instances create "${USER}-centos7" \
--scopes "compute-rw,storage-full,cloud-platform" \
--image-family "centos-7" \
--image-project "centos-cloud" \
--machine-type "e2-standard-16" \
--zone "us-west1-b"
ssh into the machine
gcloud compute ssh ${USER}-centos7 --zone us-west1-b
On the machine, install singularity: I used the instructions on https://github.com/sylabs/singularity/blob/master/INSTALL.md
After installation, I checked the version:
$ singularity --version
singularity-ce version 3.9.2
Then, I followed the steps on https://github.com/google/deepvariant/blob/r1.3/docs/deepvariant-quick-start.md to get data.
And then:
# Pull the image.
BIN_VERSION=1.3.0
singularity pull docker://google/deepvariant:"${BIN_VERSION}"
# Run DeepVariant.
singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
docker://google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref="${INPUT_DIR}"/ucsc.hg19.chr20.unittest.fasta \
--reads="${INPUT_DIR}"/NA12878_S1.chr20.10_10p1mb.bam \
--regions "chr20:10,000,000-10,010,000" \
--output_vcf="${OUTPUT_DIR}"/output.vcf.gz \
--output_gvcf="${OUTPUT_DIR}"/output.g.vcf.gz \
--intermediate_results_dir "${OUTPUT_DIR}/intermediate_results_dir" \
--num_shards=$(nproc)
This seems to work for me.
@marchoeppner Sorry, I don't think we have the bandwidth to support your specific case right now. But if you do find out how to make it work, please share what you find. Thanks!
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.3/docs/FAQ.md:
Describe the issue: Deepvariant dies with protobuf error message when using Docker containers for version 1.2.0 and above. Works with 1.1.0 container.
Setup
Steps to reproduce:
Command output:
sys.exit(main(argv))
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_egfjk32i/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 160, in main
proto_utils.uses_fast_cpp_protos_or_die()
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_egfjk32i/runfiles/com_google_deepvariant/third_party/nucleus/util/proto_utils.py", line 41, in uses_fast_cpp_protos_or_die
raise ValueError('Expected to be using C++ protobuf implementation '
ValueError: Expected to be using C++ protobuf implementation (api_implementation.Type() == "cpp") but it is python
Traceback (most recent call last):
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_24d7l2zv/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 180, in
app.run(main)
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_24d7l2zv/runfiles/absl_py/absl/app.py", line 299, in run
_run_main(main, args)
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_24d7l2zv/runfiles/absl_py/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_24d7l2zv/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 160, in main
proto_utils.uses_fast_cpp_protos_or_die()
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_24d7l2zv/runfiles/com_google_deepvariant/third_party/nucleus/util/proto_utils.py", line 41, in uses_fast_cpp_protos_or_die
raise ValueError('Expected to be using C++ protobuf implementation '
ValueError: Expected to be using C++ protobuf implementation (api_implementation.Type() == "cpp") but it is python
Traceback (most recent call last):
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_vykxim/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 180, in
app.run(main)
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_vykxim/runfiles/absl_py/absl/app.py", line 299, in run
_run_main(main, args)
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_vykxim/runfiles/absl_py/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_vykxim/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 160, in main
proto_utils.uses_fast_cpp_protos_or_die()
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_vykxim/runfiles/com_google_deepvariant/third_party/nucleus/util/proto_utils.py", line 41, in uses_fast_cpp_protos_or_die
raise ValueError('Expected to be using C++ protobuf implementation '
ValueError: Expected to be using C++ protobuf implementation (api_implementation.Type() == "cpp") but it is python
Traceback (most recent call last):
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_ipgylakz/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 180, in
app.run(main)
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_ipgylakz/runfiles/absl_py/absl/app.py", line 299, in run
_run_main(main, args)
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_ipgylakz/runfiles/absl_py/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_ipgylakz/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 160, in main
proto_utils.uses_fast_cpp_protos_or_die()
File "/scratch/SlurmTMP/sukmb352.4618444/Bazel.runfiles_ipgylakz/runfiles/com_google_deepvariant/third_party/nucleus/util/proto_utils.py", line 41, in uses_fast_cpp_protos_or_die
raise ValueError('Expected to be using C++ protobuf implementation '
ValueError: Expected to be using C++ protobuf implementation (api_implementation.Type() == "cpp") but it is python
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref Homo_sapiens_GRCh38_no_alts.fa.gz --reads Indiv_I33975_Sample_I33975-L2.dedup.bam --examples /scratch/SlurmTMP/sukmb352.4618444/tmpdh6mqoql/make_examples.tfrecord@16.gz --gvcf /scratch/SlurmTMP/sukmb352.4618444/tmpdh6mqoql/gvcf.tfrecord@16.gz --regions xgen-exome-research-panel-targets-v2.bed --task 14
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref Homo_sapiens_GRCh38_no_alts.fa.gz --reads Indiv_I33975_Sample_I33975-L2.dedup.bam --examples /scratch/SlurmTMP/sukmb352.4618444/tmpdh6mqoql/make_examples.tfrecord@16.gz --gvcf /scratch/SlurmTMP/sukmb352.4618444/tmpdh6mqoql/gvcf.tfrecord@16.gz --regions xgen-exome-research-panel-targets-v2.bed --task 7
Does the quick start test work on your system? Please test with https://github.com/google/deepvariant/blob/r0.10/docs/deepvariant-quick-start.md. Is there any way to reproduce the issue by using the quick start?
Any additional context: