google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

Exit status 250 Could not read base quality scores #396

Closed husamia closed 3 years ago

husamia commented 3 years ago

I am running DeepVariant on a BAM generated with BWA. One of the FASTQ read files used had fewer sequences.

I am using the latest Docker image.

/opt/deepvariant/bin/run_deepvariant --model_type WGS --num_shards 5 --output_vcf 19CT030668_deepvariant.vcf --reads 19CT030668.bam --ref ../human_g1k_v37.fasta --sample_name 19CT030668

The process starts, but when it gets to the end of the BAM it exits.

2020-12-11 15:45:32.928477: W third_party/nucleus/io/sam_reader.cc:532] Could not read base quality scores A00215:130:HK3H2DSXX:2:2549:4869:24768: Not found: Could not read base quality scores
2020-12-11 15:45:32.931412: F deepvariant/allelecounter.cc:103] Check failed: offset + len <= read.aligned_quality_size() (1 vs. 0)
Fatal Python error: Aborted

Current thread 0x00007f38280a8700 (most recent call first):
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/realigner/window_selector.py", line 76 in _candidates_from_reads
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/realigner/window_selector.py", line 237 in select_windows
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/realigner/realigner.py", line 574 in realign_reads
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1129 in region_reads
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1055 in process
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1377 in make_examples_runner
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1500 in main
  File "/tmp/Bazel.runfilespfk22dn/runfiles/absl_py/absl/app.py", line 251 in _run_main
  File "/tmp/Bazel.runfilespfk22dn/runfiles/absl_py/absl/app.py", line 300 in run
  File "/tmp/Bazel.runfilespfk22dn/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 1510 in
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref ../human_g1k_v37.fasta --reads 19CT030668.bam --examples /tmp/tmpy0c9vszu/make_examples.tfrecord@5.gz --sample_name 19CT030668 --task 3

real 102m9.946s
user 101m6.868s
sys 0m54.452s
I1211 15:45:33.976910 139868428207872 run_deepvariant.py:321] None
Traceback (most recent call last):
  File "/opt/deepvariant/bin/run_deepvariant.py", line 332, in
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/opt/deepvariant/bin/run_deepvariant.py", line 319, in main
    subprocess.check_call(command, shell=True, executable='/bin/bash')
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'time seq 0 4 | parallel --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "../human_g1k_v37.fasta" --reads "19CT030668.bam" --examples "/tmp/tmpy0c9vszu/make_examples.tfrecord@5.gz" --sample_name "19CT030668" --task {}' returned non-zero exit status 250.

Can DeepVariant ignore the base quality score check and still give results?

MariaNattestad commented 3 years ago

This error might be caused by missing base quality scores in the BAM file. Were you expecting this? DeepVariant requires valid base quality scores. You could technically use filler values, but DeepVariant was only trained with real base qualities, so the results will be much more reliable and accurate if you can get a BAM file with real base quality scores.
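To see which reads lack base qualities before running DeepVariant, something like the following might help. It is only a rough sketch, not part of DeepVariant: the function name `reads_missing_qualities` is made up for illustration, and it assumes the alignments are available as SAM text (for example, the output of `samtools view`), where the 11th column (QUAL) is `*` when no qualities are stored.

```python
from typing import Iterable, Iterator, Tuple


def reads_missing_qualities(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, reason) for SAM records whose QUAL field looks unusable.

    Hypothetical helper for illustration; it only inspects SAM text columns.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, seq, qual = fields[0], fields[9], fields[10]
        if qual == "*":
            # SAM uses '*' in the QUAL column when qualities are not stored
            yield name, "QUAL is '*' (no base qualities stored)"
        elif seq != "*" and len(qual) != len(seq):
            yield name, "QUAL length differs from SEQ length"


# Small made-up example: one header line and two alignment records.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "readB\t0\t1\t200\t60\t4M\t*\t0\t0\tACGT\t*",
]
for name, reason in reads_missing_qualities(example):
    print(name, reason)
```

On a real file, the same function could be fed lines from standard input, with the BAM piped through `samtools view`. If it reports reads such as the one named in the warning above, the BAM would need to be regenerated from FASTQ records that carry real qualities.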

husamia commented 3 years ago


I am expecting missing quality scores, and I would like to ignore them. Would that be possible?

MariaNattestad commented 3 years ago

No, that's not possible in DeepVariant; see the details in my answer above.