google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

ValueError: `call_variants_outputs` did not pass sanity check. #517

Closed: NagaComBio closed this issue 2 years ago

NagaComBio commented 2 years ago

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.3/docs/FAQ.md: Yes

Describe the issue: The error arises during the "postprocess_variants" step. The quick-test and a run on chr22 from the same sample both completed without any issue. I tried group_variants=false, as suggested here, but a similar error/crash then occurs at a different variant/location. A similar problem was reported here, but no final fix was provided.

Setup

Steps to reproduce:

Does the quick start test work on your system? Please test with https://github.com/google/deepvariant/blob/r0.10/docs/deepvariant-quick-start.md. Is there any way to reproduce the issue by using the quick start? No. The quick start and chr22 from the same sample both ran through.

Any additional context:

MariaNattestad commented 2 years ago

Hi @NagaComBio

Sorry for the delay! I don't have a clear solution to this problem just from looking at the error message, but if you can share the data, e.g. with just a small slice of the bam, then I can try to reproduce the issue. If that's possible, you can email me at marianattestad@google.com.

For now, I can tell you that --group_variants=false is only applicable when using vcf_candidate_importer. That is the most common way this error occurs, because the input VCF for vcf_candidate_importer can contain multiple candidate variants at the same position. That isn't supposed to be possible when the candidates are generated by make_examples without vcf_candidate_importer.
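To illustrate the "multiple candidate variants at the same position" case, here is a minimal sketch that scans a VCF for positions carrying more than one record. It is not part of DeepVariant; the function name `duplicated_positions` and the file path in the usage note are hypothetical, and it assumes a plain or gzip-compressed, tab-delimited VCF.

```python
from collections import Counter
import gzip


def duplicated_positions(vcf_path):
    """Return {(chrom, pos): count} for positions with more than one record.

    Hypothetical helper for illustration only; not a DeepVariant API.
    """
    # Pick the opener from the file extension (assumption: .gz means gzip).
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            counts[(chrom, int(pos))] += 1
    # Keep only positions that appear more than once.
    return {key: n for key, n in counts.items() if n > 1}
```

Calling `duplicated_positions("candidates.vcf")` on the VCF passed to vcf_candidate_importer (the path is a placeholder) would list the positions where several candidates share a coordinate.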

Thanks, Maria

NagaComBio commented 2 years ago

Hi @MariaNattestad

Thanks for the offer, but it would be difficult to share the data without a DTA. So I went back and reran the workflow (--num_shards=5), first for a short region around the above coordinates and then for the complete chr1. Both tests ran through without any errors, and some of the candidate variants called earlier are not present here.

1       160351251       .       T       <*>     0       .       END=160351253   GT:GQ:MIN_DP:PL 0/0:50:80:0,261,2609
1       160351254       .       GTTTT   G,<*>   9.1     PASS    .       GT:GQ:DP:AD:VAF:PL      0/1:9:79:18,33,0:0.417722,0:8,0,26,990,990,990
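For readers less familiar with the per-sample columns, the following sketch splits the second record above into its FORMAT keys and checks that, for this record, the reported VAF equals the alternate allele depth divided by DP. The helper name `parse_sample` is hypothetical, and the check is only an observation about this one line, not a statement of how DeepVariant computes VAF in general.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with the sample's colon-separated values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


# The gVCF record from the rerun, split on whitespace.
record = (
    "1 160351254 . GTTTT G,<*> 9.1 PASS . "
    "GT:GQ:DP:AD:VAF:PL 0/1:9:79:18,33,0:0.417722,0:8,0,26,990,990,990"
)
fields = record.split()
sample = parse_sample(fields[8], fields[9])

depth = int(sample["DP"])                                   # 79
allele_depths = [int(x) for x in sample["AD"].split(",")]   # ref, G, <*>
reported_vaf = [float(x) for x in sample["VAF"].split(",")]

# Fraction of reads supporting each alternate allele (skip the reference).
alt_fractions = [ad / depth for ad in allele_depths[1:]]

print(sample["GT"], round(alt_fractions[0], 6), reported_vaf[0])
```

With three alleles (the reference, G, and the `<*>` placeholder), the PL field carries six values, one per possible diploid genotype.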

I'm not sure how it got resolved, but I will close this issue for now and reopen it if a similar error pops up during the rerun of all the chromosomes.

Thank you for looking into the issue, Naga