google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

Merging vcf files error with glnexus:v1.2.7 #815

Closed poddarharsh15 closed 4 months ago

poddarharsh15 commented 4 months ago

**Have you checked the FAQ?**

Describe the issue: Error when merging VCF files.

Setup

Steps to reproduce:

Num BCF records read 118736378 query hits 14552613
[E::bgzf_read_block] Invalid BGZF header at offset 265038798
[E::bgzf_read] Read block operation failed with error 2 after 0 of 32 bytes
[E::bgzf_read] Read block operation failed with error 3 after 0 of 32 bytes
Error: BCF read err

Screenshot from 2024-05-06 15-00-29

akolesnikov commented 4 months ago

This looks like a GLnexus question. Could you please post it on the GLnexus page? Also, from the log output, it looks like GLnexus completed successfully.

pichuan commented 4 months ago

Hi @poddarharsh15

Actually, can you go back through your log and confirm that the DeepTrio runs actually finished correctly?

If I remember correctly, our run_deeptrio one-step script might continue to run the following steps even when previous steps failed.

pichuan commented 4 months ago

And, following up on @akolesnikov's point: if you have gotten this far, it would seem that these files should be complete?

/output/HG002.g.vcf.gz \
/output/HG003.g.vcf.gz \
/output/HG004.g.vcf.gz \

If you can examine those files and confirm, that would be great. (Or look at the log, as I mentioned before. But given that you have the files, checking them directly might be easier :))
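One possible way to check the files directly is sketched below. It is not part of DeepVariant or DeepTrio; the helper names `has_bgzf_eof` and `check_gvcf` are hypothetical, and the sketch assumes the `.g.vcf.gz` files are BGZF-compressed (as bgzip output is) and that `gzip`, `tail`, `od` and `tr` are available on the host. It tests that a file decompresses cleanly and that it ends with the 28-byte empty BGZF block that marks end of file, which a truncated file would normally lack.

```shell
# Last 28 bytes of a complete BGZF file (the empty EOF block), as hex.
BGZF_EOF="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Hypothetical helper: succeeds if the file ends with the BGZF EOF block.
has_bgzf_eof() {
  tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
  [ "$tail_hex" = "$BGZF_EOF" ]
}

# Hypothetical helper: succeeds only if the file decompresses without
# errors and also ends with the BGZF EOF block.
check_gvcf() {
  gzip -t "$1" 2>/dev/null && has_bgzf_eof "$1"
}

# Example use (host-side paths are an assumption based on the -v mount):
#   for f in output/HG002.g.vcf.gz output/HG003.g.vcf.gz output/HG004.g.vcf.gz; do
#     if check_gvcf "$f"; then echo "$f looks complete"; else echo "$f may be truncated"; fi
#   done
```

A file that passes both checks can still contain unexpected content, so this only rules out truncation and corrupted compression.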

poddarharsh15 commented 4 months ago

Hi @pichuan, @akolesnikov,

I'm new to DeepTrio and couldn't locate the log files, but I have intermediate results showing that DeepTrio ran successfully without errors. Additionally, I successfully benchmarked the .vcf files generated by DeepTrio. I've attached screenshots for reference. Your assistance is greatly appreciated. Thank you

Finished log: [Screenshot from 2024-05-07 09-52-02] [Screenshot from 2024-05-07 09-52-32]

Benchmark: [Screenshot from 2024-05-07 09-52-59]

pichuan commented 4 months ago

Hi @poddarharsh15, it seems you're certain that the DeepTrio run finished correctly. In that case, I agree with @akolesnikov's original assessment that this may be an issue with the downstream GLnexus step, which we can't directly support.

One suggestion: if you want to check your run more closely, try breaking it down and running just this part first:

udocker run \
-v "${PWD}/output":"/output" \
quay.io/mlin/glnexus:v1.2.7 \
/usr/local/bin/glnexus_cli \
--config DeepVariant_unfiltered \
/output/HG002.g.vcf.gz \
/output/HG003.g.vcf.gz \
/output/HG004.g.vcf.gz

before piping the output to the next step. That might help you identify which errors are coming from that step.
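As a rough sketch of how that isolation could look, the hypothetical helper below runs one step on its own, saves its standard output and standard error to separate files, and reports the exit status. `run_step` and the file names are illustrative only. The sketch assumes that `glnexus_cli` writes its merged BCF to standard output, which is why the output is normally piped onward.

```shell
# Hypothetical helper: run a single step, keeping its output and its
# messages in separate files so a failure can be traced to that step.
# Usage: run_step OUT_FILE LOG_FILE command [args...]
run_step() {
  out_file=$1
  log_file=$2
  shift 2
  "$@" > "$out_file" 2> "$log_file"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "step failed with exit status $status; see $log_file" >&2
  fi
  return "$status"
}

# Example use with the command above (output file names are assumptions):
#   run_step merged.bcf glnexus.log \
#     udocker run -v "${PWD}/output":"/output" quay.io/mlin/glnexus:v1.2.7 \
#     /usr/local/bin/glnexus_cli --config DeepVariant_unfiltered \
#     /output/HG002.g.vcf.gz /output/HG003.g.vcf.gz /output/HG004.g.vcf.gz
```

If this step exits with status 0 and produces a non-empty file, the problem is more likely in a later step of the pipe.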

akolesnikov commented 4 months ago

Closing the issue. Feel free to reopen as needed.