google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

How to get list of variants after make_examples step? #808

Closed sophienguyen01 closed 2 months ago

sophienguyen01 commented 2 months ago

Hi,

Is it possible to get a list of the variants in the ***.tfrecord-?????-of-000??.gz files (the output of the make_examples step)?

One option I can think of is the show_examples command, which generates an image for each variant in the file, but is there a faster way to get the list of variants?

Thank you

pichuan commented 2 months ago

Hi @sophienguyen01, you can try https://github.com/google/deepvariant/blob/r1.6.1/deepvariant/labeler/labeled_examples_to_vcf.py. It is experimental and not officially documented yet, but it is included in our Docker image: https://github.com/google/deepvariant/blob/r1.6.1/Dockerfile#L152

For now, please read the code and use the flags defined there. As mentioned, this is experimental and not officially supported yet, so I can't say whether it will work for your specific use case. But you can at least look at the code and see whether you can adapt it.

Another, more typical approach is to run the remaining steps (call_variants, postprocess_variants) and get the list of variants from the resulting VCF.
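
For a quick look without running the rest of the pipeline, the shards can also be read directly. The following is only a minimal sketch, not an official DeepVariant tool. It assumes that each record is a serialized tf.train.Example and that the example carries a bytes feature named "locus" holding the variant's region string; that feature name is an assumption about the make_examples output and should be checked against the DeepVariant version in use. The helper names (read_tfrecords, bytes_features, list_loci) are hypothetical. It uses only the Python standard library, so it decodes the TFRecord framing and the protobuf wire format by hand and skips CRC verification.

```python
import glob
import gzip
import struct


def read_tfrecords(path):
    """Yield the raw payload of each record in a gzip-compressed TFRecord file."""
    with gzip.open(path, "rb") as f:
        while True:
            header = f.read(12)  # uint64 payload length + uint32 CRC of the length
            if len(header) < 12:
                return
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            f.read(4)  # uint32 CRC of the payload (not verified in this sketch)
            yield payload


def read_varint(buf, pos):
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def fields(buf):
    """Yield (field_number, bytes) for length-delimited fields; skip scalar fields."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:      # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:    # 64-bit
            pos += 8
        elif wire_type == 5:    # 32-bit
            pos += 4
        elif wire_type == 2:    # length-delimited
            size, pos = read_varint(buf, pos)
            yield number, buf[pos:pos + size]
            pos += size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")


def bytes_features(example):
    """Map feature name -> list of bytes values for one serialized tf.train.Example."""
    out = {}
    for number, features in fields(example):        # Example.features = 1
        if number != 1:
            continue
        for number, entry in fields(features):      # Features.feature (map) = 1
            if number != 1:
                continue
            entry_fields = dict(fields(entry))      # map entry: key = 1, value = 2
            name = entry_fields.get(1, b"").decode()
            for kind, values in fields(entry_fields.get(2, b"")):
                if kind == 1:                       # Feature.bytes_list = 1
                    out[name] = [v for n, v in fields(values) if n == 1]
    return out


def list_loci(pattern):
    """Collect the assumed 'locus' feature from every example in the matching shards."""
    loci = []
    for path in sorted(glob.glob(pattern)):
        for record in read_tfrecords(path):
            loci.extend(v.decode() for v in bytes_features(record).get("locus", []))
    return loci
```

For example, list_loci("examples.tfrecord-?????-of-?????.gz") (a hypothetical path) would return one region string per example. Note that this lists candidate examples, not final calls. If the "locus" feature is absent, bytes_features shows which feature names the records actually contain.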