kishwarshafin / pepper

PEPPER-Margin-DeepVariant
MIT License

What base type does ./. (in VCF file output) indicate? #124

Closed lxxiaoxiaLi closed 2 years ago

lxxiaoxiaLi commented 2 years ago

Hi Kishwar,

I have just tested the PEPPER Nanopore variant calling workflow. My result is as follows:

```
##fileformat=VCFv4.2
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##contig=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr20 60750 . T A 0 refCall . GT:GQ:DP:AD:VAF:C 0/0:26:13:5:0.384615:DV
chr20 61115 . G A 1 refCall . GT:GQ:DP:AD:VAF:C ./.:7:14:8:0.571429:DV
chr20 61220 . C A 0 refCall . GT:GQ:DP:AD:VAF:C 0/0:22:14:7:0.5:DV
chr20 61376 . C CATTAAAT 0 refCall . GT:GQ:DP:AD:VAF:C 0/0:37:15:2:0.133333:DV
chr20 61424 . C CACTCA 0 refCall . GT:GQ:DP:AD:VAF:C 0/0:35:14:3:0.214286:DV
chr20 61568 . T C 1.8 refCall . GT:GQ:DP:AD:VAF:C ./.:5:16:9:0.5625:DV
```

I am wondering what the ./. genotype in the VCF result indicates. Do I need to filter it? Thanks.

kishwarshafin commented 2 years ago

hi @lxxiaoxiaLi ,

It's a refCall, which ultimately means "the call is homozygous reference". For these refCall records, 0/0 and ./. have the same meaning.
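To make that rule concrete, here is a minimal shell/awk sketch. `classify_call` is a hypothetical helper, not part of PEPPER. It assumes, following the explanation above, that any record whose FILTER is `refCall` is homozygous reference whether its GT is `0/0` or `./.`, and it relies on the VCF convention that GT is the first FORMAT key.

```shell
# Hypothetical helper (not part of PEPPER): classify one VCF data line.
# Assumption: FILTER=refCall means homozygous reference, whether GT is 0/0 or ./.
classify_call() {
  printf '%s\n' "$1" | awk '{
    filter = $7                 # 7th column: FILTER
    split($10, sample, ":")     # 10th column: the sample values
    gt = sample[1]              # GT is the first FORMAT key when present
    if (filter == "refCall" || gt == "0/0" || gt == "0|0")
      print "hom_ref"
    else if (gt == "./." || gt == ".")
      print "no_call"
    else
      print "variant:" gt
  }'
}

classify_call 'chr20 61115 . G A 1 refCall . GT:GQ:DP:AD:VAF:C ./.:7:14:8:0.571429:DV'
# prints: hom_ref
```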

Filtering is your choice. We generally keep all calls; filtering is something the user can choose to do.
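If you do decide to filter, one simple option is to keep only PASS records, which drops the refCall entries. A minimal sketch with a hypothetical `keep_pass` helper, assuming an uncompressed, tab-separated VCF on stdin:

```shell
# Hypothetical helper: pass VCF header lines through and keep only records
# whose FILTER column (7th, tab-separated) is exactly PASS.
keep_pass() {
  awk -F '\t' '/^#/ { print; next } $7 == "PASS"'
}

# Example use on a bgzipped VCF (file names are placeholders):
# gzip -dc calls.vcf.gz | keep_pass > calls.pass.vcf
```

If bcftools is installed, `bcftools view -f PASS calls.vcf.gz` should select the same records; `calls.vcf.gz` is again a placeholder name.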

lxxiaoxiaLi commented 2 years ago

Thank you, I'll make a note of this. I read your paper (Nature Methods: "Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads") and found that the methods section says:

Code availability
The modules of PEPPER-Margin-DeepVariant are publicly available in these repositories:
• PEPPER: https://github.com/kishwarshafin/pepper
• Margin: https://github.com/UCSC-nanopore-cgl/margin
• DeepVariant: https://github.com/google/deepvariant

The PEPPER-Margin-DeepVariant software[57] is available at https://doi.org/10.5281/zenodo.5275510, and we used r0.4 version for the evaluation presented in this manuscript. For simpler use, we have also created a publicly available docker container, kishwars/pepper_deepvariant:r0.4, that can run our variant-calling and polishing pipelines.

I am wondering whether I only need to run this command to get the final SNPs, without running the Margin and DeepVariant software separately:

```shell
docker run \
  -v "kishwarshafin-pepper/test":"kishwarshafin-pepper/test" \
  kishwars/pepper_deepvariant:r0.7 \
  run_pepper_margin_deepvariant call_variant \
  -b "${INPUT_DIR}/${BAM}" \
  -f "${INPUT_DIR}/${REF}" \
  -o "${OUTPUT_DIR}" \
  -p "${OUTPUT_PREFIX}" \
  -t "${THREADS}" \
  --ont_r9_guppy5_sup
```

In addition, I have another question about the accuracy and the number of calls at heterozygous sites in the PEPPER-Margin-DeepVariant pipeline. Thanks.


kishwarshafin commented 2 years ago

@lxxiaoxiaLi ,

> Based on your experience, do I only need to run this command to get the final SNPs, without running the Margin and DeepVariant software separately?

```shell
docker run \
  -v "kishwarshafin-pepper/test":"kishwarshafin-pepper/test" \
  kishwars/pepper_deepvariant:r0.7 \
  run_pepper_margin_deepvariant call_variant \
  -b "${INPUT_DIR}/${BAM}" \
  -f "${INPUT_DIR}/${REF}" \
  -o "${OUTPUT_DIR}" \
  -p "${OUTPUT_PREFIX}" \
  -t "${THREADS}" \
  --ont_r9_guppy5_sup
```

The command you have runs PEPPER, Margin, and DeepVariant one after another. It's a single command that runs everything; you don't need to run multiple commands to get the final VCF. This run will generate the final VCF for your use.
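A sketch of how that single command might be wrapped, assuming the r0.7 image and exactly the flags from the command above. `run_pmdv` and the `DOCKER` override are hypothetical additions: setting `DOCKER=echo` prints the command instead of running it. The host input and output directories are bind-mounted at identical absolute paths so that the paths passed to `-b`, `-f` and `-o` also resolve inside the container; all paths and file names in the usage comments are placeholders.

```shell
# Hypothetical wrapper around the one-command PEPPER-Margin-DeepVariant run.
# Assumes INPUT_DIR, OUTPUT_DIR, BAM, REF, OUTPUT_PREFIX and THREADS are set.
run_pmdv() {
  # Require absolute host paths so the same paths work inside the container.
  for dir in "$INPUT_DIR" "$OUTPUT_DIR"; do
    case "$dir" in
      /*) ;;
      *) echo "run_pmdv: '$dir' is not an absolute path" >&2; return 1 ;;
    esac
  done
  mkdir -p "$OUTPUT_DIR"
  # DOCKER defaults to docker; set DOCKER=echo for a dry run.
  ${DOCKER:-docker} run \
    -v "${INPUT_DIR}":"${INPUT_DIR}" \
    -v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
    kishwars/pepper_deepvariant:r0.7 \
    run_pepper_margin_deepvariant call_variant \
    -b "${INPUT_DIR}/${BAM}" \
    -f "${INPUT_DIR}/${REF}" \
    -o "${OUTPUT_DIR}" \
    -p "${OUTPUT_PREFIX}" \
    -t "${THREADS}" \
    --ont_r9_guppy5_sup
}

# Usage (placeholder values):
# INPUT_DIR=/data/input; OUTPUT_DIR=/data/output
# BAM=reads.bam; REF=reference.fa; OUTPUT_PREFIX=sample; THREADS=16
# run_pmdv
```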

There are several case studies available that you can use: https://github.com/kishwarshafin/pepper#case-studies-chromosome-20-runs-for-performance-reproducibility

They include commands that you can copy and paste to reproduce the results we reported.

> In addition, I have another question about the accuracy and the number of calls at heterozygous sites in the PEPPER-Margin-DeepVariant pipeline.

We don't benchmark heterozygous sites separately; all reported numbers are computed against the entire truth set.
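Since the reported numbers cover the whole truth set, you can still count the heterozygous calls in your own output. A minimal sketch with a hypothetical `count_het` helper, assuming an uncompressed, single-sample, diploid VCF with GT as the first FORMAT key:

```shell
# Hypothetical helper: count heterozygous genotypes among PASS records.
# A genotype counts when it has two called alleles that differ (0/1, 1|0, 1/2, ...).
count_het() {
  awk -F '\t' '
    /^#/ || $7 != "PASS" { next }
    {
      split($10, sample, ":")
      n = split(sample[1], alleles, "[/|]")
      if (n == 2 && alleles[1] != "." && alleles[2] != "." && alleles[1] != alleles[2])
        het++
    }
    END { print het + 0 }'
}

# Example use (placeholder file name):
# gzip -dc calls.vcf.gz | count_het
```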

lxxiaoxiaLi commented 2 years ago

Thanks. I found more variants in the nanopore reads than in the Illumina short reads. Does this have anything to do with the size of the data? If I use the same amount of nanopore and Illumina short-read data, will they produce the same number of variants?

The nanopore and Illumina short-read calls have different bases at the same site at about 1,254,724 sites. Are the bases at these locations correct, and how do I verify it?

In addition, are all of these called variants credible, and what parameters should I use to filter them? Thanks.

kishwarshafin commented 2 years ago

@lxxiaoxiaLi ,

This is a very difficult question to answer. Illumina short reads and ONT reads are two very different data types. Read mapping will differ considerably between them, which has a big effect on the downstream variant calls. Also, Illumina variant callers usually perform local realignment, which changes the characteristics of the calls, and the quality of the variants from these two sequencing technologies will also be very different. One thing you can do is run hap.py to see the concordance between the two sets of variant calls; visual inspection would be difficult to assess in this case.
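For a first rough look before running hap.py, you could list the sites where both call sets have a PASS record at the same CHROM:POS but disagree. This is a crude sketch with a hypothetical `discordant_sites` helper: it ignores sites present in only one file and does not normalize indel representation or compare haplotypes, which is what hap.py is for, so treat its output only as a starting list for inspection. It assumes two uncompressed, single-sample VCFs.

```shell
# Hypothetical helper, crude check only (not a substitute for hap.py).
# Prints CHROM:POS, the first call and the second call for each site where
# both VCFs have a PASS record but disagree on REF>ALT or GT.
# Phasing is ignored: "|" is treated like "/".
discordant_sites() {
  awk -F '\t' '
    FILENAME != prev { file++; prev = FILENAME }
    /^#/ || $7 != "PASS" { next }
    {
      split($10, sample, ":")
      gt = sample[1]
      gsub(/[|]/, "/", gt)
      key = $1 ":" $2
      call = $4 ">" $5 " " gt
      if (file == 1)
        first[key] = call
      else if ((key in first) && first[key] != call)
        print key "\t" first[key] "\t" call
    }' "$1" "$2"
}

# Example use (placeholder file names):
# discordant_sites nanopore.vcf illumina.vcf | wc -l
```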

kishwarshafin commented 2 years ago

@lxxiaoxiaLi , I'm closing this issue for now. If you have further questions, please feel free to re-open the issue.