ultimatesource / denovogear

A program to detect denovo-variants using next-generation sequencing data.
http://www.nature.com/nmeth/journal/v10/n10/full/nmeth.2611.html
GNU General Public License v3.0

denovogear no DNM found on CentOS 7 (Core) #294

Closed shannjiang closed 5 years ago

shannjiang commented 5 years ago

Hi,

I am using denovogear developed by you.

My commands to run denovogear are:

Generating the piled-up VCF file for dng:

bcftools mpileup -f hg19.chr22.fasta S13_02A_CHG000233_chr22.bam S13_02A_CHG000234_chr22.bam S13_02A_CHG000235_chr22.bam | bcftools call -mv -Ov -o S13_02A_chr22_v2.vcf

The first step works fine and generates the piled-up VCF file required by dng. I used the command recommended on this page: https://samtools.github.io/bcftools/howtos/variant-calling.html

Calling dng:

dng dnm auto --ped s1347_trio1_dng.ped --vcf s1347_chr22.vcf --output_vcf s1347_chr22_trio1_dngout.vcf

But no DNMs were found. The same is true for the other chromosomes (chr1-21). Other, similar DNM-calling tools do find DNMs in the attached trio.

Could you please help me figure out why this happens? Is there something wrong with the format of the .vcf or .ped files, or is it something else? My OS is CentOS 7 (Core).

Here are my vcf and ped files: example_vcf_ped.zip

Thanks,

Shan

reedacartwright commented 5 years ago

Your vcf is missing the DP tag for each sample, which causes sites to be removed.

You have two options:

  1. Add the tags via bcftools mpileup -a 'AD,DP'
  2. Remove the filter with dng dnm auto -R 0
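
A quick way to confirm the diagnosis before rerunning anything is to look at the FORMAT column of the input VCF. The following is only a minimal sketch: the function name `records_missing_tag` and the two example records are hypothetical, made up for illustration, and are not taken from the attached files. It assumes a tab-separated VCF with FORMAT in the ninth column.

```python
def records_missing_tag(vcf_lines, tag="DP"):
    """Return True if any data record lacks `tag` in its FORMAT column.

    Assumes standard tab-separated VCF text, where column 9 is FORMAT.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            return True  # no FORMAT/sample columns at all
        if tag not in cols[8].split(":"):
            return True
    return False


# Hypothetical records for illustration only (not from the attached files).
without_dp = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild",
    "22\t100\t.\tC\tT\t50\t.\t.\tGT:PL\t0/1:60,0,80",
]
with_dp = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild",
    "22\t100\t.\tC\tT\t50\t.\t.\tGT:PL:DP:AD\t0/1:60,0,80:30:16,14",
]

print(records_missing_tag(without_dp))
print(records_missing_tag(with_dp))
```

To check a real file, pass an open file handle instead of the example lists, for example `records_missing_tag(open("S13_02A_chr22_v2.vcf"))`.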

shannjiang commented 5 years ago

Thanks, reedacartwright. I still have a question: is there any difference between the two options? That is, is DP involved in the calling of DNMs?

reedacartwright commented 5 years ago

I'm pretty sure that DP is not involved in the calling, only used to filter sites that don't pass a threshold.

shannjiang commented 5 years ago

It reported an invalid option for -R or --R, as follows: DeNovoGear v1.1.1 auto: unrecognized option '--R'

My command is: dng dnm auto --R 0 --ped s1347_trio1_dng.ped --vcf s1347_chr22.vcf --output_vcf s1347_chr22_trio1_dngout.vcf

reedacartwright commented 5 years ago

I'm not sure what the option is on v1.1.1. Maybe --rd_cutoff?

shannjiang commented 5 years ago

Yes, reedacartwright, the option is --rd_cutoff. I got my first output, thanks! But I am confused by the result. Here is an excerpt:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S13_02A_CHG000235
22 16097738 . C T 0 PASS RD_MOM=43;RD_DAD=18;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=1;code=6; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ CC/CC/CC:0.999749:3.95324e-05:CT/CC/CC:0.000250843:9.9301e-09:22:-2147483648
22 16101955 . A G 0 PASS RD_MOM=122;RD_DAD=48;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=3;code=9; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ AG/AA/GT:0.738956:4.83092e-09:AG/AA/AA:0.261044:9.9301e-09:55:-2147483648
22 16123812 . C T 0 PASS RD_MOM=110;RD_DAD=45;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=3;code=9; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ CT/CC/GT:0.716575:4.83092e-09:CT/CC/CC:0.283425:9.9301e-09:52:-2147483648
22 16139690 . G A 0 PASS RD_MOM=99;RD_DAD=40;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=1;code=6; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ GG/GG/GG:0.999392:1.25013e-05:AG/GG/GG:0.000608228:9.9301e-09:52:-2147483648
22 16150827 . C T 0 PASS RD_MOM=57;RD_DAD=30;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=2;code=9; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ CT/CT/CC:0.999835:5.99838e-05:CT/CC/CC:0.000165441:9.9301e-09:40:-2147483648
22 16157912 . C T 0 PASS RD_MOM=41;RD_DAD=32;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=2;code=9; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ CT/CT/CC:0.998916:7.55152e-06:CT/CC/CC:0.00108401:9.9301e-09:43:-2147483648
22 16163632 . T C 0 PASS RD_MOM=119;RD_DAD=100;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=1;code=6; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ TT/TT/TT:0.786873:1.25013e-08:CT/TT/TT:0.213127:9.9301e-09:84:-2147483648
22 16175001 . G A 0 PASS RD_MOM=112;RD_DAD=73;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=2;code=9; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ AG/GG/AG:0.935423:1.19684e-07:AG/GG/GG:0.0645767:9.9301e-09:80:-2147483648
22 16244352 . A G 0 PASS RD_MOM=132;RD_DAD=82;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=1;code=6; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ AA/AA/AA:0.774426:9.9301e-09:AG/AA/AA:0.225574:9.9301e-09:85:-2147483648

All of these variants are listed in the output file, but according to the header of the output vcf file, ML_DNM is the maximum likelihood of the DNM. So most of the variants listed in the final output are not DNMs, right? They did not reach a maximum likelihood of 0.5, which would mean the reference (ML_NULL) is more likely? From my understanding, only variants listed in the output file with ML_DNM larger than 0.5 can be considered potential DNMs.
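
When reading records like these, it may help to pair each FORMAT key with its value, since PP_DNM and ML_DNM sit next to each other and are easy to mix up. In the excerpt, the ML_DNM values are all on the order of 1e-9, while the PP_DNM values lie between 0 and 1. The sketch below only does the pairing; the helper names are hypothetical, and reading PP_DNM as a posterior probability is an assumption based on the field name, not something confirmed in this thread.

```python
def parse_dng_sample(format_field, sample_field):
    """Pair colon-separated FORMAT keys with the sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT and sample column have different lengths")
    return dict(zip(keys, values))


def dng_fields(record_line):
    """Split a whitespace-separated dng output record and parse column 10.

    Assumes the layout shown in the excerpt above: FORMAT is the ninth
    column and the single sample column is the tenth.
    """
    cols = record_line.split()
    return parse_dng_sample(cols[8], cols[9])


# Second record from the excerpt above, copied verbatim.
record = (
    "22 16101955 . A G 0 PASS "
    "RD_MOM=122;RD_DAD=48;MQ_MOM=-2147483648;MQ_DAD=-2147483648;SNPcode=3;code=9; "
    "NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ "
    "AG/AA/GT:0.738956:4.83092e-09:AG/AA/AA:0.261044:9.9301e-09:55:-2147483648"
)

fields = dng_fields(record)
print(fields["PP_DNM"], fields["ML_DNM"])
```

For this record the pairing gives PP_DNM = 0.261044 and ML_DNM = 9.9301e-09, so a cutoff such as 0.5 can only be compared meaningfully against the PP_* columns here; which cutoff to use is not settled in this thread.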