ultimatesource / denovogear

A program to detect de novo variants using next-generation sequencing data.
http://www.nature.com/nmeth/journal/v10/n10/full/nmeth.2611.html
GNU General Public License v3.0

sample size for each single run of denovogear #293

Closed: shannjiang closed this issue 5 years ago

shannjiang commented 5 years ago

I am wondering whether I can include more than three samples in the ped file, because in our study every family has more than one child. Alternatively, can I incorporate multiple families into one ped file and do a single denovogear run? Since the first column is the family ID, I think denovogear can distinguish the different families.

If I can include more than three samples in the ped file, will the denovogear output tell me which DNM belongs to which child?

thanks,

Shan

reedacartwright commented 5 years ago

I believe that if you pass a pedigree file to dng dnm, it will process all possible trios. However, VCF output is not supported if you process more than one trio.

Alternatively, you can use dng call, which can do joint calling on large pedigrees, allowing you to pool information across children and improve your power.
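To make "all possible trios" concrete: in the usual six-column PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), every child whose father and mother are both listed in the same family forms one trio, so a family with two children yields two trios and several families can share one file. The sketch below is only an illustration of that decomposition, not denovogear's own parsing code; `parse_ped`, `enumerate_trios`, and the example family/sample IDs are hypothetical, and it assumes whitespace-separated columns with `0` marking an unknown parent.

```python
from __future__ import annotations

from typing import NamedTuple


class PedRecord(NamedTuple):
    """One row of a six-column PED file."""
    family: str
    individual: str
    father: str  # "0" marks an unknown parent (founder) by PED convention
    mother: str
    sex: str
    phenotype: str


def parse_ped(text: str) -> list[PedRecord]:
    """Parse whitespace-separated PED rows, skipping blank and '#' comment lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected 6 PED columns, got {len(fields)}: {line!r}")
        records.append(PedRecord(*fields[:6]))
    return records


def enumerate_trios(records: list[PedRecord]) -> list[tuple[str, str, str, str]]:
    """Return (family, child, father, mother) for every individual whose
    father and mother are both listed in the same family."""
    present = {(r.family, r.individual) for r in records}
    trios = []
    for r in records:
        if r.father == "0" or r.mother == "0":
            continue  # founder or single-parent record: not a complete trio
        if (r.family, r.father) in present and (r.family, r.mother) in present:
            trios.append((r.family, r.individual, r.father, r.mother))
    return trios


if __name__ == "__main__":
    # Hypothetical pedigree: FAM1 has two children, FAM2 has one.
    example = """\
FAM1 dad1 0 0 1 1
FAM1 mom1 0 0 2 1
FAM1 kid1a dad1 mom1 1 2
FAM1 kid1b dad1 mom1 2 2
FAM2 dad2 0 0 1 1
FAM2 mom2 0 0 2 1
FAM2 kid2 dad2 mom2 1 2
"""
    for trio in enumerate_trios(parse_ped(example)):
        print(trio)
```

Run as a script, this prints one tuple per child: two trios for FAM1 and one for FAM2. Keying membership on (family, individual) keeps identically named samples in different families from being mixed up.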

shannjiang commented 5 years ago

Anyway, the number of samples in the .ped file can be smaller than in the .vcf, right, as long as each individual in the .ped file is in the .vcf? If so, denovogear only calls DNMs in the samples listed in the .ped file, right?

I am just wondering if I still need to regenerate the .vcf file trio by trio.

reedacartwright commented 5 years ago

I don't actively use dng dnm. I believe that you can have everyone in a single VCF, and dnm will only select the columns that it needs.
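The reply above suggests one multi-sample VCF is fine as long as every individual in the .ped file appears in it. A minimal sketch of that check and of picking out only the needed sample columns, assuming a standard tab-delimited VCF whose `#CHROM` header line lists nine fixed columns followed by sample IDs. This does not reproduce how dng dnm actually reads the file; `sample_columns`, `subset_record`, and the sample IDs are hypothetical.

```python
from __future__ import annotations

# The first nine VCF columns are fixed (#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT);
# per-sample genotype columns start at index 9.
FIXED_COLUMNS = 9


def sample_columns(header_line: str, wanted: list[str]) -> dict[str, int]:
    """Map each wanted sample ID to its 0-based column index in the '#CHROM' header line.

    Raises KeyError if any wanted sample is absent from the VCF.
    """
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("expected the VCF '#CHROM' header line")
    index = {name: i for i, name in enumerate(fields[FIXED_COLUMNS:], start=FIXED_COLUMNS)}
    missing = [s for s in wanted if s not in index]
    if missing:
        raise KeyError(f"samples not found in VCF: {missing}")
    return {s: index[s] for s in wanted}


def subset_record(record_line: str, columns: dict[str, int]) -> list[str]:
    """Keep the fixed columns plus only the selected sample columns, in `columns` order.

    INFO values computed over all samples (e.g. AC, AN) are copied unchanged.
    """
    fields = record_line.rstrip("\n").split("\t")
    return fields[:FIXED_COLUMNS] + [fields[i] for i in columns.values()]


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdad1\tmom1\tkid1a\tother"
    record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/0\t0/1\t1/1"
    cols = sample_columns(header, ["kid1a", "dad1", "mom1"])
    print(subset_record(record, cols))
```

Failing loudly on a missing sample mirrors the precondition stated in the question: a .ped individual that is absent from the .vcf cannot be selected.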

tatianaliu commented 5 years ago

We've run into a similar issue. We called de novo variants with dnm in two runs: in one, the input VCF included all samples; in the other, the input VCF contained only the trio (3 samples). Both VCFs came from the same pipeline and have the same genotypes, likelihoods, etc.; the only difference is the number of samples. Yet the two runs gave different outputs: the 3-sample VCF input produced more output than the all-sample VCF input. Do you know why that happens?