ultimatesource / denovogear

A program to detect denovo-variants using next-generation sequencing data.
http://www.nature.com/nmeth/journal/v10/n10/full/nmeth.2611.html
GNU General Public License v3.0

dng call --model=autosomal does not return results #286

Closed jielab closed 6 years ago

jielab commented 6 years ago

Hi,

I can now use "dng dnm auto --ped sample.ped --bcf trio.bcf" to identify de novo variants from a trio dataset. I feel that this type of task could be done by a simple text processing tool, basically by identifying variants where both parents' genotypes are A/A while the proband's genotype is A/a or a/a, correct?

Then I tried to see how I could identify autosomal dominant variants from this same trio, after I changed the case status to "2" for both the proband and the father in the sample.ped file. This time, I ran "dng call --model=autosomal --ped sample.dom.ped trio.bcf", but surprisingly, the program returned a VCF file without any variants. However, there are a lot of variants where the father's genotype and the proband's genotype are both A/a while the mother's genotype is A/A. Why are those variants not picked up by my command "dng call --model=autosomal --ped sample.dom.ped trio.bcf"?

Thank you & best regards, Jie

reedacartwright commented 6 years ago

Dear @jiehuang001,

As I've said before, it is very difficult for me to diagnose your issues without any information about the ped and bcf files that you are using.

> I feel that this type of task could be done by a simple text processing tool, basically by identifying variants where both parents' genotypes are A/A while the proband's genotype is A/a or a/a, correct?

While you can call de novo mutations using a simple text processing tool, that will lead to a lot of issues because it ignores the uncertainty in the genotype information. DNG provides more accurate de novo mutation calling because it integrates information about uncertainty and experimental design into its genotype models.
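To make the contrast concrete, here is a minimal sketch of what such a text-processing filter might look like. It is not part of DNG. The function names are hypothetical, and it assumes hard-called VCF `GT` strings with allele `0` as the reference. Note that it treats every genotype call as certain, which is exactly the weakness described above.

```python
def parse_gt(gt):
    """Turn a VCF GT string such as '0/1' or '0|1' into a sorted tuple
    of allele indices. Returns None if any allele is missing ('.')."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(int(a) for a in alleles))


def naive_de_novo(father_gt, mother_gt, child_gt):
    """Flag a site when both parents are called 0/0 and the child carries
    at least one non-reference allele. Genotype likelihoods, read depth,
    and mutation rates are all ignored (hypothetical helper)."""
    father, mother, child = (parse_gt(g) for g in (father_gt, mother_gt, child_gt))
    if None in (father, mother, child):
        return False  # cannot decide when a call is missing
    parents_hom_ref = father == (0, 0) and mother == (0, 0)
    child_has_alt = any(allele != 0 for allele in child)
    return parents_hom_ref and child_has_alt
```

A single miscalled parental genotype (for example, a true 0/1 parent called 0/0 at low coverage) is enough to make this filter report a false de novo mutation.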

> Then I tried to see how I could identify autosomal dominant variants from this same trio, after I changed the case status to "2" for both the proband and the father in the sample.ped file. This time, I ran "dng call --model=autosomal --ped sample.dom.ped trio.bcf", but surprisingly, the program returned a VCF file without any variants. However, there are a lot of variants where the father's genotype and the proband's genotype are both A/a while the mother's genotype is A/A. Why are those variants not picked up by my command "dng call --model=autosomal --ped sample.dom.ped trio.bcf"?

Without information about your .ped and .bcf files, I can't be certain, but I believe from your description that you are trying to do something that DNG doesn't do, and that you are using a bad ped file. DNG doesn't know anything about phenotypes and does not use any information about dominance or case status. The fact that you appear to have a column in your ped file for case status indicates to me that your ped file does not follow dng call's PEDNG format, which doesn't have columns for case status or other phenotypes. More than likely, you are getting no results because dng call cannot connect the individuals in your .ped file to the samples in your .bcf file.
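One way to check for that kind of mismatch is sketched below. This is an illustration only, not how dng call resolves samples internally. It assumes a text VCF (a BCF would first need converting, for example with `bcftools view`), and it assumes the individual ID sits in the second whitespace-separated column, as in a PLINK-style .ped file. The PEDNG layout may differ, so the column is a parameter. All function names are hypothetical.

```python
def ped_ids(ped_lines, id_column=1):
    """Collect individual IDs from pedigree lines.
    id_column is 0-based; 1 matches a PLINK-style file (an assumption)."""
    ids = set()
    for line in ped_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) > id_column:
            ids.add(fields[id_column])
    return ids


def vcf_samples(vcf_lines):
    """Return the sample names from the #CHROM header line of a text VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed; samples start at column 10.
            return line.rstrip("\n").split("\t")[9:]
    return []


def unmatched(ped_lines, vcf_lines, id_column=1):
    """Return (IDs only in the pedigree, sample names only in the VCF)."""
    ped = ped_ids(ped_lines, id_column)
    vcf = set(vcf_samples(vcf_lines))
    return sorted(ped - vcf), sorted(vcf - ped)
```

If either returned list is non-empty, the names in the two files do not line up, which would be consistent with an empty result.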

jielab commented 6 years ago

Dear Reed:

Thank you very much for your reply!

I did indicate that I am now using the testing data from your DNG website https://github.com/denovogear/testdata/tree/master/sample_CEU. I simply use the https://github.com/denovogear/testdata/blob/master/sample_CEU/sample_CEU.ped file and the sample_CEU.vcf file, so that it is easier for us to communicate and cross-check.

Your sample_CEU.ped file does have 6 columns. I assume that the 6th column is the case/control status column, the same as in the PLINK .ped file. In your sample_CEU.ped file, the 6th column for the proband has a value of 2, while the two parents have a value of 0. I think the value for the parents should be 1 instead, if DNG uses the same format as PLINK.

I understand that DNG can pick up potential de novo mutations from the sample_CEU.vcf file in a way that is more powerful than simply running a text comparison, because it takes into account the uncertainty of genotyping/sequencing. What I am trying to ask now is: can DNG pick up heritable mutations? For example, if the father and the proband in your sample_CEU.ped have a rare disease, how can I use DNG to pick up genetic variants that are inherited in a dominant mode from father to son, i.e., both father and son are Aa while the mother is AA? Or is this not something that DNG can do, so that I need to use something like FBAT or the PLINK TDT test instead?

Thank you very much & best regards,

Jie

reedacartwright commented 6 years ago

> I did indicate that I am now using the testing data from your DNG website https://github.com/denovogear/testdata/tree/master/sample_CEU. I simply use the https://github.com/denovogear/testdata/blob/master/sample_CEU/sample_CEU.ped file and the sample_CEU.vcf file, so that it is easier for us to communicate and cross-check.

I'm sorry for being confused, as that information was missing from this issue. The sample_CEU is test data for dng dnm and is not compatible with dng call. The data I use to test dng call is in the human_trio directory.

> Your sample_CEU.ped file does have 6 columns. I assume that the 6th column is the case/control status column, the same as in the PLINK .ped file. In your sample_CEU.ped file, the 6th column for the proband has a value of 2, while the two parents have a value of 0. I think the value for the parents should be 1 instead, if DNG uses the same format as PLINK.

dng dnm ignores the 6th column. It won't throw an error, but it doesn't use that information.

> What I am trying to ask now is: can DNG pick up heritable mutations?

You can use dng call --all to output sites that contain segregating variants in addition to sites that contain de novo variants. However, what you are trying to do is find potentially causal variants via linkage analysis. For that you do need to use a tool like PLINK.
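As a rough illustration of the pattern described in the question (father and proband heterozygous, mother homozygous reference), the sketch below scans text VCF records for it. It is a plain genotype filter, not linkage analysis, and it says nothing about causality. It assumes the records carry a hard-called `GT` field and that the sample names passed in match the VCF header. Whether the output of `dng call --all` fits those assumptions would need checking. The function names are hypothetical.

```python
def sample_gt(fields, column, gt_index):
    """Return the alleles of one sample's GT entry as a sorted list of
    strings, e.g. '1|0' -> ['0', '1']."""
    raw = fields[column].split(":")[gt_index]
    return sorted(raw.replace("|", "/").split("/"))


def dominant_pattern_sites(vcf_lines, father, mother, child):
    """Return (chrom, pos) for records where father and child are 0/1
    and mother is 0/0. Sites with missing or other genotypes are skipped."""
    hits = []
    columns = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None:
            raise ValueError("no #CHROM header line before the first record")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # nothing to compare without hard genotype calls
        gt_index = fmt.index("GT")
        f = sample_gt(fields, columns[father], gt_index)
        m = sample_gt(fields, columns[mother], gt_index)
        c = sample_gt(fields, columns[child], gt_index)
        if f == ["0", "1"] and c == ["0", "1"] and m == ["0", "0"]:
            hits.append((fields[0], int(fields[1])))
    return hits
```

In a single trio this pattern will match a very large number of ordinary inherited variants, which is why narrowing them down to candidate causal variants calls for a dedicated tool.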

jielab commented 6 years ago

This tool and these messages are really frustrating. I have decided that I have to quit this.

What I have been struggling to do is simply find de novo and rare heritable mutations from a trio VCF file. But after so many emails, now you tell me to use the test data in the human_trio directory instead of the sample_CEU directory, to use "dng call --all" instead of "dng call" instead of "dng dnm", and to use other tools such as PLINK instead of DNG.

This has wasted too much of my time, and nothing makes sense. Sorry to say this. I really wish that you guys would develop something nicer...