ultimatesource / denovogear

A program to detect de novo variants using next-generation sequencing data.
http://www.nature.com/nmeth/journal/v10/n10/full/nmeth.2611.html
GNU General Public License v3.0

even the official testing data does not work #281

Closed: jielab closed this issue 6 years ago

jielab commented 6 years ago

Hi,

I am now using the test data from https://github.com/denovogear/testdata/tree/master/sample_CEU with the following commands:

module load denovogear/2018-05-1_6723027
dng dnm auto --ped sample_CEU.ped --bcf sample_CEU.bcf

Then I got the following output. As you can see, it ended with a segmentation fault. I am really frustrated with DNG at this point. Can someone please help? Please just show me a dataset and a command line that actually WORK!!!

Created SNP lookup table
First mrate: 1 last: 1
First code: 6 last: 6
First target string: AA/AA/AA last: TT/TT/TT
First tref: 0.0002388 last: 0.99301
Created indel lookup table
First code: 6 last: 6
First target string: RR/RR/RR last: DD/DD/DD
First prior: 0.05 last: 0.114
Created paired lookup table
First target string: AA/AA last: TT/TT
First prior 1 last: 1
Segmentation fault

reedacartwright commented 6 years ago

I am unable to replicate the segfault on my system using the command that you provided. I used the latest develop branch version and htslib 1.8. I suspect that it is an issue that is specific to your system.

Please upgrade to the latest version on the develop branch and htslib 1.8 and let me know if the segfault still occurs. If so, can you run gdb and let me know what code triggers the segfault?

jielab commented 6 years ago

Dear Reed:

I think I finally got your example working, using your sample_CEU.ped and sample_CEU.vcf files.

First, I think there is an error in your sample_CEU.ped file. The ID in the last line (NA12891, listed with sex code 2) should be NA12892 instead, correct?

CEU NA12891_vald-sorted.bam.bam 0 0 1 0
CEU NA12878_vald-sorted.bam.bam NA12891_vald-sorted.bam.bam NA12892_vald-sorted.bam.bam 2 2
CEU NA12891_vald-sorted.bam.bam 0 0 2 0
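A quick way to catch this kind of pedigree mistake before running dng is to sanity-check the PED file. The sketch below assumes the standard six-column layout (family ID, individual ID, father ID, mother ID, sex, phenotype) with "0" for an unknown parent. `check_ped` is a hypothetical helper, not part of DeNovoGear, and it only reports duplicated individual IDs and parents that are referenced but never defined.

```python
from collections import Counter


def check_ped(text):
    """Return a list of problems found in a whitespace-separated PED file.

    Hypothetical helper. Assumes columns: family ID, individual ID,
    father ID, mother ID, sex, phenotype, with "0" meaning unknown parent.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    ids = [row[1] for row in rows]
    problems = []

    # Each individual should appear exactly once.
    for ind, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate individual ID: {ind}")

    # Every non-zero parent should itself be a row in the file,
    # otherwise no complete trio can be formed.
    known = set(ids)
    for row in rows:
        ind, father, mother = row[1], row[2], row[3]
        for role, parent in (("father", father), ("mother", mother)):
            if parent != "0" and parent not in known:
                problems.append(f"{ind}: {role} {parent} is not defined in the file")
    return problems


ped = """\
CEU NA12891_vald-sorted.bam.bam 0 0 1 0
CEU NA12878_vald-sorted.bam.bam NA12891_vald-sorted.bam.bam NA12892_vald-sorted.bam.bam 2 2
CEU NA12891_vald-sorted.bam.bam 0 0 2 0
"""

for problem in check_ped(ped):
    print(problem)
```

On the three pedigree lines quoted in this comment, it reports the duplicated NA12891_vald-sorted.bam.bam entry and the mother NA12892_vald-sorted.bam.bam, which is referenced but never defined.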

Please see the output pasted below. As the DENOVO-SNP and DENOVO-INDEL lines show, it seems that three de novo variants were found.

Now, here are my 4 questions:

  1. Is there a way to write out those de novo results in a file separate from the log file?

  2. Why does the program still run and give the same output when the pedigree file above is apparently wrong, with NA12891 appearing twice so that there is no trio?

  3. Why is the ALT allele in your sample_CEU.vcf file almost always “N”, which means missing?

  4. Right now, the genetic data is in PL:DP format. Will DNG work if my VCF file only has GT:GP:DS, the three fields usually needed for GWAS analyses?
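Some background that may help frame question 4: in the VCF specification, PL holds phred-scaled genotype likelihoods (the probability of the reads given each genotype, scaled so the most likely genotype has PL 0), whereas GP holds genotype posterior probabilities, which already fold in a prior, and DS (as written by common imputation tools) is an estimated ALT-allele dosage. The minimal sketch below, with hypothetical helper names, only shows how PL values map to relative likelihoods and back. It is not a GP-to-PL converter, and this thread does not say whether DNG accepts GP or DS.

```python
import math


def pl_to_likelihoods(pl):
    """Convert phred-scaled PL values to likelihoods normalised to sum to 1.

    By definition PL_g = -10 * log10(L_g / L_max),
    so L_g / L_max = 10 ** (-PL_g / 10).
    """
    relative = [10 ** (-p / 10) for p in pl]
    total = sum(relative)
    return [r / total for r in relative]


def likelihoods_to_pl(likelihoods):
    """Inverse direction: rescale so the best genotype gets PL 0.

    Assumes every likelihood is strictly positive (log10(0) is undefined).
    """
    best = max(likelihoods)
    return [round(-10 * math.log10(l / best)) for l in likelihoods]


# PL for genotypes 0/0, 0/1, 1/1 (VCF genotype ordering for a biallelic site).
lik = pl_to_likelihoods([0, 30, 300])
print(lik)
print(likelihoods_to_pl(lik))
```

Because GP also depends on a prior, turning GP back into PL would require knowing and removing that prior, which is why the two fields are not interchangeable in general.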

Thank you very much & best regards,

Jie

dng dnm auto --ped sample_CEU.ped --bcf sample_CEU.vcf

Created SNP lookup table
First mrate: 1 last: 1
First code: 6 last: 6
First target string: AA/AA/AA last: TT/TT/TT
First tref: 0.0002388 last: 0.99301
Created indel lookup table
First code: 6 last: 6
First target string: RR/RR/RR last: DD/DD/DD
First prior: 0.05 last: 0.114
Created paired lookup table
First target string: AA/AA last: TT/TT
First prior 1 last: 1
[W::bcf_hdr_check_sanity] GL should be declared as Number=G
DENOVO-SNP CHILD_ID: NA12878_vald-sorted.bam.bam chr: 2 pos: 214668360 ref: G alt: A,N maxlike_null: 3.95324e-12 pp_null: 0.000399465 tgt_null(child/mom/dad): GG/GG/GG snpcode: 1 code: 6 maxlike_dnm: 9.9301e-09 pp_dnm: 0.999601 tgt_dnm(child/mom/dad): AG/GG/GG lookup: 4 flag: 0 READ_DEPTH child: 48 dad: 76 mom: 34 MAPPING_QUALITY child: 59 dad: 59 mom: 59
DENOVO-INDEL CHILD_ID: NA12878_vald-sorted.bam.bam chr: 2 pos: 214668396 ref: G alt: GGC maxlike_null: 1.51281e-26 pp_null: 1.62015e-05 tgt_null(child/mom/dad): DD/RD/RD snpcode: 2 code: 9 maxlike_dnm: 1.25594e-21 pp_dnm: 0.999984 tgt_dnm(child/mom/dad): DD/RR/RR lookup: 3 flag: 0 READ_DEPTH child: 46 dad: 72 mom: 29 MAPPING_QUALITY child: 59 dad: 59 mom: 59
DENOVO-INDEL CHILD_ID: NA12878_vald-sorted.bam.bam chr: 2 pos: 214668400 ref: TC alt: T maxlike_null: 1.51281e-26 pp_null: 1.62015e-05 tgt_null(child/mom/dad): DD/RD/RD snpcode: 2 code: 9 maxlike_dnm: 1.25594e-21 pp_dnm: 0.999984 tgt_dnm(child/mom/dad): DD/RR/RR lookup: 3 flag: 0 READ_DEPTH child: 46 dad: 72 mom: 29 MAPPING_QUALITY child: 59 dad: 59 mom: 59
Total number of SNP sites interrogated: 36
Total number of SNP sites passing read-depth filters: 36
Total number of INDEL sites interrogated: 3
Total number of INDEL sites passing read-depth filters: 3
Total number of Paired sample sites interrogated: 0
Total number of Paired sample sites passing read-depth filters: 0
Done !
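On question 1: one workaround, assuming the console output of a run has been captured to a text file (for example with `dng dnm auto --ped sample_CEU.ped --bcf sample_CEU.vcf > dng.log 2>&1`) and the DENOVO-* lines keep the layout shown in the log above, is to pull those lines out afterwards and write them to a separate table. The helper names and the column selection below are hypothetical; this only post-processes the log and does not rely on any DNG output option.

```python
import csv
import io


def parse_denovo_line(line):
    """Parse one DENOVO-SNP / DENOVO-INDEL log line into a dict.

    Assumes whitespace-separated 'key: value' pairs, where READ_DEPTH and
    MAPPING_QUALITY are bare labels prefixing the following child/dad/mom keys
    (stored as e.g. 'READ_DEPTH.child').
    """
    tokens = line.split()
    record = {"type": tokens[0]}
    prefix = ""
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok.endswith(":") and i + 1 < len(tokens):
            record[prefix + tok[:-1]] = tokens[i + 1]
            i += 2
        else:
            # A bare label such as READ_DEPTH starts a new group of keys.
            prefix = tok + "."
            i += 1
    return record


def extract_denovo(log_text):
    """Keep only the DENOVO-* lines of a captured log and parse them."""
    return [parse_denovo_line(l) for l in log_text.splitlines()
            if l.startswith("DENOVO-")]


# Hypothetical choice of columns for the output table.
COLUMNS = ["type", "CHILD_ID", "chr", "pos", "ref", "alt", "pp_dnm",
           "tgt_dnm(child/mom/dad)", "READ_DEPTH.child"]


def to_tsv(records, columns=COLUMNS):
    """Render parsed records as tab-separated text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        writer.writerow([rec.get(c, "") for c in columns])
    return out.getvalue()
```

For example, `to_tsv(extract_denovo(open("dng.log").read()))` returns a header row plus one row per DENOVO-SNP or DENOVO-INDEL call, which can then be saved to its own file.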

From: Reed A. Cartwright notifications@github.com Sent: July 19, 2018 16:22 To: denovogear/denovogear denovogear@noreply.github.com Cc: jiehuang001 jiehuang001@gmail.com; Author author@noreply.github.com Subject: Re: [denovogear/denovogear] even the official testing data does not work (#281)

Closed #281 https://github.com/denovogear/denovogear/issues/281 .
