ultimatesource / denovogear

A program to detect denovo-variants using next-generation sequencing data.
http://www.nature.com/nmeth/journal/v10/n10/full/nmeth.2611.html
GNU General Public License v3.0
49 stars 25 forks source link

[bcf_sync] incorrect number of fields (0 != 5) #145

Closed gpcr closed 7 years ago

gpcr commented 8 years ago

samtools mpileup -g -t DP -f hs37d5.fa child.bam father.bam mother.bam -r 9 | dng dnm auto --ped Fam.ped --rd_cutoff 10 --output testFam.vcf --snp_mrate 2e-10 --indel_mrate 1e-11 --bcf -

DeNovoGear v1.1.1-203-g58010a8-dirty

read depth filter: 10 output vcf file: testFam.vcf SNP mutation rate: 2e-10 indel mutation rate: 1e-11 Created SNP lookup table First mrate: 1 last: 1 First code: 6 last: 6 First target string: AA/AA/AA last: TT/TT/TT First tref: 0.0002388 last: 0.99301

Created indel lookup table First code: 6 last: 6 First target string: RR/RR/RR last: DD/DD/DD First prior: 0.05 last: 0.114

Created paired lookup table First target string: AA/AA last: TT/TT First prior 1 last: 1 [mpileup] 3 samples in 3 input files

Set max per-file depth to 2666 **[bcf_sync] incorrect number of fields (0 != 5) at 4113:16256** ... then aborted Is there a way to correct this?
reedacartwright commented 8 years ago

What version of Samtools are you using?

reedacartwright commented 8 years ago

@kaeldai I think we have seen this before, correct?

gpcr commented 8 years ago

Version: 1.2-10-gb7ac1e2 (using htslib 1.2.1-25-gc5521ee)

kaeldai commented 8 years ago

We've had similar problems caused by the way samtools organizes read-groups. Sometimes samtools will create multiple vcf genotype columns per individual and the program has trouble resolving the fields to the pedigree.

Would it be possible to send us the vcf header created by the samtools mpileup command?

gpcr commented 8 years ago
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##samtoolsVersion=1.2-10-gb7ac1e2+htslib-1.2.1-25-gc5521ee
##samtoolsCommand=samtools mpileup -g -t DP -f /data4/gvcf/hs37d5.fa -l /data5/vcr_b37_100bpBuff.bed child.realign.bam father.realign.bam mother.realign.bam
##reference=file:///data4/gvcf/hs37d5.fa
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>
##contig=<ID=GL000207.1,length=4262>
##contig=<ID=GL000226.1,length=15008>
##contig=<ID=GL000229.1,length=19913>
##contig=<ID=GL000231.1,length=27386>
##contig=<ID=GL000210.1,length=27682>
##contig=<ID=GL000239.1,length=33824>
##contig=<ID=GL000235.1,length=34474>
##contig=<ID=GL000201.1,length=36148>
##contig=<ID=GL000247.1,length=36422>
##contig=<ID=GL000245.1,length=36651>
##contig=<ID=GL000197.1,length=37175>
##contig=<ID=GL000203.1,length=37498>
##contig=<ID=GL000246.1,length=38154>
##contig=<ID=GL000249.1,length=38502>
##contig=<ID=GL000196.1,length=38914>
##contig=<ID=GL000248.1,length=39786>
##contig=<ID=GL000244.1,length=39929>
##contig=<ID=GL000238.1,length=39939>
##contig=<ID=GL000202.1,length=40103>
##contig=<ID=GL000234.1,length=40531>
##contig=<ID=GL000232.1,length=40652>
##contig=<ID=GL000206.1,length=41001>
##contig=<ID=GL000240.1,length=41933>
##contig=<ID=GL000236.1,length=41934>
##contig=<ID=GL000241.1,length=42152>
##contig=<ID=GL000243.1,length=43341>
##contig=<ID=GL000242.1,length=43523>
##contig=<ID=GL000230.1,length=43691>
##contig=<ID=GL000237.1,length=45867>
##contig=<ID=GL000233.1,length=45941>
##contig=<ID=GL000204.1,length=81310>
##contig=<ID=GL000198.1,length=90085>
##contig=<ID=GL000208.1,length=92689>
##contig=<ID=GL000191.1,length=106433>
##contig=<ID=GL000227.1,length=128374>
##contig=<ID=GL000228.1,length=129120>
##contig=<ID=GL000214.1,length=137718>
##contig=<ID=GL000221.1,length=155397>
##contig=<ID=GL000209.1,length=159169>
##contig=<ID=GL000218.1,length=161147>
##contig=<ID=GL000220.1,length=161802>
##contig=<ID=GL000213.1,length=164239>
##contig=<ID=GL000211.1,length=166566>
##contig=<ID=GL000199.1,length=169874>
##contig=<ID=GL000217.1,length=172149>
##contig=<ID=GL000216.1,length=172294>
##contig=<ID=GL000215.1,length=172545>
##contig=<ID=GL000205.1,length=174588>
##contig=<ID=GL000219.1,length=179198>
##contig=<ID=GL000224.1,length=179693>
##contig=<ID=GL000223.1,length=180455>
##contig=<ID=GL000195.1,length=182896>
##contig=<ID=GL000212.1,length=186858>
##contig=<ID=GL000222.1,length=186861>
##contig=<ID=GL000200.1,length=187035>
##contig=<ID=GL000193.1,length=189789>
##contig=<ID=GL000194.1,length=191469>
##contig=<ID=GL000225.1,length=211173>
##contig=<ID=GL000192.1,length=547496>
##contig=<ID=NC_007605,length=171823>
##contig=<ID=hs37d5,length=35477943>
##ALT=<ID=X,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h">
##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
kaeldai commented 8 years ago

Do you have the last line of the header, that starts with #CHROM?

gpcr commented 8 years ago
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  child  father  mother
1       30267   .       T       <X>     0       .       DP=1090;I16=1000,54,0,0,39135,1.51229e+06,0,0,0,0,0,0,16405,336829,0,0;QS=3,0;MQSB=1;MQ0F=1     PL:DP   0,255,31:266    0,255,31:302    0,255,31:486
1       30268   .       C       T,<X>   0       .       DP=1123;I16=1034,54,1,0,40287,1.55458e+06,25,625,0,0,0,0,16748,343392,25,625;QS=2.99634,0.003663,0;SGB=-0.556633;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=1        PL:DP   0,255,30,255,33,30:273  0,255,31,255,31,31:316  0,255,31,255,31,31:500
kaeldai commented 8 years ago

The small sample you sent works fine with dnm. Based on some further research the problem may be with samtools being suddenly interrupted during the pileup.

Have you tried running the samtools mpileup command by-itself, sending output to a file, then using the file as dng input?

YulongNiu commented 7 years ago

Hi, I got a similar error message

dng dnm auto --ped trio.ped --bcf dnm_call_chr2.bcf

DeNovoGear v1.1.1

Created SNP lookup table First mrate: 1 last: 1 First code: 6 last: 6 First target string: AA/AA/AA last: TT/TT/TT First tref: 0.0002388 last: 0.99301

Created indel lookup table First code: 6 last: 6 First target string: RR/RR/RR last: DD/DD/DD First prior: 0.05 last: 0.114

Created paired lookup table First target string: AA/AA last: TT/TT First prior 1 last: 1 [bcf_sync] incorrect number of fields (6 != 5) at 17716:0

Total number of SNP sites interrogated: 0 Total number of SNP sites passing read-depth filters: 0 Total number of INDEL sites interrogated: 0 Total number of INDEL sites passing read-depth filters: 0 Total number of Paired sample sites interrogated: 0 Total number of Paired sample sites passing read-depth filters: 0 Done !

MatthewMaher commented 7 years ago

Was this mystery ever resolved?

I just downloaded denovogear (v1.1.1-Linux-x86_64) to try it out, but sadly I'm getting the same error. I would try to investigate further, but I do not really understand what the message means.
My input file is a VCF produced by GATK. I see that denovogear wants BCF so I used bcftools (v1.1) to convert the vcf to bcf, like so:

bcftools view -O b -o input.bcf input.vcf

I understand/think that the message is coming from the library that reads the BCF input, but I've checked and the BCF file seems good because I can perform other operations on it without trouble, including converting it back to VCF, and it has same # of rows, except for a few BCFtools headers that were added. Could the error be an incompatibility between BCF(tools/lib) library between my command line tool and the version built into denovogear?

Here's the first few rows of my ped, in case that could be the problem:

OGI074  OGI074_183  OGI074_185  OGI074_186  2   1
OGI074  OGI074_184  OGI074_185  OGI074_186  2   1
OGI074  OGI074_185  0   0   1   1
OGI074  OGI074_186  0   0   2   1
OGI081  OGI081_000197   OGI081_000200   OGI081_000199   2   2
OGI081  OGI081_000198   OGI081_000200   OGI081_000199   2   2

below is an execution log:

[maherm@login-priv-1-1 denovogear-v1.1.1-Linux-x86_64]$ bin/dng dnm auto --ped my.ped --bcf input.bcf
DeNovoGear v1.1.1

Created SNP lookup table
 First mrate: 1 last: 1
 First code: 6 last: 6
 First target string: AA/AA/AA last: TT/TT/TT
 First tref: 0.0002388 last: 0.99301

Created indel lookup table First code: 6 last: 6
 First target string: RR/RR/RR last: DD/DD/DD
 First prior: 0.05 last: 0.114

Created paired lookup table
 First target string: AA/AA last: TT/TT
 First prior 1 last: 1
[bcf_sync] incorrect number of fields (0 != 5) at 1096759097:3031903

Total number of SNP sites interrogated: 0
Total number of SNP sites passing read-depth filters: 0
Total number of INDEL sites interrogated: 0
Total number of INDEL sites passing read-depth filters: 0
Total number of Paired sample sites interrogated: 0
Total number of Paired sample sites passing read-depth filters: 0
Done !
jgarciamesa commented 7 years ago

This bug had to do with samtools and an upgrade on the VCF format version. That happened after the release of v1.1.1. However, on the current develop branch this has been fixed. I would suggest to clone the repository and use that branch, as that and other bugs have been already fixed. Installing instructions can be found here: https://github.com/denovogear/denovogear#installation

jgarciamesa commented 7 years ago

Fixed in develop.