Closed splaisan closed 7 years ago
I just added a support for IUPAC ambiguity codes in the reference for this particular case.
Unless I missed something, installing a latest git bcftools over my build did not solve it for me
/opt/biotools/bin/bcftools --version
bcftools 1.6-14-geed5371
Using htslib 1.6-13-g7c5083b
Copyright (C) 2016 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ bcftools stats --fasta-ref Ref.fasta variants.vcf.gz > vcf.stats
Sanity check failed, the reference sequence differs: NC_030973.1:129900+2 .. G vs Y
Thanks for your feedback
Does the command bcftools --version
give the same answer as /opt/biotools/bin/bcftools --version
?
If it does, could you please provide a small test case to reproduce the problem?
yep ;-) could it be that the test you added do not include G vs Y?
$ bcftools --version bcftools 1.6-14-geed5371 Using htslib 1.6-13-g7c5083b Copyright (C) 2016 Genome Research Ltd. License Expat: The MIT/Expat license This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
$ which bcftools /opt/biotools/bin/bcftools
$ /opt/biotools/bin/bcftools --version bcftools 1.6-14-geed5371 Using htslib 1.6-13-g7c5083b Copyright (C) 2016 Genome Research Ltd. License Expat: The MIT/Expat license This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
Sorry, I should have picked that up already from the error message. Of course it does not because Y={C,T}. The program does the right thing when it complains here.
I agree but still find this error strange because the ref base at that position is a 'Y' so why calling this insanity! unless the first call ending with G is the cause and be more of a varscan error? the exact same fasta was used to make the BWA index and call variants I am lost here :-(
samtools faidx R64_SEUB3.0_merged.fasta NC_030973.1:129900-129902
NC_030973.1:129900-129902 CAY
$ tabix variants.vcf.gz NC_030973.1:129900-129902 NC_030973.1 129900 . CAG C . PASS ADP=22;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:49:33:22:10:13:48.15%:1.1242E-5:23:12:6:4:3:10 NC_030973.1 129902 . Y A . PASS ADP=23;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:41:33:23:0:8:47.06%:7.77E-5:0:17:0:0:6:2
The VCF specification requires that the REF base is one of A,C,G,T,N and that it matches the fasta reference. Callers can use fasta files with IUPAC codes, but they have to output the proper REF base in the VCF. In case there is Y in the fasta, the REF base in the VCF could be in principle either C or T, but not G. In fact, the specification requires that the alphabetically first base must be used, so in your case the first record should have REF=CAC and the second C.
which then means that varscan was guilty after all and that I will never be able to get stats for this vcf data except after correcting one by one each sanity error. Would there be a way to run bcftools sats in a lenient manner with an extra argument (-f !) to ignore sanity errors?
Thanks a lot Peter for your answers and help. Stephane
running bcftools stats -c both --fasta-ref $REF $VCF > $VCF.stats fails although the fasta reference is the same used for mapping and calling the vcf. looks like a IUPAC bug to me! Is there a way to bypass sanity and get results from this command (now dies empty)? Thanks
Sanity check failed, the reference sequence differs: NC_030973.1:129867+1 .. T vs W