Closed wk1267 closed 2 years ago
This is caused by the unfortunate naming of your contigs.
bcftools consensus
accepts fragments of fasta files on input which is handy when one needs consensus of, say, a list of genes. In such cases, the contig names in the fast file would be named >chr:beg-end
. Now you see the problem. In your case the :beg-end
is part of the contig name and thus the contig name is interpreted as ref|NC_045512.2|
.
The easiest solution is to drop the :1-29903
part from the name. Not sure how to solve this best on bcftools side. Maybe by adding an option forbidding streaming of contig fragments.
Another possibility would be to add a check to confirm if ref|NC_045512.2|:1-29903
or ref|NC_045512.2|
is the contig name. However, this would fail if the VCF contained both ref|NC_045512.2|:1-29903
and ref|NC_045512.2|
.
The file formats group spent a while thinking about these issues, and there is useful advice in Appendix A of the SAM spec. I think some of the HTSlib API functions implement the ideas there too — namely the “confirm which one is a contig name” check @pd3 mentions and the curly bracket syntax.
The appendix suggests what I proposed, the ambiguous case highlighted above would remain unresolved. The curly bracket syntax is good for use with command line interface, but not in the fasta file headers. Your best bet @wk1267 is to rename the contigs, for example replacing the colon with an underscore.
Thank you for your responses! I have renamed the contigs using bcftools annotate
. This was able to solve the problem.
Hi, I am trying to write a pipeline that does lofreq variant calling on a bam file sample, then get the consensus sequence from the vcf file generated. Lofreq results show that substitution tests are performed, indicating that there should be variants called.
But, when I do consensus calling, I encounter the following warning message. (This ends up giving me the original ref.fa file).
Please advise. Thank you!
For reference, this is the code I've used:
Running
bcftools query -f'%CHROM:%POS %REF %ALT\n' ERR6671299.vcf.gz
gives me:Meanwhile, my ref.fa begins with:
>ref|NC_045512.2|:1-29903 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
These are the versions used: bcftools 1.15.1 Using htslib 1.15.1