samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
672 stars 240 forks source link

Output from bcftools consensus in .fq format #1670

Open elizeng opened 2 years ago

elizeng commented 2 years ago

I am trying to run PSMC to obtain demographic history, and am looking to call consensus sequences from my study samples.

PSMC requires a ".fq.gz" input which is typically the whole-genome diploid consensus sequence of an individual.

The old pipeline, that is now deprecated that was recommended for generating the input file is: samtools mpileup -C50 -uf ref.fa aln.bam | bcftools view -c - | vcfutils.pl vcf2fq -d 10 -D 100 | gzip > diploid.fq.gz

I have adapted my pipeline to use bcftools consensus instead but the pipeline seems to only output .fa files and not .fq files.

Is it possible to output a .fq file instead of only a .fa format.

Thanks!

pd3 commented 2 years ago

I will mark this as a feature request, but I am afraid someone would have to contribute the code, I don't have the capacity to add it myself at the moment.

jkbonfield commented 2 years ago

The vcfutils.pl command is still part of the latest distribution, although it's more on "life support" than supported. I haven't tried to see if it still works, but it may do so feel free to give it a whirl. As you say however, bcftools consensus is the supported method now. Note you should now be using bcftools mpileup instead of samtools mpileup, but the output is basically the same.

It's also worth exploring the new samtools consensus -f fastq aln.bam command which may be able to replace all these steps. It's not as advanced as a fully feature variant caller, so sometimes events may be missed (although it sometimes finds real variants that bcftools doesn't), but it's worth testing as it's considerably faster and for good quality data of sufficient depth it's reliable. (I would be interested in any places you find where samtools consensus is giving incorrect results on good quality data.)

Sonalhooda commented 1 month ago

I am trying to run PSMC to obtain demographic history, and am looking to call consensus sequences from my study samples.

PSMC requires a ".fq.gz" input which is typically the whole-genome diploid consensus sequence of an individual.

The old pipeline, that is now deprecated that was recommended for generating the input file is: samtools mpileup -C50 -uf ref.fa aln.bam | bcftools view -c - | vcfutils.pl vcf2fq -d 10 -D 100 | gzip > diploid.fq.gz

I have adapted my pipeline to use bcftools consensus instead but the pipeline seems to only output .fa files and not .fq files.

Is it possible to output a .fq file instead of only a .fa format.

Thanks!

How did you ran PSMC facing a similar situation. Did you find any alternative ?