statgen / popscle

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxilary tools
https://github.com/statgen/popscle/wiki
Apache License 2.0
43 stars 15 forks source link

matching chromosome order in vcf and bam files #42

Open dalhoomist opened 3 years ago

dalhoomist commented 3 years ago

Hello,

I am using popscle dsc-pileup function as a prelude for running demuxlet on cellranger output. Here is my code: packages/popscle/bin/popscle dsc-pileup --sam bam.bam --vcf vcf1.vcf --out home/workscape/TB3966/pileup --group-list barcodes.tsv

I downloaded a vcf from the 1000genomes to use here. The problem is that pipeline is throwing the following error: [E:int32_t cmdCramDscPileup(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has 19 before 2, but VCF/BCF file has 19 after 2

The order of chromosomes in the vcf normally started from 1, 2, 3 ... etc. In the bam file it is 1 10 11 12 ....

Is there a way to fix that?

Thanks for all the help.

raivivek commented 3 years ago

Hi @dalhoomist You can reorder your VCF file corresponding to the BAM using picard SortVcf command. For example, picard SortVcf INPUT=example.vcf OUTPUT=example.sorted.vcf SEQUENCE=sort_by.bam. Once you've the sorted VCF, you can use it with demuxlet and the sorting order will match for all BAMs processed similarly (i.e., aligned to the same reference).

fatimasnc commented 2 years ago

Hi,

I'm having the same problem here. I tried @raivivek's suggestion but after running picard SortVcf on my VCF file I still get the same error:

FATAL ERROR - 
[E:int32_t main(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has 19 before 2, but VCF/BCF file has 19 after 2

With help from a colleague we figured out the metadata at the top of the VCF file is not sorted, and apparently other software can handle this (having different order in the metadata and each variant entry) but it doesn't seem the case for demuxlet. Is there any way to fix this that wouldn't involve manually moving around the metadata entries for each chromosome?

Thanks!

raivivek commented 2 years ago

Hi @fatimasnc -- you're right on. Perhaps I should have mentioned this earlier as well, but certainly don't have to do it manually. Picard offers several other commands (such as CreateSequenceDictionary, UpdateVcfSequenceDictionary, and SortVcf) that allow you to extract the chromosome order from the reference and reorder VCF without having to do it manually — the approach I took finally. You might however need to create a script or wrapper that executes these few steps for a given set of VCF/Reference FASTA/BAM files.

fatimasnc commented 2 years ago

Thanks a lot for your reply @raivivek! In the end we managed to solve it with this helper script from the aerts lab :)

hsymoon commented 2 months ago

Thanks a lot for your reply @raivivek! In the end we managed to solve it with this helper script from the aerts lab :)

Hi, I have read your conversation and thanks for you providing the helper script. But I do not know hou to run this sort_vcf_same_as_bam.sh .Could you please provide an example? Thanks so much for helping.