statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

FATAL ERROR - [E:int32_t main(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has 1 before 10, but VCF/BCF file has 1 after 10 terminate called after throwing an instance of 'pException' what(): Exception was thrown Aborted (core dumped) #36

Open AhmedArslan opened 5 years ago

AhmedArslan commented 5 years ago

Can someone please help me with this issue: what causes it, and how can it be fixed?

This is my command:

./demuxlet/demuxlet --sam /home/outs/possorted_genome_bam.bam --vcf /home/new.vcf --field GT --min-mac 10 --min-uniq 4 --out C24

Available Options

The following parameters are available. Ones with "[]" are in effect:

Options for input SAM/BAM/CRAM : --sam [/home/outs/possorted_genome_bam.bam], --tag-group [CB], --tag-UMI [UB]
Options for input VCF/BCF : --vcf [ /home/new.vcf ], --field [GT], --geno-error [0.01], --min-mac [10], --min-callrate [0.50], --sm, --sm-list
Output Options : --out [C24], --alpha, --write-pair, --doublet-prior [0.50], --sam-verbose [1000000], --vcf-verbose [10000]
Read filtering Options : --cap-BQ [40], --min-BQ [13], --min-MQ [20], --min-TD, --excl-flag [3844]
Cell/droplet filtering options : --group-list, --min-total, --min-uniq [4], --min-snp

Run with --help for more detailed help messages of each argument.

NOTICE [2019/02/22 16:38:17] - Finished identifying 6667 samples to load from VCF/BCF

FATAL ERROR - [E:int32_t main(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has 1 before 10, but VCF/BCF file has 1 after 10

terminate called after throwing an instance of 'pException'
what(): Exception was thrown
Aborted (core dumped)

hyunminkang commented 5 years ago

You need to make your VCF follow the same chromosome order as your BAM file. Use bcftools and/or tabix to do so.

Hyun.
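For a small, uncompressed VCF, the reordering described above can be sketched with awk and sort alone. This is a toy illustration under stated assumptions, not demuxlet's or bcftools' own method: the file names (order.txt, in.vcf, out.vcf) are hypothetical, the desired order is hard-coded here but would normally be read from the BAM header, and each ##contig line is assumed to start with ID=. Contigs missing from the order list are dropped from both the header and the records.

```shell
#!/bin/sh
# Toy sketch: reorder a plain-text VCF (header contig lines and records)
# so that it follows a given chromosome order. File names are hypothetical.
set -e
work=$(mktemp -d)
tab=$(printf '\t')

# Desired order, one contig per line. For a real BAM this list could come from
# (assuming SN: is the first tag on each @SQ line):
#   samtools view -H possorted_genome_bam.bam | awk '$1 == "@SQ" { print substr($2, 4) }'
printf '1\n10\n2\n' > "$work/order.txt"

# Toy input VCF in numeric order (1, 2, 10).
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n##contig=<ID=2>\n##contig=<ID=10>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t.\t.\t.\n'
  printf '2\t50\t.\tC\tT\t.\t.\t.\n'
  printf '10\t70\t.\tG\tA\t.\t.\t.\n'
  printf '10\t30\t.\tT\tC\t.\t.\t.\n'
} > "$work/in.vcf"

# Step 1: header, with the ##contig lines emitted in the desired order.
awk '
  NR == FNR { want[++n] = $1; next }     # first file: desired order
  /^##contig=/ {                         # remember each contig line by its ID
    id = $0
    sub(/^##contig=<ID=/, "", id)
    sub(/[,>].*$/, "", id)
    line[id] = $0
    next
  }
  /^#CHROM/ {                            # emit contigs, then the column line
    for (i = 1; i <= n; i++) if (want[i] in line) print line[want[i]]
    print
    exit
  }
  { print }                              # other meta lines pass through
' "$work/order.txt" "$work/in.vcf" > "$work/out.vcf"

# Step 2: records, sorted by rank in the order list and then by position.
awk '
  NR == FNR { rank[$1] = NR; next }
  /^#/ { next }
  ($1 in rank) { print rank[$1] "\t" $0 }
' "$work/order.txt" "$work/in.vcf" |
  sort -t "$tab" -k1,1n -k3,3n | cut -f2- >> "$work/out.vcf"

cat "$work/out.vcf"
```

For compressed or large files, bcftools is likely the more robust route; the sketch only shows that both the header and the records have to end up in the BAM's order.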

AhmedArslan commented 5 years ago

Thanks. The BAM file is straight from 10x cellranger. How can I check that it has the same chromosome order as my VCF file after sorting the VCF with bcftools?
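One way to check this is to compare the sequence names in the BAM header (@SQ lines) with the contig IDs in the VCF header. The sketch below is a minimal example under assumptions: the two header files are toy stand-ins written inline so it runs without samtools or bcftools, and the file names bam_header.txt and vcf_header.txt are hypothetical. With real data, the headers could be dumped with `samtools view -H` and `bcftools view -h`.

```shell
#!/bin/sh
# Toy sketch: compare contig order in a SAM/BAM header with a VCF header.
set -e
work=$(mktemp -d)

# Stand-ins for real header dumps. With the real files these could be made with:
#   samtools view -H /home/outs/possorted_genome_bam.bam > bam_header.txt
#   bcftools view -h /home/new.vcf > vcf_header.txt
printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:10\tLN:800\n@SQ\tSN:2\tLN:900\n' \
  > "$work/bam_header.txt"
printf '##fileformat=VCFv4.2\n##contig=<ID=1>\n##contig=<ID=2>\n##contig=<ID=10>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' \
  > "$work/vcf_header.txt"

# Sequence names in BAM order (the SN: tag of each @SQ line).
awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
  "$work/bam_header.txt" > "$work/bam_order.txt"

# Contig IDs in VCF header order (assumes ID is the first key in ##contig=<...>).
sed -n 's/^##contig=<ID=\([^,>]*\).*/\1/p' "$work/vcf_header.txt" > "$work/vcf_order.txt"

# Exact comparison; if one file lists extra contigs, filter the lists first.
if cmp -s "$work/bam_order.txt" "$work/vcf_order.txt"; then
  result=same
else
  result=different
  # Show the two orders side by side (BAM on the left, VCF on the right).
  paste "$work/bam_order.txt" "$work/vcf_order.txt"
fi
echo "contig order: $result"
```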

MattPM commented 5 years ago

@AhmedArslan You likely figured this out but if anyone else has this issue, the part of the script that creates that error is reading the vcf header. If you use

bcftools view

to sort chromosomes lexicographically (1, 10, 11, 12) in the VCF to match the output cellranger creates, it may appear that the chromosomes are in the correct order. For example:

awk '!/^#/ {print $1}' myvcf.vcf | uniq

would print 1 10 11 etc.

However, you need to also be certain the header which lists the chromosome order / positions also matches the same order above.
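The mismatch described here can be made visible by printing both orders from the same file. This is a small sketch on a toy VCF (the name myvcf.vcf follows the example above, and the contents are made up), assuming each ##contig line starts with ID=.

```shell
#!/bin/sh
# Toy sketch: compare the chromosome order of the records with the order
# declared by the ##contig header lines of the same VCF.
set -e
work=$(mktemp -d)

# Toy VCF: records are lexicographic (1, 10, 2) but the header is numeric (1, 2, 10).
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n##contig=<ID=2>\n##contig=<ID=10>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t.\t.\t.\n'
  printf '10\t30\t.\tT\tC\t.\t.\t.\n'
  printf '2\t50\t.\tC\tT\t.\t.\t.\n'
} > "$work/myvcf.vcf"

# Order in which chromosomes appear in the records (header lines skipped).
body_order=$(awk '!/^#/ { print $1 }' "$work/myvcf.vcf" | uniq | tr '\n' ' ')

# Order declared by the ##contig header lines.
header_order=$(sed -n 's/^##contig=<ID=\([^,>]*\).*/\1/p' "$work/myvcf.vcf" | tr '\n' ' ')

echo "records: $body_order"
echo "header:  $header_order"
if [ "$body_order" = "$header_order" ]; then
  echo "orders match"
else
  echo "orders differ"
fi
```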

ocqub commented 4 years ago

@MattPM

However, you need to also be certain the header which lists the chromosome order / positions also matches the same order above.

Would you be able to give any suggestions on how the numerical chromosome order (1, 2, 3 etc) in the VCF header can be changed into lexicographical order (1, 10, 11 etc)?

I've tried manually re-ordering the contig IDs in a text file, with the chromosomes in the desired order, e.g.:

contig=

contig=

contig=

contig=

then appending these values to the existing header with

bcftools annotate -h reheader.txt -o output.vcf -O v input.vcf

However, when I check the header after this, the new entries have returned to numerical ordering, like:

contig=

contig=

contig=

contig=

MattPM commented 4 years ago

@ocqub look into: http://samtools.github.io/bcftools/bcftools.html#reheader
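A replacement header for `bcftools reheader -h` can be built by reordering the ##contig lines of the existing header instead of editing by hand. The sketch below is one possible approach, not a documented recipe: old_header.txt is a toy stand-in for the output of `bcftools view -h input.vcf`, the file names are hypothetical, and plain lexicographic order is an assumption (the order that actually matters is the one in the BAM header).

```shell
#!/bin/sh
# Toy sketch: build a header file whose ##contig lines are in lexicographic order.
set -e
work=$(mktemp -d)
tab=$(printf '\t')

# Stand-in for: bcftools view -h input.vcf > old_header.txt
{
  printf '##fileformat=VCFv4.2\n'
  printf '##FILTER=<ID=PASS,Description="All filters passed">\n'
  printf '##contig=<ID=1>\n##contig=<ID=2>\n##contig=<ID=10>\n##contig=<ID=11>\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
} > "$work/old_header.txt"

{
  # Meta lines other than contigs, in their original order.
  grep '^##' "$work/old_header.txt" | grep -v '^##contig='
  # Contig lines, sorted lexicographically by ID (1, 10, 11, 2).
  awk '/^##contig=/ {
         id = $0
         sub(/^##contig=<ID=/, "", id)
         sub(/[,>].*$/, "", id)
         print id "\t" $0
       }' "$work/old_header.txt" |
    LC_ALL=C sort -t "$tab" -k1,1 | cut -f2-
  # The column header line must stay last.
  grep '^#CHROM' "$work/old_header.txt"
} > "$work/reheader.txt"

cat "$work/reheader.txt"

# The new header could then be applied with, for example:
#   bcftools reheader -h reheader.txt -o output.vcf input.vcf
```

Replacing the header does not move the records, so the records would still need to be put in the same order separately.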

ocqub commented 4 years ago

@MattPM

http://samtools.github.io/bcftools/bcftools.html#reheader

This seems to be exactly what I needed, thanks so much!

XiaofeiSunUCSF commented 4 years ago

@MattPM

http://samtools.github.io/bcftools/bcftools.html#reheader

This seems to be exactly what I needed, thanks so much!

@ocqub I have the same issue. Would you mind sharing your code for reheadering the VCF file? Thank you in advance. Best, Xiaofei

ocqub commented 4 years ago

@XiaofeiSunUCSF It's been a while since I ran it, so I have to refer to the very ad hoc notes I took. I used reheader as follows:

bcftools reheader -h reheader.txt -o sorted_filtered_final.vcf sorted_filtered.vcf

Here reheader.txt is an edited copy of the header in the format I required (I used a basic text editor to change some things), sorted_filtered.vcf is the input file, and sorted_filtered_final.vcf is the output file with the updated header.

However, as I recall, this method led me to other problems downstream and was not ideal. This is possibly not the answer you wanted to hear, but in the end the easiest way for me was to re-align my bulk RNAseq data against the same reference genome that I used for my single-cell RNAseq analysis, and then to do variant calling on the bulk RNA with this reference genome too. That ensured the chromosomes in the VCF and my cellranger BAM file were ordered identically, and demuxlet did not throw any errors. It saved me lots of file format manipulation.

Note: after re-aligning with the same reference, I still had to use bcftools reheader, this time to change the sample name in each of my sample-specific VCF files. I ran:

bcftools reheader -s g1.txt -o G1_reheader.bc G1_calls.bc

Where the g1.txt file is a file containing the new sample name I required. In this case however, the rest of the header was already in the desired format.
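As a rough illustration of the sample-name file: `bcftools reheader -s` accepts a file with one new sample name per line, in the same order as the samples appear in the input. In the sketch below, the sample name G1 is a made-up example, and the bcftools call is only attempted if bcftools and the input file from the command above are actually present.

```shell
#!/bin/sh
# Toy sketch: write a sample-name file for bcftools reheader -s.
set -e
work=$(mktemp -d)

# One new sample name per line, in the same order as the samples in the input.
# "G1" is a hypothetical name chosen for illustration.
printf 'G1\n' > "$work/g1.txt"
cat "$work/g1.txt"

# Only run the rename if bcftools and the input file exist.
if command -v bcftools >/dev/null 2>&1 && [ -f G1_calls.bc ]; then
  bcftools reheader -s "$work/g1.txt" -o G1_reheader.bc G1_calls.bc
  # List the sample names in the result to confirm the change.
  bcftools query -l G1_reheader.bc
fi
```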

Hope this points you in the right direction!

insilicolife commented 3 years ago

Can anyone share what exactly the "reheader.txt" headers look like?
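The thread does not include the actual file, so the following is only an illustrative guess at its general shape: a complete VCF header with the ##contig lines in the desired order and the #CHROM line last. The contig lengths are placeholders that I believe match GRCh38 and should be taken from your own reference; SAMPLE1 is a hypothetical sample name.

```shell
#!/bin/sh
# Toy sketch: write an example header file of the kind passed to
# bcftools reheader -h. All values are illustrative placeholders.
set -e
work=$(mktemp -d)

cat > "$work/reheader.txt" <<'EOF'
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=248956422>
##contig=<ID=10,length=133797422>
##contig=<ID=11,length=135086622>
##contig=<ID=2,length=242193529>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
EOF

# The final header line is tab-separated; SAMPLE1 is a hypothetical sample name.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n' >> "$work/reheader.txt"

cat "$work/reheader.txt"
```

In practice it is probably safer to start from the output of `bcftools view -h` on your own file and only reorder the ##contig lines, so that the INFO and FORMAT definitions stay intact.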

Marwansha commented 11 months ago

I got the same error, and I would like to know whether I need to reorder only the header within the VCF file to match the BAM, or the order of the genotype records as well.

Thanks

ocqub commented 11 months ago

@Marwansha I suggest generating everything, namely both the single-cell and the bulk RNA alignments, using the same genome.fa reference genome. This prevents conflicting chromosome names and order in downstream analysis.