samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
663 stars 240 forks source link

bcftools mpileup call with empty output #2069

Open valentinaOpazo opened 9 months ago

valentinaOpazo commented 9 months ago

Hello, I'm interesting in identify if my sample has a insertion, deletion or if it's heterozygote (in a specific region). To do this I ran the next code bcftools mpileup -Ou -r chrX:start-end -f genome.fa $Input_Path/input.bam | bcftools call -Ou -mv -o test_option3

When I run it I don't get any error messages. The following are the terminal messages.

Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 25

The problem is my output file test_option3 contain only the header. The last rows are:

INFO=##INFO=

INFO=

INFO=

bcftools_callVersion=1.19-3-gd6f535f1+htslib-1.19-1-g61b037bb

bcftools_callCommand=call -cv; Date=Fri Jan 5 13:47:43 2024

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TCGA-A2-A0YF-01A-31-A615-42-X007-S01_aliquot

I also tried to execute the same analysis in Galaxy and I got the same output file, Is it possible that the error was on my bam file? How can I test it?

pd3 commented 9 months ago

Check if there are any reads in that region

samtools view your.bam chrX:start-end | less

A common problem is a mismatch in chromosome naming convention (chrX vs X) between bam, fasta reference or the user.

valentinaOpazo commented 9 months ago

Thanks for your quick answer, I appreciate it a lot! I checked if there are reads in the region and there are (below is a screenshot of the output of samtools view)

image

Also, bam files and reference use the same chromosome convention name. So I'm still having trouble with the output bcf file

pd3 commented 9 months ago

The next thing to check then is what raw mpileup looks like. Maybe all positions in that region are non-variant, therefore bcftool call -v removes everything? You can test with

bcftools mpileup -r chrX:start-end -f genome.fa file.bam | less

or

bcftools mpileup -r chrX:start-end -f genome.fa file.bam | bcftools call -mA | less
valentinaOpazo commented 8 months ago

You are right! When I removed -v option on bcftools call, the output isn't empty anymore.
However, I don't completely understand the output. When do you say that region are non-variant, what does it mean? I'm analyzing one sample per run code, so does it mean that my sample is equal to the reference genome? Below is one output file

image

In another sample, I got an output file that looks like the screenshot below. In this case, the interpretation must be that It has a deletion of 5 bases? image

pd3 commented 8 months ago

In the first screenshot all bases in all reads are identical to the reference, hence there are no variants, nothing to call.

The second screenshot shows several bases with no coverage. There seem to be no overlapping reads, therefore the five positions with zero coverage are not called as deletion. It is possible there was a read with an indel and was filtered by mpileup (eg because of the --min-ireads option), but this I cannot tell just from seeing the screenshot.

pd3 commented 4 months ago

This requires a test case to reproduce and debug the problem. This script can be used to create a small slice of the bam and the reference https://github.com/pd3/mpileup-tests/blob/main/misc/create-bam-test