Closed jrharting closed 3 years ago
I just pushed a commit that adds a new --indel-size
option which allows to override the default value. Some benchmarking should be done before this is made the new default with -X ont
and -X pacbio-ccs
.
Thanks for reporting the issue and the test case.
Thank you very much!
Thanks @pd3, just to double check, the changes are on develop
?
@zeeev That's correct https://github.com/samtools/bcftools/commit/4c71bfbdaf5c3f6bbea2a3c1b38f037af7665d1f
Hi,
I'm trying to use bcftools stats
to call stats on a vcf, and I'm trying to use the --indel-size
flag to capture indels much larger than 60bp.
I'm definitely working from the develop
branch but I'm getting errors when trying to use the --indel-size
flag.
stats: unrecognized option '--indel-size'
git branch is develop
git rev-parse HEAD
gives f2694e52e2af7e60c3129b9ec82f07b5d3e8288b
Any help would be greatly appreciated.
Thanks
This thread is about a different command, mpileup
not stats
. Please open a new issue / feature request so that this is not forgotten.
Hi, my fault, i realise now.
Ive ran bcftools mpileup
on my sam file with the --indel-size
flag, followed by bcftools stats
and plot-vcfstats
, but i find no indels, whereas i've previously found them in the same alignment using other variant calling methods. I do however need to use this package with the --indel-size
flag as I'm interested in indels of all sizes, not just ~60 bp as the standard seems to be for bcftools stats
.
My aim is to align two assembly scaffolds with lastz
, then generate a vcf and call indels. From there i can determine distribution of indel sizes in the alignment.
My process was:
1, Align two assembly scaffolds with LastZ and output a sam file
2, Converted sam to bam and sorted
3, Ran bcftools mpileup --indel-size 1000 -f asm.fa lastz_alignment.sorted.bam > output.lastz.vcf
4, Ran bcftools stats
on the output.lastz.vcf
5, Ran plot-vcfstats
on the output of bcftools stats
I did get some errors when running bcftools stats
:
[W::vcf_parse] INFO 'D' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'Q' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'MQ0' is not defined in the header, assuming Type=String
[W::vcf_parse] FILTER '*' is not defined in the header
My question is why is bcftools
not detecting or reporting and indels?
Sorry if this is not the right place to ask; I will open my own ticket if needs be.
We're looking at ~700bp inserts using PacBio HiFi data using bcftools and we noticed some large indels (>100 or so bp) missing from the callset. I think this is due to the indel window size here https://github.com/samtools/bcftools/blob/develop/bam2bcf_indel.c#L40
Is there any reason not to increase this to something like 400-500bp?
I've attached a toy example i made that models some data I can't share at the moment.
screenshot is generated using:
800_reads.bam.gz reference.fasta.gz