varlociraptor / varlociraptor

Flexible, uncertainty-aware variant calling with parameter-free filtration via FDR control.
https://varlociraptor.github.io
GNU General Public License v3.0

preprocessing poly G #402

Open delpropo opened 1 year ago

delpropo commented 1 year ago

I have multiple delly output files from WGS data whose preprocessing did not finish even after more than a week. I traced the problem to a large poly-G stretch of at least 70 nt on chromosome 2, around position 32916300.
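To check a reference slice (for example the sequence around chr2:32916300) for such stretches, one could scan for runs of a single base. This is a minimal Python sketch; the function name `find_homopolymer_runs` and the 70 nt default are illustrative assumptions, not part of varlociraptor or delly.

```python
import re


def find_homopolymer_runs(seq: str, base: str = "G", min_len: int = 70):
    """Return 0-based, half-open (start, end) intervals of runs of `base`
    that are at least `min_len` bases long. Hypothetical helper."""
    # Builds a pattern such as "G{70,}"; the greedy quantifier matches each
    # maximal run once. IGNORECASE also catches soft-masked (lowercase) bases.
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}", re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


# A 75 nt poly-G run is reported; the single G and the 10 nt run are not.
print(find_homopolymer_runs("ACGT" + "G" * 75 + "TTA" + "G" * 10))  # [(4, 79)]
```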

[Screenshot: UCSC Genome Browser v455, Human hg38 chr2:32916233-32916344]

johanneskoester commented 8 months ago

Thanks for reporting, and sorry for the delay! What candidate variants does delly give you within that region?

delpropo commented 8 months ago

Here is a BCF file that failed. The longest lines in it are duplications and inversions.
200046.delly.565-of-2048.filtered.vcf.txt

The poly-G stretch lies within a "blacklisted" region:
chr2 32916201 32916632

https://github.com/Boyle-Lab/Blacklist/?tab=readme-ov-file
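Assuming the blacklist entry above is a BED line (as the Boyle-Lab blacklist files are distributed), its coordinates are 0-based and half-open, while VCF `POS` is 1-based. A small sketch of the conversion when checking whether a variant position falls inside the interval; `in_bed_interval` is a hypothetical helper:

```python
def in_bed_interval(chrom: str, pos_1based: int, bed_line: str) -> bool:
    """True if a 1-based position (as in VCF POS) lies in a BED interval.
    BED intervals are 0-based, half-open: [start, end)."""
    b_chrom, b_start, b_end = bed_line.split()[:3]
    pos_0based = pos_1based - 1
    return chrom == b_chrom and int(b_start) <= pos_0based < int(b_end)


# The poly-G position reported above falls inside the blacklisted interval.
print(in_bed_interval("chr2", 32916300, "chr2 32916201 32916632"))  # True
```

Under this convention the interval covers 1-based positions 32916202 to 32916632, which includes the whole UCSC window chr2:32916233-32916344.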

In the end, I had to remove some other regions as well prior to preprocessing.
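The thread does not say how the regions were removed. One possible approach is to drop candidate records that overlap blacklisted intervals before preprocessing. The sketch below is a plain-Python illustration on uncompressed VCF text (a BCF would first need converting, e.g. with `bcftools view`); all function names are hypothetical. It assumes structural variant calls such as delly's DUP/INV records store their end coordinate in `INFO/END`.

```python
def record_span(fields):
    """0-based, half-open (chrom, start, end) span of a VCF record.
    Assumes SV records carry their end position in INFO/END."""
    chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
    end = pos + len(ref) - 1  # 1-based last reference base by default
    for entry in info.split(";"):
        if entry.startswith("END="):
            end = int(entry[4:])
    return chrom, pos - 1, end  # 1-based [pos, end] -> 0-based [pos-1, end)


def overlaps_blacklist(span, blacklist):
    """Half-open overlap test against (chrom, start, end) BED intervals."""
    chrom, start, end = span
    return any(c == chrom and start < b_end and b_start < end
               for c, b_start, b_end in blacklist)


def filter_vcf_lines(lines, blacklist):
    """Yield header lines unchanged; drop records overlapping the blacklist."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if not overlaps_blacklist(record_span(fields), blacklist):
            yield line
```

Because large duplications and inversions can span far beyond a short blacklisted interval, this full-span test may discard more calls than needed; checking only the breakpoint positions would be a less aggressive alternative.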