luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

very slow call set refinement with refcall #239

Open nemartins opened 2 years ago

nemartins commented 2 years ago

The call set refinement step of octopus (v0.7.4 or development) runs very slowly when the refcall option is used (50 min for a 1.5 Mb reference genome at 50X coverage).

The initial calling is very quick, around 1 minute, but then the run stalls. I've tried adjusting the available memory (increasing or decreasing the -B value) and disabling filtering, but the issue remains.

Without refcall, the full run finishes in about 2 min.

Do you have any idea what's happening?

dancooke commented 2 years ago

Hi, this is unfortunately a known performance bug. As part of filtering, octopus realigns all reads to the called haplotypes. That is fine for typical whole-genome variant call sets, but it becomes very expensive when reference calls are included. Finding a workaround for this is on my TODO list.
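
For a rough sense of why adding reference calls can make realignment so much more costly, here is a toy back-of-envelope model. It is only a sketch and not octopus's actual algorithm: it assumes the realignment work grows with the number of calls times the read depth at each call, that variants occur about once per kilobase, and that reference calling emits one call per position. The genome size and coverage come from the report above; everything else (including the `realignment_work` helper) is hypothetical.

```python
# Toy cost model for illustration only; NOT how octopus is implemented.
GENOME_SIZE_BP = 1_500_000   # from the report above
COVERAGE = 50                # from the report above
VARIANTS_PER_KB = 1          # assumed variant density, illustration only


def realignment_work(n_calls: int, depth: int) -> int:
    """Approximate number of read-to-haplotype alignments.

    Assumes each call is overlapped by roughly `depth` reads.
    """
    return n_calls * depth


# Variant-only call set: about one call per kilobase (assumption).
variant_calls = GENOME_SIZE_BP // 1000 * VARIANTS_PER_KB
# Assumed worst case with reference calls: one call per position.
refcall_calls = GENOME_SIZE_BP

variants_only = realignment_work(variant_calls, COVERAGE)
with_refcalls = realignment_work(refcall_calls, COVERAGE)

print(f"variants only: {variants_only:,} alignments")
print(f"with refcalls: {with_refcalls:,} alignments")
print(f"ratio: {with_refcalls // variants_only}x")
```

Under these assumptions the work scales directly with the number of calls, so a call set that covers every position is far more expensive to refine than a sparse variant-only one. The model says nothing about actual runtimes, which depend on how octopus batches reads and groups reference calls.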