stjudecloud / workflows

Bioinformatics workflows developed for and used on the St. Jude Cloud project.
MIT License

Tasks that take both position- and name-sorted input probably need different resources #170

Open a-frantz opened 3 months ago

a-frantz commented 3 months ago

The major culprit here is HTSeq: https://github.com/stjudecloud/workflows/blob/main/tools/htseq.wdl

It has a pretty terrible sort algorithm and eats up resources when the input is position sorted. We've exposed the name-sort option but still allocate a large amount of memory and disk when it's used. Neither is likely needed in that case.
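
As a rough sketch of what sort-aware resources could look like (not the current contents of `tools/htseq.wdl`): the task name, the `pos_sorted` input, and every number below are hypothetical placeholders that would need tuning against real runs. The only HTSeq details assumed are the `htseq-count` options `-f bam` and `-r pos|name`.

```wdl
version 1.1

# Hypothetical task: all names and resource numbers are illustrative only.
task count_sketch {
    input {
        File bam
        File gtf
        # Assumed input: true if the BAM is position sorted, false if name sorted.
        Boolean pos_sorted = false
    }

    Float bam_size = size(bam, "GiB")
    Float gtf_size = size(gtf, "GiB")

    # Position-sorted input gets memory scaled with the BAM size;
    # name-sorted input gets a small fixed allocation. Placeholder values.
    Int memory_gb = if pos_sorted then ceil(bam_size) + 4 else 4

    # Disk only needs to hold the inputs plus some headroom for the counts file.
    Int disk_size_gb = ceil(bam_size + gtf_size) + 10

    command <<<
        htseq-count \
            -f bam \
            -r ~{if pos_sorted then "pos" else "name"} \
            ~{bam} \
            ~{gtf} \
            > counts.tsv
    >>>

    output {
        File counts = "counts.tsv"
    }

    runtime {
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
    }
}
```

The same pattern (a boolean input feeding an `if ... then ... else` expression in the resource declarations) would presumably carry over to any other task that accepts both sort orders.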