ksahlin / ultra

Long-read splice alignment with high accuracy

Controlling (high) uLTRA RAM usage #15

Open pre-mRNA opened 2 years ago

pre-mRNA commented 2 years ago

Hi,

I'm running uLTRA on a cluster with 48 CPU cores and 196 GB RAM per node.

If I call uLTRA like this to align direct RNA reads to an indexed mammalian genome:

uLTRA align "${genome}" "${reads}" "${output_directory}" --index "${ultraIndex}" --ont --t 48

I notice that the uLTRA subprocesses together use more RAM than is available, and I get out-of-memory errors from slaMEM etc., causing the job to fail.

My current workaround is to limit the number of CPUs used so that I don't exceed the node's maximum RAM.
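For anyone scripting this workaround, a minimal sizing sketch is below. The ~5 GB-per-thread figure is purely an illustrative assumption (uLTRA does not document a fixed per-thread budget); measure your own peak usage and adjust.

```shell
# Hypothetical sizing helper: cap uLTRA's thread count by a per-thread RAM
# budget so the node's memory is not exceeded.
node_ram_gb=196        # total RAM on the node
per_thread_gb=5        # assumed peak RAM per uLTRA worker (tune for your data)
max_cores=48           # physical cores on the node

# integer division: how many workers fit in the RAM budget
threads=$(( node_ram_gb / per_thread_gb ))
if [ "$threads" -gt "$max_cores" ]; then
  threads=$max_cores
fi
echo "$threads"
```

The resulting value can then be passed to `--t` instead of a hard-coded 48.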

Is there any way to control uLTRA maximum memory usage, either per CPU core or overall?

Thanks!

ksahlin commented 2 years ago

Hi @pre-mRNA,

There is no solution that would yield identical results; reducing the number of cores seems like the best option.

Another alternative is the parameter --use_NAM_seeds (based on strobemer seeds), which has a fixed peak memory roughly equivalent to running uLTRA with 18 cores (at least on the human genome), and is also faster. With it, you should be able to specify --t 48.

However, this parameter does not guarantee identical alignments. I observed that uLTRA's alignments with --use_NAM_seeds were a tiny bit worse than the default on the datasets I tried. With --use_NAM_seeds, uLTRA uses StrobeMap instead of slaMEM to find matches; StrobeMap is installed automatically if you installed uLTRA through conda. For details about --use_NAM_seeds, see the section "New since v0.0.4" in the readme.
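Putting the suggestion together, a dry-run sketch of the alternative invocation is below. The paths are placeholders standing in for the variables in the original command, and the script only echoes the command rather than running it.

```shell
# Dry-run sketch: the original invocation with --use_NAM_seeds added
# (StrobeMap seeding, bounded peak RAM). All paths are placeholders.
genome="genome.fa"
reads="reads.fastq"
output_directory="out/"
ultraIndex="index/"

cmd="uLTRA align $genome $reads $output_directory --index $ultraIndex --ont --use_NAM_seeds --t 48"
echo "$cmd"   # replace echo with eval (or drop it) to actually run
```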