Open leoisl opened 2 years ago
@mbhall88 could you also please report the max RAM usage? I am quite concerned with this part in denovo racon
: https://github.com/rmcolq/pandora/blob/12a08c5483c19fc12411e174970d31c86e842a2d/src/denovo_discovery/discover_main.cpp#L205-L206
This is a dictionary from loci names to the subreads that map to each locus, inferred by pandora map
. This structure could get potentially very large, as we basically store a substring of every read that map to each locus (is just the region of the read that maps to that specific locus, but still...). There are potentially many better ways to store this info, but I also want to avoid premature optimisation, and just work on this if RAM is indeed an issue.
The nanopore runtime and memory usage of drprg with the previous version of de novo discovery
https://github.com/mbhall88/drprg/issues/15#issuecomment-1275609531
and the Illumina
https://github.com/mbhall88/drprg/issues/15#issuecomment-1276908991
Pandora is the vast majority of runtime and memory in drprg so these act as good benchmark figures, especially given the only thing that will change between them is the pandora version
I just finished running pandora discover
at this commit on the paper ONT data. Comparison with paper run:
I think runtime is totally fine, we got almost the same exact runtime.
RAM usage is much higher though, probably related to this: https://github.com/rmcolq/pandora/issues/303#issuecomment-1297228115 . It will be the RAM bottleneck of the pipeline (pandora compare
on ONT data takes 15.7GB, this new denovo implementation takes 1 GB more).
OFC this is data dependent, so let's see how @mbhall88 benchmarks compare
This RAM use is, IMO, acceptable for the moment - i wouldn't postpone the merge to reduce RAM
Okay, so here are the updated benchmark figures
Median is 91s, down from 98s. So it is faster now on Illumina!
Median is now 171s, up from 163s.
Median is 46MB, down from 54MB, so Memory is marginally down too.
Median is now 270MB, up from 240MB.
All in all, I think this acceptable.
To be done by @mbhall88 . Could you please use this commit?