iqbal-lab-org / pandora

Pan-genome inference and genotyping with long noisy or short accurate reads
MIT License

Logging: new optimisations and 4-way results #330

Closed leoisl closed 1 year ago

leoisl commented 1 year ago

Description

This issue is just a log for the upcoming PR, which will bring 2 major changes to pandora:

  1. Lazy loading of PRGs (improves RAM significantly in the plasmid/roundhound use case);
  2. Do not keep read data (e.g. all minimiser hits each read has, all PRGs it maps to, etc...), but process this data as early as possible and release the memory. Improves RAM significantly in all cases;
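The two major changes can be illustrated with a minimal Python sketch. This is not pandora's C++ implementation: `LazyPRGStore`, `count_hits_per_prg` and the loader callback are hypothetical names, and the per-read data is reduced to a list of PRG ids for simplicity. The idea is only to show a PRG being loaded on first use, and per-read data being folded into a small summary and then dropped.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class LazyPRGStore:
    """Loads a PRG only the first time it is requested (hypothetical helper)."""

    def __init__(self, loader: Callable[[int], str]) -> None:
        self._loader = loader              # assumed to read one PRG from disk
        self._loaded: Dict[int, str] = {}  # only PRGs that were actually used

    def get(self, prg_id: int) -> str:
        if prg_id not in self._loaded:
            self._loaded[prg_id] = self._loader(prg_id)
        return self._loaded[prg_id]

    def loaded_count(self) -> int:
        return len(self._loaded)


def count_hits_per_prg(reads: Iterable[Tuple[str, List[int]]]) -> Dict[int, int]:
    """Fold each read's hits into a running summary instead of storing them."""
    totals: Dict[int, int] = {}
    for _read_name, hit_prg_ids in reads:
        for prg_id in hit_prg_ids:
            totals[prg_id] = totals.get(prg_id, 0) + 1
        # Nothing per-read is retained past this point, so its memory can be freed.
    return totals
```

With a database of ~1M PRGs where a sample only touches a small fraction, a store like this keeps only the touched PRGs in memory, and passing `reads` as a generator means only one read's data is alive at a time.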

And also minor changes:

  1. Code cleanup, removing unused gene DBG and noise filtering modules;
  2. If the best mapping of a read is to several graphs, we choose one at random (previously the choice was deterministic, so a single graph would get all the mappings).
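The random multimapping change can be sketched as follows. This is an illustration in Python under simplifying assumptions, not the code in pandora: `pick_best_graph` is a hypothetical name, and each candidate graph is assumed to have a single integer score where higher is better.

```python
import random
from typing import Dict, Optional


def pick_best_graph(scores: Dict[str, int], rng: random.Random) -> Optional[str]:
    """Return one of the top-scoring graphs, chosen uniformly at random."""
    if not scores:
        return None
    best = max(scores.values())
    # Sort the tied names so the result depends only on the RNG state,
    # not on dictionary insertion order.
    tied = sorted(name for name, score in scores.items() if score == best)
    return rng.choice(tied)
```

A deterministic rule such as "take the first of the tied graphs" would send every such read to the same graph; drawing from the tied set spreads the mappings across all equally good graphs. Passing a seeded `random.Random` keeps runs reproducible.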

Results

The major changes should not impact results, as they are just RAM improvements. The multimapping change should alter the results slightly, hopefully for the better. To check whether any breaking bug was introduced, we ran this version against the most recent prerelease on the 4-way pipeline. In general, the new version has slightly better precision without denovo, and the old version has slightly better precision with denovo. The differences are small, however. RAM improvements are massive and will be detailed in a later post. They will enable pandora to be run with far fewer computational resources, and will also speed up the next feature, which is running it on the cluster on hundreds of samples, so I think it is worth merging these improvements.

Details

Detailed 4-way results follow

Illumina filtered:

image

Illumina unfiltered:

image

Nanopore filtered:

image

Nanopore unfiltered:

leoisl commented 1 year ago

Overview

Going further with another optimisation (from https://github.com/leoisl/pandora/commit/72cd6a0749b3b232634e05d21c34c9cf4014d875 to https://github.com/leoisl/pandora/commit/1c53eb78a70be75dd35601701e3952d75549daa9): we now sort minimiser hits not by their location in the PRG string (which is quite a heavy object) but by their kmer node id, which corresponds to the order of the node in the minimizer DAG. Algorithm-wise, when we map minimizers from reads to PRGs, we need to sort the hits. This specific sort key (the location in the PRG string) only plays a role in one case: when a read minimizer maps to a graph that has that minimizer duplicated in two or more places. In that case, we used to sort these hits further by their location in the PRG string; they are now sorted by the order of the minimizer in the DAG. These two orders are actually somewhat related, as minimizers that occur earlier in the PRG string have lower ids in the minimizer DAG. My personal opinion is that this should not change the results much, and the following 4-way results confirm this. The RAM improvement is good: 2.4x less RAM than the previous optimisation (b19d26), allowing us to run roundhound with <10GB. RAM improvements will be detailed in a future post; I am still gathering benchmarks.
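To make the change of sort key concrete, here is a small Python sketch. It is a simplification and not pandora's data model: the `Hit` class and both functions are hypothetical, and the location in the PRG string is reduced to a single start coordinate, whereas the post describes it as a much heavier object.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    read_pos: int      # position of the minimizer in the read
    prg_start: int     # simplified stand-in for the location in the PRG string
    kmer_node_id: int  # id of the node in the minimizer DAG


def sort_by_prg_location(hits: List[Hit]) -> List[Hit]:
    """Old tie-break: order duplicated minimizers by PRG-string location."""
    return sorted(hits, key=lambda h: (h.read_pos, h.prg_start))


def sort_by_node_id(hits: List[Hit]) -> List[Hit]:
    """New tie-break: order duplicated minimizers by their DAG node id."""
    return sorted(hits, key=lambda h: (h.read_pos, h.kmer_node_id))


# One read minimizer (read_pos=10) matches the same graph in two places.
hits = [Hit(10, 250, 7), Hit(10, 40, 2), Hit(3, 90, 4)]
same_order = sort_by_prg_location(hits) == sort_by_node_id(hits)
```

In this example the two keys give the same order because the node ids were chosen to increase with the PRG coordinate. The post only says the two orders are somewhat related, so agreement in every case is not guaranteed; the benefit is that the index no longer needs to store the PRG location for each minimizer.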

Details

Detailed 4-way results follow, only for filtered data. Here we compare the latest release (0.10.0-alpha.0), the version described in the previous post (b19d26, with lazy loading and read data optimisation), and this version under study (1c53eb, which adds an index optimisation):

Illumina data

The most optimised version, 1c53eb, actually slightly improves precision for the Illumina results, both with and without denovo.

image

Nanopore data

The curves for both improved versions, b19d26 and 1c53eb, basically overlap, which suggests that the index optimisation done in 1c53eb does not introduce any bugs:

image

leoisl commented 1 year ago

The previous post shows that the improvements we've made do not introduce bugs into pandora, so we can merge. The merge will consist of 5 PRs (the 1st one is large, the others are small increments):

  1. Lazy loading and random multimapping: adds the lazy loading feature and randomly maps reads when their best mapping is to two or more genes;
  2. No coverage filtering: removes the hard-coded coverage filtering in pandora;
  3. Read info optimisation: do not keep heavy mapping info for each mapped read. Process this info and release the memory as soon as possible;
  4. Index optimisation: remove from the index the record of where each minimizer appears in each PRG string. Sort duplicated minimizer matches using the node id in the minimizer DAG;
  5. Miscellaneous: small changes (code cleaning, refactoring, formatting, etc.) and preparing the code for the next release.

leoisl commented 1 year ago

I've also removed RAM values from the previous posts, and I am gathering benchmarking data on how all these improvements reduced RAM. I will update this issue as soon as I get all benchmarking data.

leoisl commented 1 year ago

RAM and runtime improvements

History of RAM and runtime improvements for the new version of pandora that will be merged in the next PRs. These benchmarks were done running pandora compare with the RH plasmid DB (~1M PRGs) and the ESBL sample SRR16977031:

  1. v0.10.0-alpha.0 (baseline, current release): RAM usage: 178.1 GB; Runtime: 130 minutes

  2. commit a76df4 (only lazy loading added - this is the version we've been using in RH, unreleased): RAM usage: 124.5 GB (30% less RAM than baseline); Runtime: 31.8 minutes (4 times faster than baseline)

  3. commit b19d26 (lazy loading + read info optimisation, unreleased): RAM usage: 22.1 GB (88% less RAM than baseline); Runtime: 13 minutes (10 times faster than baseline)

  4. commit 1c53eb (lazy loading + read info optimisation + index optimisation, unreleased): RAM usage: 9.1 GB (95% less RAM than baseline); Runtime: 8.35 minutes (15.5 times faster than baseline)

Thus, once all merges are done, we will have a version that requires 95% less RAM than the current release (~20x improvement in RAM usage) and runs 15.5 times faster than the current release.
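For readers who want to check the arithmetic, the percentages and speedups above can be recomputed from the "Max Memory" and "Run time" values in the LSF logs in this post. The short Python sketch below does that; the `summarise` function and the dictionary layout are only illustrative. Dividing the logged MB values by 1024 reproduces the GB figures quoted above.

```python
from typing import Dict, Tuple

# (Max Memory in MB, Run time in seconds), copied from the LSF logs in this post.
RUNS: Dict[str, Tuple[int, int]] = {
    "v0.10.0-alpha.0": (182410, 7773),
    "a76df4": (127450, 1909),
    "b19d26": (22644, 781),
    "1c53eb": (9345, 501),
}


def summarise(runs: Dict[str, Tuple[int, int]], baseline: str) -> Dict[str, Dict[str, float]]:
    """Compute RAM and runtime figures for each run relative to the baseline."""
    base_mem, base_time = runs[baseline]
    summary = {}
    for name, (mem_mb, secs) in runs.items():
        summary[name] = {
            "ram_gb": mem_mb / 1024,
            "ram_saved_pct": 100 * (1 - mem_mb / base_mem),
            "ram_fold": base_mem / mem_mb,
            "runtime_min": secs / 60,
            "speedup": base_time / secs,
        }
    return summary


summary = summarise(RUNS, "v0.10.0-alpha.0")
for name, stats in summary.items():
    print(
        f"{name}: {stats['ram_gb']:.1f} GB "
        f"({stats['ram_saved_pct']:.0f}% less RAM), "
        f"{stats['runtime_min']:.1f} min ({stats['speedup']:.1f}x faster)"
    )
```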

Details

LSF logs follow:

Pandora benchmarking:

1c53eb (lazy loading + reads optimisation + paths optimisation):
Resource usage summary:
    CPU time :                                   4764.10 sec.
    Max Memory :                                 9345 MB
    Average Memory :                             7826.94 MB
    Total Requested Memory :                     80000.00 MB
    Delta Memory :                               70655.00 MB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                20
    Run time :                                   501 sec.
    Turnaround time :                            511 sec.

b19d26 (lazy loading + reads optimisation):
Resource usage summary:
    CPU time :                                   4771.61 sec.
    Max Memory :                                 22644 MB
    Average Memory :                             19645.91 MB
    Total Requested Memory :                     80000.00 MB
    Delta Memory :                               57356.00 MB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                20
    Run time :                                   781 sec.
    Turnaround time :                            854 sec.

a76df4 (only lazy loading - version we've been using in RH):
Resource usage summary:
    CPU time :                                   13056.05 sec.
    Max Memory :                                 127450 MB
    Average Memory :                             97789.90 MB
    Total Requested Memory :                     150000.00 MB
    Delta Memory :                               22550.00 MB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                20
    Run time :                                   1909 sec.
    Turnaround time :                            1911 sec.

v0.10.0-alpha.0 (baseline):
Resource usage summary:
    CPU time :                                   26317.66 sec.
    Max Memory :                                 182410 MB
    Average Memory :                             84832.14 MB
    Total Requested Memory :                     1024000.00 MB
    Delta Memory :                               841590.00 MB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                20
    Run time :                                   7773 sec.
    Turnaround time :                            7782 sec.

iqbal-lab commented 1 year ago

bloody hell @leoisl

iqbal-lab commented 1 year ago

for future readers, RH=roundhound.

mbhall88 commented 1 year ago

FAR OUT 🔥

rmcolq commented 1 year ago

These results are unbelievable!! Amazing

leoisl commented 1 year ago

Closed via #331, #337, #342 and #345