karel-brinda / Phylign

Alignment against all pre-2019 bacteria on laptops within a few hours (former MOF-Search)
http://brinda.eu/mof

Limits of Snakemake+COBS parallelism #172

Closed – karel-brinda closed this issue 1 year ago

karel-brinda commented 2 years ago

When running the pipeline, it seems to me that too few COBS instances are being run in parallel, likely due to suboptimal planning.

When this happens, it would be great if Snakemake could somehow run new jobs with a higher number of COBS threads.

@leoisl Do you know if this is somehow possible? I.e., to increase the number of threads of newly started jobs based on the number of threads currently in use?

[screenshot]
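
For reference, a minimal Snakefile sketch of where the per-job thread count comes from today (the rule and file names here are hypothetical, not the actual Phylign rules): the value is read from the config when the job is scheduled and, as far as I know, Snakemake does not raise it later based on how many threads the other running jobs are using.

```python
# Hypothetical sketch, not the real Phylign rule – it only illustrates that the
# per-job thread count is a fixed config value, so the scheduler can vary the
# number of parallel COBS jobs but not their individual thread counts.
configfile: "config.yaml"

rule cobs_query_batch:
    input:
        xz="cobs/{batch}.cobs_classic.xz",
    output:
        matches="matches/{batch}.txt",
    threads: config["cobs_threads"]  # fixed per job
    # placeholder command – the real rule decompresses the index and runs COBS
    shell:
        "echo 'would query {input.xz} with {threads} threads' > {output.matches}"
```
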
leoisl commented 2 years ago

I am unsure if this is possible, and I don't quite understand the situation – could you post your config? If you have a dataset with long COBS matching (e.g. plasmids) and --load-complete, I'd say that having 1 or 2 COBS jobs is enough, as long as they are using several threads. Once the whole index is loaded into RAM, there is no more disk access for the index, so multithreading can speed things up a lot.

karel-brinda commented 2 years ago

Now only two jobs are being run:

[screenshot]
karel-brinda commented 2 years ago

Here's the config (diff against the default):

diff --git a/config.yaml b/config.yaml
index 7a5fa74..9301295 100644
--- a/config.yaml
+++ b/config.yaml
@@ -17,7 +17,7 @@ batches: "data/batches_full.txt"
 #     E.g. for plasmid search we recommend cobs_kmer_thres: 0.33
 # Higher values mean less but more accurate matches and faster pipeline execution, but can incur into missing matches.
 #     E.g. for gene search we recommend cobs_kmer_thres: 0.7
-cobs_kmer_thres: 0.7
+cobs_kmer_thres: 0.4

 # number of best kmer matching hits to keep for each query record (in case of tie, all equivalent hits are included too)
 nb_best_hits: 100
@@ -32,7 +32,7 @@ nb_best_hits: 100
 # asm5/asm10/asm20: asm-to-ref mapping, for ~0.1/1/5% sequence divergence
 # splice: long-read spliced alignment
 # sr: genomic short-read mapping
-minimap_preset: "sr"
+minimap_preset: "asm20"

 # other minimap2 params
 minimap_extra_params: "--eqx"
@@ -70,7 +70,7 @@ cobs_threads: 1
 # We recommend to increase cobs_threads to the max if you set this to True.
 # If you get huge slowdowns with this option, the pipeline might be suffering thrashing. Control the RAM usage with the
 # max_ram_gb parameter so that you don't get constant page swapping.
-load_complete: False
+load_complete: True

 # maximum number of I/O-heavy threads. Use this to control the amount of filesystem I/O to not overflow the filesystem.
 # in more details, this parameter controls how many I/O-heavy threads can run simultaneously.
karel-brinda commented 2 years ago

as long as they are using several threads.

Yes, exactly! And this is the issue – the number of threads needs to be specified manually, and the optimal strategy depends on the type of queries. But this could in principle be automated – if not all CPUs are being used due to memory constraints, the pipeline would increase the number of threads for new jobs.

leoisl commented 2 years ago

For this first fig:

[screenshot]

it seems things are working as expected. I can see 4 runs going on, including the 4th one, which is still in the decompression stage:

cobs/streptococcus_pneumoniae__13.cobs_classic.xz  2853127266
cobs/streptococcus_pyogenes__04.cobs_classic.xz  2869253754
cobs/campylobacter_coli__02.cobs_classic.xz  2869793207
cobs/dustbin__04.cobs_classic.xz  2939833436 

Translating to GiB for easier viewing:

cobs/streptococcus_pneumoniae__13.cobs_classic.xz  2.65
cobs/streptococcus_pyogenes__04.cobs_classic.xz  2.67
cobs/campylobacter_coli__02.cobs_classic.xz  2.67
cobs/dustbin__04.cobs_classic.xz  2.73

This amounts to 10.72 GiB of RAM; with a limit of 12 GiB (the default in config.yaml), we get only 1.28 GiB of spare RAM. It seems reasonable that no other COBS job was scheduled...
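
Just to make the arithmetic explicit – summing the exact byte counts gives ~10.74 GiB (the 10.72 above comes from adding the rounded per-file values), so the conclusion is the same:

```python
# Byte counts copied from the listing above; 1 GiB = 1024**3 bytes.
sizes_bytes = [
    2_853_127_266,  # streptococcus_pneumoniae__13
    2_869_253_754,  # streptococcus_pyogenes__04
    2_869_793_207,  # campylobacter_coli__02
    2_939_833_436,  # dustbin__04
]
total_gib = sum(sizes_bytes) / 1024**3
print(f"total: {total_gib:.2f} GiB")       # ~10.74 GiB
print(f"spare: {12 - total_gib:.2f} GiB")  # ~1.26 GiB of the 12 GiB max_ram_gb budget
```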

leoisl commented 2 years ago

Yes, exactly! And this is the issue – the number of threads needs to be specified manually, and the optimal strategy depends on the type of queries. But this could in principle be automated – if not all CPUs are being used due to memory constraints, the pipeline would increase the number of threads for new jobs.

Ohh now I understand what you mean...

karel-brinda commented 2 years ago

Yes, RAM allocation optimal, CPU allocation strongly suboptimal :)

leoisl commented 2 years ago

Yeah, that is tricky... What we could do is try to make the number of threads vary with the size of the index: if an index is 10 GiB, it will consume a good fraction of the available RAM, so it should also consume a good fraction of the available threads. I am unsure if this is doable with Snakemake, but the idea would be to change the rule to something like:

threads: int(proportion_of_max_ram_this_job_takes * workflow.cores)

So if a job consumes 100% of the RAM, it will also consume 100% of the available cores. If it consumes 10% of the RAM, it will consume 10% of the available cores.

Do you think this could be a feasible solution?
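
A minimal sketch of what that might look like, assuming Snakemake accepts a callable for threads the same way it does for resources, and using the compressed index size as a rough proxy for its RAM footprint (rule and file names are again hypothetical, not the actual Phylign rules):

```python
import os

# Hypothetical sketch of the proposal above: scale the per-job thread count with
# the fraction of the RAM budget that the index is expected to occupy.
configfile: "config.yaml"

MAX_RAM_GB = config.get("max_ram_gb", 12)

def threads_from_index_size(wildcards, input):
    """Threads proportional to the index's share of the RAM budget (at least 1)."""
    index_gb = os.path.getsize(input.xz) / 1024**3  # compressed size as a rough proxy
    proportion = min(1.0, index_gb / MAX_RAM_GB)
    return max(1, int(proportion * workflow.cores))

rule cobs_query_batch:
    input:
        xz="cobs/{batch}.cobs_classic.xz",
    output:
        matches="matches/{batch}.txt",
    threads: threads_from_index_size
    shell:
        "echo 'would query {input.xz} with {threads} threads' > {output.matches}"
```

For example, on a hypothetical 32-core machine, a ~2.7 GiB index against the 12 GiB budget would get int(2.7 / 12 × 32) = 7 threads per job.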

karel-brinda commented 1 year ago

Closing this for now, as most of the problems seem to have been addressed by the recent updates.