gordonkoehn opened 16 hours ago
First thoughts: The core loops are:
| Loop level | Iterations | Runtime per iteration | Memory | Potential reduction |
| --- | --- | --- | --- | --- |
| main | 1 | est. 4.6 h | 1414 MB (`pd.df_tally`) | ??? |
| location | 8 | 35 min for 100 b.s. | ~186 MB (`df_tally[location]`) | 1/8 |
| bootstraps (b.s.) | 100 (min.) to 1000 (optimum) | 21 s total at 100 iters | ~190 MB (`df_tally[location]` resampled) | dep. on available cores, say 1/5 to 1/10 |
| date_intervals | (#dates - 1) = 12 | 0.5-6 s | 7.2872 MB (`df_tally[location][resampled][date_interval]`) | startup overhead, 1/2? |
| #dates | ~13 (one sample per week) | XXX | XXX | |
That makes for an estimated total runtime of ~4.6 h for 8 cities (8 locations × ~35 min ≈ 4.6 h), assuming each behaves like Zürich (M1 Pro chip).
These levels seem to be independent at first sight.
_I could imagine that `ll.KernelDeconv` could be parallelized; perhaps the easiest target would be the innermost loop, over the `date_intervals`._

In general, to use Python's `multiprocessing`, the following must be true:
See Copilot:

> Using the `KernelDeconv` class and its `deconv` method with multiprocessing should generally work, but there are a few considerations to keep in mind:
>
> 1. **Thread safety**: Ensure that the objects and methods used within `KernelDeconv` are thread-safe. This includes the kernel, regressor, and confidence interval objects.
> 2. **Data sharing**: When using multiprocessing, data is typically copied to each process. If the data is large, this can be inefficient. Consider using shared memory or other techniques to manage large datasets.
> 3. **Pickleability**: Objects passed to multiprocessing must be pickleable. Ensure that all objects and methods used in `KernelDeconv` can be serialized with `pickle`.
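Point 3 is easy to verify up front. A minimal sketch of such a check (the helper below is my own, not part of LolliPop):

```python
import pickle

def is_pickleable(obj) -> bool:
    """Return True if obj survives a pickle round-trip, as multiprocessing requires."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
```

Running `is_pickleable(...)` on a fully configured `KernelDeconv` instance (kernel, regressor, and confidence interval objects included) before fanning out workers would surface serialization problems early.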
To Check:
Currently V-Pipe runs LolliPop with the resources `threads: 1`, `memory: 1024 MB`, `disk_mb: 1024` (see `config_schema.json` and the `deconvolution` rule).

At first this memory budget confused me: for my current date range the full `df_tally` (1414 MB) would not fit, so the job should not run as is. It does run, though, because V-Pipe appears to execute LolliPop already stratified per location, where ~186 MB suffices.
Conclusion:
Multiprocessing at the level of `location` and at the level of `bootstraps` both seem feasible and reasonable. At the level of `date_intervals` there is probably too much overhead for the short iteration duration. Multiprocessing at the level of `bootstraps` would additionally speed up single-location applications, and choosing `bootstraps` as the multiprocessing level would let us fix the number of cores. The memory would stay within the bounds of a normal 1 GB job, and we would split the single job into about ten jobs at most in either case, a reasonable investment of resources; a sketch follows below.
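A minimal sketch of what parallelizing over bootstraps could look like. `deconv_fn`, `_one_bootstrap`, and `run_bootstraps` are illustrative names and assumptions, not LolliPop's actual API:

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd

def _one_bootstrap(args):
    """One bootstrap iteration: resample the per-location tally, then deconvolve."""
    deconv_fn, df_tally_loc, seed = args
    resampled = df_tally_loc.sample(frac=1.0, replace=True, random_state=seed)
    return deconv_fn(resampled)

def run_bootstraps(deconv_fn, df_tally_loc: pd.DataFrame,
                   n_bootstraps: int = 100, n_workers: int = 10):
    """Fan the bootstrap iterations out over a fixed pool of worker processes.

    deconv_fn must be a module-level (pickleable) callable, per the notes above;
    each worker holds its own copy of df_tally_loc (~190 MB), so peak memory
    scales with n_workers.
    """
    tasks = [(deconv_fn, df_tally_loc, seed) for seed in range(n_bootstraps)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(_one_bootstrap, tasks))
```

With `n_workers=10`, this would match the 1/5 to 1/10 reduction guessed in the table above.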
Integration in Snakemake
This should be no problem; we would just need to adjust the rule to allow `threads: 10`, as sketched below.
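A hedged sketch of that change (directive values guessed from the resources quoted above; the actual `deconvolution` rule in V-Pipe may differ):

```
rule deconvolution:
    threads: 10  # was 1; pass this through as the worker-pool size
    resources:
        mem_mb=1024,
        disk_mb=1024,
    # ...existing input/output/params/shell directives stay unchanged...
```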
Next steps: check the ease of running this in the current V-Pipe, then prepare and submit it as a good PR.