undark-lab / swyft

A system for scientific simulation-based inference at scale.

Dask - Long running time and exceeding memory budget #97

Closed GertKl closed 2 years ago

GertKl commented 2 years ago

(I wrote the following in a recent email to Meiert Grootes)

I have recently been trying out the multiprocessing functionality (using Dask) in SWYFT, based on the example notebook (Example 6) that was recently uploaded to the Undark-lab GitHub. I have been running into several issues, as I am not familiar with Dask to begin with. To keep it simple, could you clarify these two points for me?

  1. When I run the example notebook (Example 6), I find that it runs significantly slower than when I run the same notebook without multiprocessing (basically replacing the swyft.DaskSimulator object with a normal swyft.Simulator). Is this to be expected as the result of significant overhead, or should I expect to see an improvement in simulation time, even for this toy model? Just checking to see if I am somehow doing something wrong at this stage already...

  2. When trying the DaskSimulator on my own code I run into some cryptic errors. So far, I have defined my model in-memory, i.e. I have not implemented it as an external simulator as in the example notebook. Could you clarify whether the Dask framework in SWYFT is meant to work with internal simulators as well (as opposed to requiring the model to be implemented as an external simulator)? I generally run my code in Jupyter notebooks, but when running it as a script at some point I got errors related to the missing `if __name__ == "__main__":` guard. Is this somehow circumvented by including that guard in the model definition file, as has been done in the example notebook?
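For context, the guard referred to in point 2 is standard Python multiprocessing practice rather than anything swyft-specific. A minimal sketch using the stdlib `multiprocessing` module (the `model` function here is a made-up stand-in, not swyft code):

```python
from multiprocessing import Pool

def model(x):
    # stand-in for an expensive simulator call
    return x * x

def main():
    # without the __main__ guard below, "spawn"-based platforms
    # (e.g. Windows, macOS) re-import this module in each worker
    # process and would try to create the Pool recursively
    with Pool(2) as pool:
        return pool.map(model, range(4))

if __name__ == "__main__":
    print(main())
```

Putting the guard in whichever module is executed as the entry point prevents worker processes from re-running the cluster/pool setup on import.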

I hope I have phrased my questions somewhat understandably.

For completeness, below is an excerpt of the (recurring) errors that I am currently getting in my own code (during simulation). There is some documentation on this issue, but so far I am not able to make much sense of it, given my lack of experience with Dask and similar tools.

[screenshot of the error traceback]

GertKl commented 2 years ago

Some more information and comments:

Thank you for your help!

bkmi commented 2 years ago

Hi there, sorry for the slow reply but we were away for the holiday.

I've contacted the eScience center about this. I am also seeing significant slowdowns when running Example 6.

I believe the eScience center wrote the notebook in the context of Google Colab; I wonder whether there are optimizations which Colab has implemented. I'm not an expert on this topic, so I hope to hear from them soon.

fnattino commented 2 years ago

Hi @GertKl and @bkmi, thanks a lot for reporting the issue and very sorry for the late reply.

I had a look at the notebook and at the issues that you report, and I think there are multiple things going on here. In the "Example 6" notebook, the poor performance of the DaskSimulator that you both experienced is partly due to an actual bug, for which I have just put forward a fix (#99), and partly due to the missing batch_size argument in the call to store.simulate(). This argument sets the size of the simulation batches dispatched to the workers. By default, the batch size equals the total number of simulations, in which case one expects similar performance from the DaskSimulator and the "serial" Simulator. For a Dask cluster with 4 workers and ~120 simulations to run, however, setting batch_size=30 should yield a noticeable performance gain. On my laptop, I obtain the following timings (using a version of SWYFT that includes #99):
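To make the effect of batch_size concrete, here is a minimal, swyft-independent sketch of what the argument controls; the `store.simulate(...)` call itself is shown only as a comment, since its exact signature depends on the swyft version:

```python
def make_batches(n_simulations, batch_size):
    """Split simulation indices into contiguous batches of at most batch_size."""
    return [list(range(i, min(i + batch_size, n_simulations)))
            for i in range(0, n_simulations, batch_size)]

# ~120 simulations on a 4-worker cluster: batch_size=30 yields one batch
# per worker, rather than a single batch of 120 going to one worker.
batches = make_batches(120, 30)
assert len(batches) == 4
assert all(len(b) == 30 for b in batches)

# In swyft this would look something like (version-dependent):
# store.simulate(simulator, batch_size=30)
```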

@bkmi: I will update the notebook with a custom batch_size value!

Concerning your other points @GertKl: the DaskSimulator should also work with models provided as regular Python functions (i.e. without having to write your model as an external script), which may actually reduce overhead significantly. To increase the workers' memory limit in a Dask cluster, you can use the memory_limit argument. For instance, to raise the maximum memory allocation of each worker to 5 GB:

```python
from dask.distributed import LocalCluster

cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="5GB")
```

Let me/us know whether this helps and if you have any further issues/questions so that we can have a deeper look!

GertKl commented 2 years ago

Thank you for your replies! I will have a look at this as soon as I can.

bkmi commented 2 years ago

#99 was merged. Please let us know if this continues to be a problem.