undark-lab / swyft

A system for scientific simulation-based inference at scale.

Dask - Long running time and exceeding memory budget #97

Closed GertKl closed 2 years ago

GertKl commented 2 years ago

(I wrote the following in a recent email to Meiert Grootes)

I have recently been trying out the multiprocessing functionality (using Dask) in SWYFT, based on the example notebook (Example 6) that was recently uploaded to the Undark-lab GitHub. I have been running into several issues, as I am not familiar with Dask to begin with. To keep it simple, could you clarify these two points for me?

  1. When I run the example notebook (Example 6), I find that it runs significantly slower than when I run the same notebook without multiprocessing (basically replacing the swyft.DaskSimulator object with a normal swyft.Simulator). Is this to be expected as the result of significant overhead, or should I expect to see an improvement in simulation time, even for this toy model? Just checking to see if I am somehow doing something wrong at this stage already...

  2. When trying the DaskSimulator on my own code I run into some cryptic errors. So far, I have defined my model in-memory, i.e. I have not implemented it as an external simulator as in the example notebook. Could you clarify whether the Dask framework in SWYFT is meant to work with internal simulators as well (as opposed to requiring the model to be implemented as an external simulator)? I generally run my code in Jupyter notebooks, but when running it as a script at some point I got errors related to the missing `if __name__ == "__main__":` guard. Is this somehow circumvented by including that guard in the model definition file, as has been done in the example notebook?
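For context, the guard referred to in point 2 is standard Python multiprocessing practice rather than anything swyft-specific. A minimal sketch using the stdlib `multiprocessing` module (the `model` function here is a made-up stand-in, not swyft code):

```python
from multiprocessing import Pool

def model(x):
    # stand-in for an expensive simulator call
    return x * x

def main():
    # without the __main__ guard below, "spawn"-based platforms
    # (e.g. Windows, macOS) re-import this module in each worker
    # process and would try to create the Pool recursively
    with Pool(2) as pool:
        return pool.map(model, range(4))

if __name__ == "__main__":
    print(main())
```

Putting the guard in whichever module is executed as the entry point prevents worker processes from re-running the cluster/pool setup on import.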

I hope I have phrased my questions somewhat understandably.

For completeness, below is an excerpt of the (recurring) errors that I am currently getting in my own code (during simulation). There is some documentation on this issue, but so far I am not able to make much sense of it, given my lack of experience with Dask and similar tools.

[screenshot of the error traceback]

GertKl commented 2 years ago

Some more information and comments:

Thank you for your help!

bkmi commented 2 years ago

Hi there, sorry for the slow reply but we were away for the holiday.

I've contacted the eScience center about this. I am also seeing significant slowdowns when running Example 6.

I believe the eScience center wrote the notebook in the context of Google Colab; I wonder whether there are optimizations which Colab has implemented. I'm not an expert on this topic, so I hope to hear from them soon.

fnattino commented 2 years ago

Hi @GertKl and @bkmi, thanks a lot for reporting the issue and very sorry for the late reply.

I had a look at the notebook and at the issues that you report, and I think there are multiple things going on here. In the "Example 6" notebook, the poor performance of the DaskSimulator that you both experienced is partly due to an actual bug, for which I have just put forward a fix (#99), and partly due to the missing batch_size argument in the call to store.simulate(). This argument sets the size of the simulation batches dispatched to the workers. By default, the batch size equals the total number of simulations, in which case one expects similar performance from the DaskSimulator and the "serial" Simulator. For a Dask cluster with 4 workers and ~120 simulations to run, however, setting batch_size=30 should yield a noticeable performance gain. On my laptop, I obtain the following timings (using a version of SWYFT that includes #99):
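To make the effect of batch_size concrete, here is a minimal, swyft-independent sketch of what the argument controls; the `store.simulate(...)` call itself is shown only as a comment, since its exact signature depends on the swyft version:

```python
def make_batches(n_simulations, batch_size):
    """Split simulation indices into contiguous batches of at most batch_size."""
    return [list(range(i, min(i + batch_size, n_simulations)))
            for i in range(0, n_simulations, batch_size)]

# ~120 simulations on a 4-worker cluster: batch_size=30 yields one batch
# per worker, rather than a single batch of 120 going to one worker.
batches = make_batches(120, 30)
assert len(batches) == 4
assert all(len(b) == 30 for b in batches)

# In swyft this would look something like (version-dependent):
# store.simulate(simulator, batch_size=30)
```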

@bkmi: I will update the notebook with a custom batch_size value!

Concerning your other points @GertKl: the DaskSimulator should also work with models provided as regular Python functions (i.e. without having to write your model as an external script), which may actually reduce overhead significantly. To increase the workers' memory limit in a Dask cluster, you can use the memory_limit argument. For instance, to raise the maximum memory allocation of each worker to 5 GB:

```python
from dask.distributed import LocalCluster

cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="5GB")
```

Let me/us know whether this helps and if you have any further issues/questions so that we can have a deeper look!

GertKl commented 2 years ago

Thank you for your replies! I will have a look at this as soon as I can.

bkmi commented 2 years ago

#99 was merged. Please let us know if this continues to be a problem.