hendrikmakait opened 1 week ago
@hendrikmakait Thank you, I confirm that the scheduling time (the time between when I hit `.compute()` and when tasks start to stream) is significantly smaller after updating dask-expr from 1.1.11 to 1.1.13. For a simple pipeline with 5k partitions it went from 156 s to 12 s, and for 40k partitions it went from >3 hours to 12 minutes.
That's great to hear!
Please let us know if you notice any other performance bottlenecks. 12 minutes may still be somewhat long, depending on the total number of tasks.
@hendrikmakait 12 minutes is a significant amount of time; the run time itself was ~90 minutes on 60 3-thread workers. You can find more details in this notebook: https://github.com/lincc-frameworks/notebooks_lf/blob/main/ztf_periodogram/SIMPLIFIED-ztf_periodogram_PSC.ipynb
It is also worth mentioning that the manager node used ~90 GB of memory. The task graph had ~0.5M tasks.
When I run a more complex notebook, the "lazy" cells (where I plan Dask computations before calling `.compute()`) took a dozen minutes. That is quite long, and we expect our users to implement analyses an order of magnitude more complicated.
Is there a public dataset one could access to reproduce this? If not, I recommend profiling your code with `py-spy`. We could do a sanity check if you provide a profile in the speedscope format. Also, 90 GiB of memory on the scheduler seems excessive, in particular with "only" 500k tasks.
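If it helps, graph-construction cost can be sanity-checked in isolation by materializing the low-level graph of a toy delayed pipeline; no data is touched. This is only a sketch (the `inc` step and the task count are made up, not your actual workload):

```python
import time

import dask


@dask.delayed
def inc(x):
    # trivial stand-in for a real pipeline step
    return x + 1


# build a chain of 1000 delayed tasks; nothing is computed here
node = 0
for _ in range(1000):
    node = inc(node)

start = time.perf_counter()
graph = node.__dask_graph__()  # materialize the task graph
elapsed = time.perf_counter() - start
print(f"{len(graph)} tasks built in {elapsed:.3f}s")
```

Scaling the loop count up should show roughly how construction time grows with graph size on your machine.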
@hendrikmakait The data is public; I made a notebook which uses the HTTPS data storage. The dataset volume is pretty large, but the graph-building part doesn't actually touch any data files, so the manager-node issues are reproducible without fetching any data. The notebook: https://github.com/lincc-frameworks/notebooks_lf/blob/main/ztf_periodogram/ztf-periodogram-data.lsdb.io.ipynb
It took 90 GB of RAM and 7 minutes to schedule the graph. (Am I using the right terminology here? What I mean is the time between when I hit `.compute()` and when tasks start to stream.)
I haven't tried `py-spy` yet, thanks for the recommendation!
@hendrikmakait I ran the code (up to the `len(x.dask)` point) and profiled it with `py-spy` and `memray`. I don't know much about Dask internals, so it's hard to tell what exactly is happening there, but I believe `memray` indicates that the graph itself is actually ~80 GB in size.
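As a rough cross-check of that number, the in-memory footprint of a nested, dict-shaped graph can be estimated with a recursive `sys.getsizeof` walk. This is a crude sketch on a toy graph (`memray` is far more accurate, and the graph shape here is invented):

```python
import sys


def deep_sizeof(obj, seen=None):
    """Rough recursive size estimate (bytes) of a nested structure."""
    if seen is None:
        seen = set()
    oid = id(obj)
    if oid in seen:  # avoid double-counting shared objects
        return 0
    seen.add(oid)
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size


# a toy task graph in Dask's low-level {key: (callable, *args)} shape
graph = {f"inc-{i}": (lambda x: x + 1, f"inc-{i - 1}" if i else 0) for i in range(1000)}
print(f"~{deep_sizeof(graph)} bytes")
```

Running this on a small slice of the real graph (e.g. a few partitions) and extrapolating can hint at whether the ~80 GB is dominated by the graph structure itself or by objects embedded in it.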
Here you can find the code and outputs for both profilers: https://github.com/lincc-frameworks/notebooks_lf/tree/main/ztf_periodogram/profile-dask-graph
I'd really appreciate it if you could take a look. I believe you would have much better insight into this!
Update 2024-09-12: I tried turning off optimizations with `dask.config.set(array_optimize=None, dataframe_optimize=None, delayed_optimize=None)`, but it didn't change anything.
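For anyone reproducing this: `dask.config.set` also works as a context manager, so settings like the ones above can be A/B-tested without leaking into the rest of the session. A minimal sketch:

```python
import dask

# temporarily disable the collection-level optimizations mentioned above;
# outside the `with` block the previous configuration is restored
with dask.config.set(array_optimize=None, dataframe_optimize=None, delayed_optimize=None):
    print(dask.config.get("delayed_optimize"))  # None inside the context
```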
@hendrikmakait Just a gentle reminder: could you please take a look at these profiler logs? Or should I convert this into a Dask/`dask-expr` issue?
Thanks for creating the profiles! I've had a brief look at them and there's nothing that would point toward an obvious issue at first glance. I'm not sure if I'll have any time soon to dig deeper into this. It looks like I'd have to understand the graph structures you create in much more detail to point out issues.
During today's Dask monthly meeting, people mentioned exceedingly long graph-construction times related to delayed objects. I see that you are using `from_delayed` here. Please note that `dask-expr>=1.1.13` includes a fix that improves graph-construction time for large collections of delayed objects in `dd.from_delayed` by orders of magnitude (https://github.com/dask/dask-expr/pull/1132). I don't know your specific problem, so I'm not sure whether this will fix it, but I wanted to point it out nonetheless in case it helps.