Closed gordonwatts closed 6 months ago
After the changes, I would expect the number reported in the dashboard to be roughly the number of files times the average number of steps taken per file (a factor of 2 or so in the materialize_branches notebook, I believe). I saw a very drastic reduction of almost two orders of magnitude. Examples of the graphs before and after are in https://github.com/iris-hep/idap-200gbps/pull/7, which shows the CMS version, but that behaves the same. What does .visualize(optimize_graph=True) on the graph look like in this case?
OK, here is the graph before @alexander-held's optimization trick:
And after:
So, this is doing what we expect. I was fooled because I was using len(total_count.dask), and that gives:
Before Optimization:
0003.5013 - INFO - Number of tasks in the dask graph: 172
After Optimization:
0003.5437 - INFO - Number of tasks in the dask graph: 118
So that looks like it should be 15, not 118. In short, I do not understand what len(total_count.dask) is doing.
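For comparison, here is a minimal sketch of counting tasks before and after dask's own graph optimization. It uses a toy dask.array computation, not the dask-awkward total_count from this issue; the names x and total are made up for illustration. It only shows that len(collection.dask) counts the tasks in whatever graph is attached to the collection, and that dask.optimize returns a collection whose graph has already been optimized. It does not claim to explain the 118 vs. 15 discrepancy.

```python
import dask
import dask.array as da

# Toy stand-in for the real computation (hypothetical, for illustration only).
x = da.ones((8,), chunks=2)
total = ((x + 1) * 2).sum()

# Number of tasks in the graph attached to the collection as built.
n_raw = len(total.dask)

# dask.optimize returns a tuple of equivalent collections with optimized graphs.
(optimized,) = dask.optimize(total)
n_opt = len(optimized.dask)

print(f"tasks before dask.optimize: {n_raw}")
print(f"tasks after dask.optimize:  {n_opt}")
```

If the count logged in the notebook is taken from the unoptimized collection, it would not be expected to match what .visualize(optimize_graph=True) draws; taking len() of the graph on the optimized collection is one way to check that.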
See issue #65 for follow up for the counting number (optimized graph vs non-optimized?).
count_nonzero one per axis to reduce # of tasks. On a 2 file run this goes from 2230 tasks to 2228. Hmmm....
Fixes #48