iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE

Tracking logs of workers and errors found therein #57

Open alexander-held opened 4 months ago

alexander-held commented 4 months ago

We can forward all worker output with

client.register_plugin(distributed.diagnostics.plugin.ForwardOutput())

which gets very noisy and is probably quite bad for performance, but may help with debugging. To use this, it is best to avoid notebooks and instead run a .py script, piping its output to a file.

Here is example output from a run that crashes during pre-processing (with autoscaling) and with all uproot reporting disabled: log_minimal_preprocess.txt

One thing seen in there is

  File "/venv/lib/python3.9/site-packages/dask/utils.py", line 773, in __call__
    return meth(arg, *args, **kwargs)
  File "/venv/lib/python3.9/site-packages/dask/sizeof.py", line 59, in sizeof_python_collection
    return sys.getsizeof(seq) + sum(map(sizeof, seq))
RecursionError: maximum recursion depth exceeded
2024-04-23 21:09:05,729 - distributed.sizeof - WARNING - Sizeof calculation for object of type 'dict' failed. Defaulting to -1 B
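The RecursionError comes from dask's recursive size estimation hitting Python's recursion limit on a deeply nested object. The mechanism can be reproduced without dask with a simplified stand-in for dask.sizeof (illustrative only, not dask's actual implementation):

```python
import sys


def sizeof(obj):
    """Simplified stand-in for dask.sizeof: recurse into collections."""
    if isinstance(obj, dict):
        return sys.getsizeof(obj) + sum(
            sizeof(k) + sizeof(v) for k, v in obj.items()
        )
    if isinstance(obj, (list, tuple, set)):
        return sys.getsizeof(obj) + sum(map(sizeof, obj))
    return sys.getsizeof(obj)


# Build a dict nested deeper than CPython's default recursion limit (~1000).
nested = {}
for _ in range(2000):
    nested = {"child": nested}

try:
    sizeof(nested)
except RecursionError as exc:
    # distributed catches this, logs the warning, and falls back to -1 B
    print("RecursionError:", exc)
```

This suggests some object shipped between workers (here a dict) is very deeply nested, which may itself be a symptom of the underlying problem.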

The closest match I have found is https://github.com/dask/distributed/issues/8378, but it is not clear to me whether it is related.