Closed ivukotic closed 10 months ago
@martindurant (and @douglasdavis), should this be a dask-awkward issue? (I don't have permissions to transfer it.)
It says that the graph contains a lock object. That must have been introduced in the from_map call that I think uproot uses, perhaps by proxying an open file? Printing out the details within the graph, or translating it to a low-level graph, might help.
> It says that the graph contains a lock object.
Uh oh, I missed that. (I was looking at the bottom of the stack trace.) I'll figure out what that lock is for.
I wish we had a better tool to tell you where in the object the unpicklable thing lives.
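As a rough illustration of what such a tool could look like, here is a minimal sketch that walks an object and reports the innermost members that fail to pickle. The helper name `find_unpicklable` is hypothetical (it is not part of dask, uproot, or the standard library), and it only descends into dicts, sequences, sets, and `__dict__` attributes, so it would miss state hidden in closures or `__slots__`.

```python
import pickle
import threading


def find_unpicklable(obj, path="obj", _seen=None):
    """Return (path, type name) pairs for the innermost members that fail to pickle.

    Hypothetical helper, for illustration only.
    """
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        # Already visited: avoid infinite recursion on reference cycles.
        return []
    _seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return []  # This object (and everything inside it) pickles fine.
    except Exception:
        pass

    # Work out which children to descend into.
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []

    found = []
    for child_path, child in children:
        found.extend(find_unpicklable(child, child_path, _seen))

    # If no child is to blame, this object itself is the culprit.
    return found or [(path, type(obj).__name__)]


# Example: a lock buried two levels down in a container.
suspect = {"tasks": [1, 2, {"resource": threading.Lock()}]}
for where, kind in find_unpicklable(suspect):
    print(where, kind)
```

Applied to a task graph converted to a plain dict, a helper along these lines would point at the key and argument position holding the lock, rather than only the final exception.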
There's no issue in a random ROOT file,
>>> import uproot, skhep_testdata
>>> from dask.distributed import Client
>>> client = Client()
>>>
>>> a = uproot.dask({skhep_testdata.data_path("uproot-HZZ.root"): "events"})
>>> a.compute()
<Array [{NJet: 0, Jet_Px: [], ...}, ..., {...}] type='2421 * {NJet: int32, ...'>
so it must be something special in DAOD_PHYSLITE. Yes, it is.
It's this one:
The thing that's special about this file (these branches, to be specific) is that it has incompletely written ("embedded") TBaskets, which have to be read single-threaded because reading them changes the TBranch object.
I'll adapt the __setstate__/__getstate__ to drop and recreate the lock on pickling.
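The general pattern of dropping a lock on pickling and recreating it on unpickling can be sketched as follows. This is not the actual uproot code: the class `BranchLike` and its attribute names are hypothetical stand-ins, and the real fix may differ in its details.

```python
import pickle
import threading


class BranchLike:
    """Hypothetical stand-in for an object that guards mutable state with a lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # threading.Lock objects cannot be pickled

    def __getstate__(self):
        # Copy the instance state and leave the lock out of what gets pickled.
        state = dict(self.__dict__)
        state.pop("_lock", None)
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then create a fresh lock in this process.
        self.__dict__.update(state)
        self._lock = threading.Lock()


# Round trip: without the two methods above, pickle.dumps would raise TypeError.
restored = pickle.loads(pickle.dumps(BranchLike("Jet_Px")))
print(restored.name)
```

The recreated lock is independent of the original one, which is acceptable here because a lock only needs to serialize access within a single process.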
PR #1103 fixes this.
I can read a few branches with uproot.dask without any problem, as long as I don't have a dask cluster. With a distributed dask cluster, it breaks. Here is the simplest reproducible example. The file is already cached, so there is no need for authentication.
Here is the error: