scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.
https://uproot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Reading files through HTTPS protocol results in `TypeError: a bytes-like object is required, not 'ServerDisconnectedError'` #1233

Open oshadura opened 1 week ago

oshadura commented 1 week ago

Reading files through the HTTPS protocol results in: TypeError: a bytes-like object is required, not 'ServerDisconnectedError'

Traceback: https://gist.github.com/oshadura/df21fcdf12c8ad9bac6b759ca164064c

>>> import uproot
>>> uproot.__version__
'5.3.7'

Reproducer:

import numpy as np
import awkward as ak
import matplotlib.pyplot as plt

import uproot

import dask
import hist
from hist.dask import Hist
from coffea.nanoevents import NanoEventsFactory
#prefix = 'root://xcache//store/user/ncsmith/samplegame/'
prefix = 'https://xrootd-local.unl.edu:1094//store/user/AGC/samplegame/'

samples = [
    uproot.dask(prefix + "sample%d.root" % i, open_files=False)
    for i in range(6)
]
h = (
    Hist.new
    .IntCat(range(6), label="Sample")
    .Reg(100, 0, 500, label="Jet $p_T$")
    .Double()
)
for i, sample in enumerate(samples):
    h.fill(i, ak.flatten(sample.Jet_pt))

fig, ax = plt.subplots()
h, *_ = dask.compute(h)
h.plot1d(ax=ax)
ax.set_yscale("log")
ax.legend(title="Sample")

@nsmith-

alexander-held commented 1 week ago

Smaller reproducer:

import uproot

with uproot.open("https://xrootd-local.unl.edu:1094//store/user/AGC/samplegame/sample1.root") as f:
    f["Events"]  # this works
    f["Events"].arrays("Jet_pt")  # this breaks

jpivarski commented 1 week ago

The small reproducer doesn't do it:

>>> import uproot
>>> with uproot.open("https://xrootd-local.unl.edu:1094//store/user/AGC/samplegame/sample1.root") as f:
...     f["Events"]  # this works
...     f["Events"].arrays("Jet_pt")  # this breaks
... 
<TTree 'Events' (431 branches) at 0x6ffffbf43290>
<Array [{Jet_pt: [57.5, ..., 15.3]}, ...] type='500000 * {Jet_pt: var * flo...'>

but the big one does. Or it's random (because it's a ServerDisconnectedError) and that's what I got on one invocation of each.

If the server disconnected, then some error has to be raised about that (unless it's a rare failure to be skipped and filed in a report). The failing line is

  File "/home/jpivarski/irishep/uproot5/src/uproot/source/chunk.py", line 388, in wait
    self._raw_data = numpy.frombuffer(self._future.result(), dtype=self._dtype)
                                      ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/uproot5/src/uproot/source/coalesce.py", line 36, in result
    return self._parent.result(timeout=timeout)[self._s]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
TypeError: 'ServerDisconnectedError' object is not subscriptable

which is to say that

https://github.com/scikit-hep/uproot5/blob/94e5199e0a78a1667e40a994ce601e2e7ad210a2/src/uproot/source/coalesce.py#L36

isn't checking for the possibility that the return value of result might be an exception object. (Why is the exception returned, rather than raised?)

The coalesce.py submodule was added by @nsmith-, so maybe this line just needs something to handle the exception case. I'm still confused by the fact that it's returning, rather than raising, the exception, since

https://github.com/scikit-hep/uproot5/blob/94e5199e0a78a1667e40a994ce601e2e7ad210a2/src/uproot/source/futures.py#L144-L149

puts the exception class, object, and traceback into Future._excinfo, not Future._result, and

https://github.com/scikit-hep/uproot5/blob/94e5199e0a78a1667e40a994ce601e2e7ad210a2/src/uproot/source/futures.py#L129-L141

raises the exception, rather than returning it, here:

https://github.com/scikit-hep/uproot5/blob/94e5199e0a78a1667e40a994ce601e2e7ad210a2/src/uproot/source/futures.py#L34-L38

Some SliceFuture._parent is not following this protocol. @nsmith-, are all possible values of SliceFuture._parent an Uproot future?
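
To illustrate the "handle the exception case" idea, here is a minimal sketch of a slicing future that re-raises when its parent hands back an exception object as the result. `CheckedSliceFuture` is a hypothetical stand-in written for this example, not the actual `SliceFuture` in coalesce.py, and `ConnectionError` stands in for `ServerDisconnectedError`.

```python
from concurrent.futures import Future


class CheckedSliceFuture:
    """Hypothetical stand-in for SliceFuture that re-raises returned exceptions."""

    def __init__(self, parent, s):
        self._parent = parent  # any future-like object with .result(timeout=...)
        self._s = s            # slice into the parent's bytes

    def result(self, timeout=None):
        value = self._parent.result(timeout=timeout)
        # Guard: some parents hand back an exception object as their result.
        if isinstance(value, BaseException):
            raise value
        return value[self._s]


# Parent that resolved normally.
ok = Future()
ok.set_result(b"0123456789")
good = CheckedSliceFuture(ok, slice(2, 5)).result()

# Parent whose "result" is an exception object (the situation in the traceback).
bad = Future()
bad.set_result(ConnectionError("server disconnected"))
try:
    CheckedSliceFuture(bad, slice(0, 4)).result()
    raised = None
except ConnectionError as err:
    raised = err
```

With a guard like this, the caller would see the original connection error instead of a TypeError about subscripting. It does not explain why the parent returns the exception in the first place.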

nsmith- commented 1 week ago

> are all possible values of SliceFuture._parent an Uproot future?

In the case of FSSpec (the only source currently using the coalescing algorithm), none of them are. They are Python futures as prepared by

https://github.com/scikit-hep/uproot5/blob/94e5199e0a78a1667e40a994ce601e2e7ad210a2/src/uproot/source/fsspec.py#L167

where self._executor is an instance of FSSpecLoopExecutor:

https://github.com/scikit-hep/uproot5/blob/94e5199e0a78a1667e40a994ce601e2e7ad210a2/src/uproot/source/fsspec.py#L198-L206

I don't know yet why run_coroutine_threadsafe ends up setting the content instead of an exception.
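
One way a future from run_coroutine_threadsafe can end up holding an exception as content is if the coroutine itself collects failures rather than raising them, for example through `asyncio.gather(..., return_exceptions=True)`. The sketch below shows only that mechanism in plain asyncio. Whether this is what happens inside the fsspec call used here is an assumption, not something established in this thread. `fetch` and `fetch_all` are hypothetical names, and `ConnectionError` stands in for `ServerDisconnectedError`.

```python
import asyncio
import threading


async def fetch(i):
    # Hypothetical range request; the second one fails.
    if i == 1:
        raise ConnectionError("server disconnected")
    return b"chunk%d" % i


async def fetch_all():
    # return_exceptions=True turns failures into ordinary list entries.
    return await asyncio.gather(*(fetch(i) for i in range(3)), return_exceptions=True)


# Run an event loop in a background thread, as a loop executor would.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

future = asyncio.run_coroutine_threadsafe(fetch_all(), loop)
results = future.result(timeout=5)              # does not raise
stored_exception = future.exception(timeout=5)  # None: the future itself succeeded

# Shut the loop down cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

Here the future completes successfully, and the failure only shows up as an element of `results`, so any consumer that slices or indexes the result without checking its type would hit a TypeError rather than the connection error.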

oshadura commented 1 week ago

I will ask Carl to check server side logs and post them here...

oshadura commented 1 week ago

240626 15:46:12.195505 32239 acc_Audit: unknown.898250:416@c2409.shor.hcc grant https *@[::ffff:172.30.24.9] stat /store/user/AGC/samplegame/sample1.root
240626 15:46:12.195658 2192 cms_Decode: xrootd-local redirects unknown.898250:416@c2409.shor.hcc to red-xfer5.unl.edu:1094 /store/user/AGC/samplegame/sample1.root
240626 15:46:12.195710 32239 unknown.898250:416@c2409.shor.hcc Xrootd_Protocol: rc=-256 stat /store/user/AGC/samplegame/sample1.root

Same log entry on xfer5. The only thing that catches my attention is:

240626 15:46:07.113526 17535 multiuser_UserSentry: Anonymous client; no user set, cannot change FS UIDs

From Carl: maybe https needs FS UID set?