Closed — gordonwatts closed 4 months ago
Repro:

```shell
python servicex/servicex_materialize_branches.py -v --distributed-client scheduler --dask-scheduler 'tcp://dask-gwatts-f28f74d7-a.af-jupyter:8786' --dask-profile --num-files 0 --dataset data_special --ignore-cache --query xaod_medium
```
The problem is `steps_per_file`. When running on a cluster it is set to a large number, and with tight cuts some files are now too small - I assume they produce a zero-length block, and uproot breaks when that happens.
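To see how a too-large step count yields empty blocks, here is a minimal, hypothetical sketch (not the actual servicex or uproot code) of splitting a file's entries into a fixed number of steps: when `steps_per_file` exceeds the number of entries that survive the cut, some of the resulting `(start, stop)` ranges have zero length.

```python
def entry_steps(num_entries: int, steps_per_file: int) -> list[tuple[int, int]]:
    """Split [0, num_entries) into steps_per_file contiguous ranges.

    Illustrative only: shows why a large steps_per_file on a small
    file produces zero-length (start, stop) blocks.
    """
    size = num_entries / steps_per_file
    edges = [round(i * size) for i in range(steps_per_file + 1)]
    return [(edges[i], edges[i + 1]) for i in range(steps_per_file)]

# A file where a tight cut leaves only 3 entries, split into 8 steps:
steps = entry_steps(3, 8)
empty = [s for s in steps if s[0] == s[1]]
# 'empty' is non-empty here - these zero-length blocks are what
# (presumably) trips up uproot downstream.
```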
We need to set `steps_per_file` depending on what kind of environment we are running on. With `xaod_small`, we tend to have very little data in the files. So this code adjusts `steps_per_file` to 1 for a tight query and 2 for a medium one.
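A sketch of that adjustment might look like the following. The helper name, the query labels, and the default of 20 are assumptions for illustration, not the actual script's code.

```python
def steps_for_query(query: str, cluster_default: int = 20) -> int:
    """Pick steps_per_file based on how tight the query's cuts are.

    Tight cuts leave very few events per file, so large step counts
    would produce zero-length blocks; fall back to small values.
    (Hypothetical helper - names and default are illustrative.)
    """
    if query == "xaod_tight":
        return 1            # very few surviving events per file
    if query == "xaod_medium":
        return 2            # still small, but a bit more data
    return cluster_default  # loose/no cuts: keep the large cluster value
```

For example, `steps_for_query("xaod_medium")` would return 2, while an uncut query keeps the large cluster-friendly default.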
The following crash occurs when we run on any large-ish data set with a cut: