Closed gordonwatts closed 6 months ago
Running on the full 1 TB sample with the small selection takes: 4:54.
Running on the full 1 TB sample with the all selection takes: 11:03.
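To put those two timings on a common scale, here is a rough conversion to an effective data rate. It is only a sketch: it assumes the times are minutes:seconds, that "1 TB" means 10^12 bytes, and that the rate is quoted against the full sample size even though each selection may read only part of it. The helper names are made up for illustration.

```python
def parse_mmss(text: str) -> int:
    """Convert an 'M:SS' wall-clock string into seconds (assumed format)."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def throughput_gbps(n_bytes: float, seconds: float) -> float:
    """Effective rate in gigabits per second for n_bytes handled in `seconds`."""
    return n_bytes * 8 / seconds / 1e9


SAMPLE_BYTES = 1e12  # assumption: "1 TB" = 10^12 bytes

# Timings quoted above, keyed by selection name.
timings = {"small": "4:54", "all": "11:03"}

rates = {
    name: throughput_gbps(SAMPLE_BYTES, parse_mmss(t))
    for name, t in timings.items()
}

for name, rate in rates.items():
    print(f"{name}: {parse_mmss(timings[name])} s, about {rate:.1f} Gbps effective")
```

Under those assumptions the small selection comes out near 27 Gbps and the all selection near 12 Gbps, measured against the nominal sample size rather than the bytes actually transferred.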
Dask compute logs. First, for the all dataset:
0667.2845 - INFO - Using `uproot.dask` to open files (splitting files 2 ways).
0667.6733 - INFO - Generating the dask compute graph for 34 fields
0667.6738 - INFO - Field event_number is not a scalar field. Skipping count.
0667.6741 - INFO - Field run_number is not a scalar field. Skipping count.
0668.0005 - INFO - Number of tasks in the dask graph: optimized: 20,788 unoptimized 258,512
0668.0006 - INFO - Computing the total count
0801.3024 - INFO - Done: result = 129,913,000
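The time spent in the actual compute step can be read off the log prefixes. The sketch below assumes the leading number on each line is elapsed seconds since the script started and that the result is a count of events; neither is stated in the log itself. The regular expression and variable names are illustrative only.

```python
import re

# The two relevant lines copied from the log above.
LOG = """\
0668.0006 - INFO - Computing the total count
0801.3024 - INFO - Done: result = 129,913,000
"""

# Assumed layout: "<elapsed seconds> - <LEVEL> - <message>"
LINE_RE = re.compile(r"^(?P<t>\d+\.\d+) - (?P<level>\w+) - (?P<msg>.*)$")


def parse(log: str):
    """Return (timestamp, message) pairs for every line that matches."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match["t"]), match["msg"]))
    return entries


entries = parse(LOG)
start = next(t for t, msg in entries if msg.startswith("Computing the total count"))
end, done_msg = next((t, msg) for t, msg in entries if msg.startswith("Done"))

# "Done: result = 129,913,000" -> 129913000
n_events = int(done_msg.split("=")[1].replace(",", ""))
elapsed = end - start

print(f"compute step: {elapsed:.1f} s, {n_events / elapsed:,.0f} counted entries per second")
```

If the prefix really is elapsed seconds, the compute step itself took a little over 133 s, which is well short of the 11:03 wall-clock figure, so most of the remaining time would be spent before the graph is executed.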
And it seems to fail for the small case:
0001.0025 - INFO - Computing the total count
Traceback (most recent call last):
  File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/servicex_materialize_branches.py", line 328, in <module>
    main(
  File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/servicex_materialize_branches.py", line 169, in main
    r = total_count.compute() # type: ignore
  File "/venv/lib/python3.9/site-packages/dask/base.py", line 375, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/venv/lib/python3.9/site-packages/dask/base.py", line 661, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 1314, in __call__
    (result, counters), duration = with_duration(self._call_impl)(
  File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 1152, in wrapper
    result = f(*args, **kwargs)
  File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 1296, in _call_impl
    return self.read_tree(
  File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 983, in read_tree
    mapping = self.form_mapping_info.load_buffers(
  File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 906, in load_buffers
    arrays = tree.arrays(
  File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 823, in arrays
    _ranges_or_baskets_to_arrays(
  File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 2993, in _ranges_or_baskets_to_arrays
    branchid_to_branch[cache_key]._awkward_check(interpretation)
KeyError: '31fc00d0-07ef-11ef-b2c4-bc4110acbeef:/atlas_xaod_tree;1:jet_EnergyPerSampling(6)'
If that is repeatable we'll need to follow up.
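One way to check repeatability is to run the failing step a few times and tally the outcomes. This is a minimal sketch, not the script's actual code: `check_repeatable` is a hypothetical helper, and `fake_compute` is a stand-in that simply raises a `KeyError` so the example runs without a cluster. In the real script the callable would wrap the `total_count.compute()` call from the traceback.

```python
from collections import Counter
from typing import Callable


def check_repeatable(run: Callable[[], object], attempts: int = 3) -> Counter:
    """Call `run` several times and count how each attempt ended."""
    outcomes: Counter = Counter()
    for _ in range(attempts):
        try:
            run()
        except Exception as exc:  # broad on purpose: we want to see every failure mode
            outcomes[f"{type(exc).__name__}: {exc}"] += 1
        else:
            outcomes["ok"] += 1
    return outcomes


def fake_compute():
    """Stand-in for the real compute call; always fails the same way."""
    raise KeyError("jet_EnergyPerSampling(6)")


outcomes = check_repeatable(fake_compute, attempts=3)
for outcome, count in outcomes.items():
    print(f"{count}x {outcome}")
```

If every attempt lands in the same `KeyError` bucket the failure is deterministic; a mix of "ok" and errors would point at something transient instead.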
We need to have a query that does some heavy skimming of the SX query to better understand:
Do this by putting the queries in a file that can then be easily imported. This is just to create, say, 3 queries for xaod:
Once this is in, then we can add even more queries (from different back-ends!).
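A possible shape for such an importable query file is sketched below. Everything here is hypothetical: the module layout, the `QuerySpec` and `register` names, the three query names, and the placeholder dictionaries returned by the builders, which stand in for whatever real ServiceX query objects the script constructs. The point is only that queries are looked up by name and tagged with a back-end, so more can be added later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class QuerySpec:
    """One named query plus the back-end it targets (hypothetical structure)."""
    name: str
    backend: str
    description: str
    build: Callable[[], object]


_REGISTRY: Dict[str, QuerySpec] = {}


def register(name: str, backend: str, description: str):
    """Decorator that adds a query builder to the module-level registry."""
    def decorator(fn: Callable[[], object]) -> Callable[[], object]:
        if name in _REGISTRY:
            raise ValueError(f"duplicate query name: {name}")
        _REGISTRY[name] = QuerySpec(name, backend, description, fn)
        return fn
    return decorator


# Placeholder builders: real ones would return the actual query objects.
@register("xaod_all", "xaod", "read every branch, no skim")
def _xaod_all():
    return {"branches": "all", "skim": None}


@register("xaod_small", "xaod", "read a small set of branches, no skim")
def _xaod_small():
    return {"branches": "small", "skim": None}


@register("xaod_heavy_skim", "xaod", "small branch set with a heavy event skim")
def _xaod_heavy_skim():
    return {"branches": "small", "skim": "heavy"}


def get_query(name: str) -> QuerySpec:
    """Look up a query by name; raises KeyError if it is not registered."""
    return _REGISTRY[name]


def list_queries(backend: Optional[str] = None) -> List[QuerySpec]:
    """All registered queries, optionally filtered by back-end, sorted by name."""
    specs = sorted(_REGISTRY.values(), key=lambda s: s.name)
    return [s for s in specs if backend is None or s.backend == backend]
```

The driver script could then do `get_query("xaod_small").build()` after importing the module, and a query for another back-end would only need a new `@register(...)` entry with a different `backend` tag.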