ssl-hep / ServiceX_Uproot_Transformer

Transformer Image for Uproot-Based Transforms
BSD 3-Clause "New" or "Revised" License

Uproot transformer image (`sslhep/servicex_func_adl_uproot_transformer:develop` pushed on Oct 12) crashes for some ROOT ntuples #22

Closed kyungeonchoi closed 2 years ago

kyungeonchoi commented 2 years ago

Story

The current uproot transformer image (sslhep/servicex_func_adl_uproot_transformer:develop, pushed on Oct 12) crashes for some ROOT ntuple files. It doesn't crash every time, but it does on most runs.

Reproduce

from servicex import ServiceXDataset

# Query selecting mu_pt, el_pt and weight_jvt from the 'nominal_Loose' tree
query = "(Select (call EventDataset 'ServiceXDatasetSource' 'nominal_Loose') (lambda (list event) (dict (list 'mu_pt' 'el_pt' 'weight_jvt') (list (attr event 'mu_pt') (attr event 'el_pt') (attr event 'weight_jvt')))))"

sx_ds = ServiceXDataset(
    dataset="user.mgeyik:user.mgeyik.mc16_13TeV.346343.PhPy8EG_ttH125_0l.SGTOP1.e7148_s3126_r10201_p4346.ll.d_out.root",
    backend_name="uproot_river",
    image="sslhep/servicex_func_adl_uproot_transformer:develop",
)
sx_ds.get_data_pandas_df(query)

Error messages

Error message from the frontend:

WARNING -   -> error: Failed to transform input file root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/user/mgeyik/b9/52/user.mgeyik.26576946._000004.out.root: I/O operation on closed file

More relevant error found in Kibana:

```
17:41:40.678 ------< {'file-path': 'root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/user/mgeyik/0c/56/user.mgeyik.26576946._000013.out.root', 'file-id': 260598, 'status': 'success', 'num-messages': 0, 'total-time': 1.5, 'total-events': 0, 'total-bytes': 0, 'avg-rate': 0}
17:41:40.681 generated_transformer.py: 1.3 sec
17:41:40.684 Traceback (most recent call last):
17:41:40.684 File "/opt/conda/lib/python3.7/site-packages/XRootD/client/utils.py", line 39, in __call__
17:41:40.686 Traceback (most recent call last):
17:41:40.686 File "/opt/conda/lib/python3.7/site-packages/XRootD/client/utils.py", line 39, in __call__
17:41:40.686 awkward Array -> Arrow: 0.01 sec
17:41:40.687 self.callback(self.status, self.response, self.hostlist)
17:41:40.687 File "/opt/conda/lib/python3.7/site-packages/uproot/source/xrootd.py", line 234, in callback
17:41:40.687 self.callback(self.status, self.response, self.hostlist)
17:41:40.687 File "/opt/conda/lib/python3.7/site-packages/uproot/source/xrootd.py", line 234, in callback
17:41:40.687 self._xrd_error(status)
17:41:40.688 File "/opt/conda/lib/python3.7/site-packages/uproot/source/xrootd.py", line 114, in _xrd_error
17:41:40.688 self._xrd_error(status)
17:41:40.688 File "/opt/conda/lib/python3.7/site-packages/uproot/source/xrootd.py", line 114, in _xrd_error
17:41:40.690 raise uproot._util._file_not_found(self._file_path, status.message)
17:41:40.690 raise uproot._util._file_not_found(self._file_path, status.message)
17:41:40.690 FileNotFoundErrorFileNotFoundError: : file not found ([ERROR] Server responded with an error: [3001] Request contains no vector
17:41:40.690 )
17:41:40.690
17:41:40.690 'root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/user/mgeyik/87/b4/user.mgeyik.26576946._000018.out.root'
17:41:40.690
17:41:40.690 Files may be specified as:
17:41:40.690 * str/bytes: relative or absolute filesystem path or URL, without any colons
17:41:40.690 other than Windows drive letter or URL schema.
17:41:40.690 Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
17:41:40.690 * str/bytes: same with an object-within-ROOT path, separated by a colon.
17:41:40.690 Example: "rel/file.root:tdirectory/ttree"
17:41:40.690 * pathlib.Path: always interpreted as a filesystem path or URL only (no
17:41:40.690 object-within-ROOT path), regardless of whether there are any colons.
17:41:40.690 Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
17:41:40.690
17:41:40.690 Functions that accept many files (uproot.iterate, etc.) also allow:
17:41:40.690 * glob syntax in str/bytes and pathlib.Path.
17:41:40.690 Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
17:41:40.690 * dict: keys are filesystem paths, values are objects-within-ROOT paths.
17:41:40.690 Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
17:41:40.690 * already-open TTree objects.
17:41:40.690 * iterables of the above.
17:41:40.690 file not found ([ERROR] Server responded with an error: [3001] Request contains no vector
17:41:40.690 )
17:41:40.690
17:41:40.690 'root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/user/mgeyik/87/b4/user.mgeyik.26576946._000018.out.root'
17:41:40.690
17:41:40.690 Files may be specified as:
17:41:40.690 * str/bytes: relative or absolute filesystem path or URL, without any colons
17:41:40.690 other than Windows drive letter or URL schema.
17:41:40.690 Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
17:41:40.690 * str/bytes: same with an object-within-ROOT path, separated by a colon.
17:41:40.690 Example: "rel/file.root:tdirectory/ttree"
17:41:40.690 * pathlib.Path: always interpreted as a filesystem path or URL only (no
17:41:40.690 object-within-ROOT path), regardless of whether there are any colons.
17:41:40.690 Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
17:41:40.690
17:41:40.690 Functions that accept many files (uproot.iterate, etc.) also allow:
17:41:40.690 * glob syntax in str/bytes and pathlib.Path.
17:41:40.690 Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
17:41:40.690 * dict: keys are filesystem paths, values are objects-within-ROOT paths.
17:41:40.690 Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
17:41:40.690 * already-open TTree objects.
17:41:40.690 * iterables of the above.
17:41:40.690
17:41:40.690
17:41:40.690 Traceback (most recent call last):
17:41:40.690 File "/opt/conda/lib/python3.7/site-packages/XRootD/client/utils.py", line 39, in __call__
17:41:40.691 self.callback(self.status, self.response, self.hostlist)
17:41:40.691 File "/opt/conda/lib/python3.7/site-packages/uproot/source/xrootd.py", line 234, in callback
17:41:40.691 self._xrd_error(status)
17:41:40.691 File "/opt/conda/lib/python3.7/site-packages/uproot/source/xrootd.py", line 114, in _xrd_error
17:41:40.691 raise uproot._util._file_not_found(self._file_path, status.message)
17:41:40.691 FileNotFoundError: file not found ([ERROR] Server responded with an error: [3001] Request contains no vector
17:41:40.691 )
17:41:40.691
17:41:40.691 'root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/user/mgeyik/87/b4/user.mgeyik.26576946._000018.out.root'
17:41:40.691
17:41:40.691 Files may be specified as:
17:41:40.691 * str/bytes: relative or absolute filesystem path or URL, without any colons
17:41:40.691 other than Windows drive letter or URL schema.
17:41:40.691 Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
17:41:40.691 * str/bytes: same with an object-within-ROOT path, separated by a colon.
17:41:40.691 Example: "rel/file.root:tdirectory/ttree"
17:41:40.691 * pathlib.Path: always interpreted as a filesystem path or URL only (no
17:41:40.691 object-within-ROOT path), regardless of whether there are any colons.
17:41:40.691 Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
17:41:40.691
17:41:40.691 Functions that accept many files (uproot.iterate, etc.) also allow:
17:41:40.691 * glob syntax in str/bytes and pathlib.Path.
17:41:40.691 Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
17:41:40.691 * dict: keys are filesystem paths, values are objects-within-ROOT paths.
17:41:40.691 Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
17:41:40.691 * already-open TTree objects.
17:41:40.691 * iterables of the above.
```

oshadura commented 2 years ago

@kyungeonchoi @masonproffitt I see the same problem on coffea-casa when running the HZZ analysis:

Transform 82c52561-552a-47e9-90dc-a26e8d7cc80c had 1 errors:
  Error transforming file: root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/MC/mc_361106.Zee.4lep.root
  -> error: Failed to transform input file root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/MC/mc_361106.Zee.4lep.root: I/O operation on closed file

---------------------------------------------------------------------------
ServiceXFailedFileTransform               Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/servicex/servicex.py in _get_files(self, selection_query, data_type, notifier, title)
    653                 # Reflect the files back up a level.
--> 654                 async for r in stream_local_files:
    655                     yield r

/opt/conda/lib/python3.8/site-packages/servicex/servicex.py in _get_files_from_servicex(self, request_id, client, minio_adaptor, notifier)
    766 
--> 767             async for info in stream_downloaded:
    768                 yield info

/opt/conda/lib/python3.8/site-packages/servicex/servicex.py in _download_a_file(self, stream, request_id, minio_adaptor, notifier)
    735         file_object_list: List[Tuple[str, Path]] = []
--> 736         async for f in stream:
    737             copy_to_path = self._cache.data_file_location(request_id, f)

/opt/conda/lib/python3.8/site-packages/servicex/servicex.py in _get_minio_bucket_files_from_servicex(self, request_id, client, minio_adaptor, notifier)
    813             # Return the minio information.
--> 814             async for info in stream_new_object:
    815                 yield info

/opt/conda/lib/python3.8/site-packages/servicex/minio_adaptor.py in find_new_bucket_files(adaptor, request_id, update)
    180     seen = []
--> 181     async for _ in update:
    182         # Sadly, this is blocking, and so may hold things up

/opt/conda/lib/python3.8/site-packages/servicex/utils.py in stream_unique_updates_only(stream)
    208     last_p: Optional[TransformTuple] = None
--> 209     async for p in stream:
    210         if p != last_p:

/opt/conda/lib/python3.8/site-packages/servicex/servicex_adaptor.py in trap_servicex_failures(stream)
    238         if did_fail is not None and did_fail != 0:
--> 239             raise ServiceXFailedFileTransform(f'ServiceX failed to transform {did_fail} '
    240                                               f'files - data incomplete (remaining: {remain}, '

ServiceXFailedFileTransform: (ServiceXFailedFileTransform(...), 'ServiceX failed to transform 1 files - data incomplete (remaining: 0, processed: 0).')

Related to https://github.com/iris-hep/analysis-grand-challenge/issues/1

BenGalewsky commented 2 years ago

Merged PR #23 - does that fix it?

kyungeonchoi commented 2 years ago

Yes!

masonproffitt commented 2 years ago

Yeah, the issue here was two separate bugs in XRootD. Updating to 5.1.1 fixed one of them, and there's a workaround for the other by changing uproot.open.defaults["xrootd_handler"]. #23 made these changes. We should still update to xrootd>=5.2.0 soon, but that requires some more delicate dependency handling.
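
For readers who want to try the same kind of workaround locally, here is a minimal sketch of overriding `uproot.open.defaults["xrootd_handler"]`. The thread does not say which handler #23 switched to. The use of `uproot.MultithreadedXRootDSource` below is an assumption, and `use_xrootd_handler` is a hypothetical helper written for this example. It is not part of uproot or ServiceX.

```python
from typing import Any, MutableMapping


def use_xrootd_handler(defaults: MutableMapping[str, Any], handler: Any) -> Any:
    """Set defaults["xrootd_handler"] to `handler` and return the previous value.

    Hypothetical helper: `defaults` is expected to be uproot.open.defaults,
    but any mutable mapping works, which keeps the function easy to test.
    """
    previous = defaults.get("xrootd_handler")
    defaults["xrootd_handler"] = handler
    return previous


try:
    import uproot
except ImportError:
    # uproot is not installed here, so there is nothing to configure.
    uproot = None

# Only touch the option if this uproot version exposes it.
if uproot is not None and "xrootd_handler" in uproot.open.defaults:
    # Assumption: the multithreaded source avoids the failing vector-read path.
    # The handler actually chosen in #23 may differ.
    use_xrootd_handler(uproot.open.defaults, uproot.MultithreadedXRootDSource)
```

The override has to run before any `uproot.open` call that should pick it up. Because the helper returns the previous handler, it can be restored later by calling the helper again with that value.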

oshadura commented 2 years ago

@kyungeonchoi @masonproffitt is it already available in the develop image?

BenGalewsky commented 2 years ago

@oshadura - It's on the develop branch, and I also cherry-picked these commits to patch the 1.0.0-RC.4 release.