ssl-hep / ServiceX_frontend

Client access library for ServiceX

Cache Collisions when you ignore cache for a re-run #375

Open gordonwatts opened 2 months ago

gordonwatts commented 2 months ago

This is with the current 3.0 alpha 16.

To repro:

  1. Run a query.
  2. Re-run the query and tell the system to ignore the cache.
  3. Re-run the query again, this time without ignoring the cache.
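
For anyone trying to picture the sequence above, here is a toy model of how it could end in a collision. This is only a sketch under an assumption: that the ignore-cache run skips the lookup but still inserts a fresh record under the same hash. `ToyCache` and `run_query` are hypothetical names invented for illustration and are not ServiceX code.

```python
class ToyCache:
    """Hypothetical stand-in for a hash-keyed query cache (not ServiceX code)."""

    def __init__(self):
        self.records = []

    def lookup(self, request_hash):
        # Mirror the behaviour in the report: more than one match is an error.
        matches = [r for r in self.records if r["hash"] == request_hash]
        if len(matches) > 1:
            raise RuntimeError("Multiple records found in db for hash")
        return matches[0] if matches else None

    def insert(self, request_hash, result):
        # Assumption: insert always appends and never replaces an existing record.
        self.records.append({"hash": request_hash, "result": result})


def run_query(cache, request_hash, ignore_cache=False):
    if not ignore_cache:
        cached = cache.lookup(request_hash)
        if cached is not None:
            return cached["result"]
    # Cache miss, or the cache was ignored: "run" the query and store the result.
    result = f"result-for-{request_hash}"
    cache.insert(request_hash, result)
    return result


cache = ToyCache()
run_query(cache, "abc")                      # step 1: first record stored
run_query(cache, "abc", ignore_cache=True)   # step 2: second record, same hash

try:
    run_query(cache, "abc")                  # step 3: lookup now sees two records
    collided = False
except RuntimeError:
    collided = True
```

If that assumption holds, replacing the existing record for a hash on insert (an upsert), or deleting it before the ignore-cache run stores its result, would keep step 3 from failing.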

You'll get a cache collision error:

0000.0845 - INFO - root - Using release 22.2.107 for type information.
0000.1185 - WARNING - func_adl.type_based_replacement - Unknown type for name len
0000.8609 - INFO - root - Building ServiceX query
0000.8610 - INFO - root - Using dataset mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r13145_p6026.
0000.8611 - INFO - root - Running on 10 files of dataset.
0000.8615 - INFO - root - Starting ServiceX query
0000.9198 - INFO - servicex.servicex_client - Returning code generators from cache

Traceback (most recent call last):
  File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/servicex_materialize_branches.py", line 356, in <module>
    main(
  File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/servicex_materialize_branches.py", line 119, in main
    files = query_servicex(
  File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/servicex_materialize_branches.py", line 92, in query_servicex
    results = sx.deliver(spec)
  File "/venv/lib/python3.9/site-packages/servicex/servicex_client.py", line 107, in deliver
    results = group.as_signed_urls()
  File "/venv/lib/python3.9/site-packages/make_it_sync/func_wrapper.py", line 63, in wrapped_call
    return _sync_version_of_function(fn, *args, **kwargs)
  File "/venv/lib/python3.9/site-packages/make_it_sync/func_wrapper.py", line 14, in _sync_version_of_function
    return loop.run_until_complete(r)
  File "/usr/AnalysisBaseExternals/25.2.2/InstallArea/x86_64-el9-gcc13-opt/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/venv/lib/python3.9/site-packages/servicex/dataset_group.py", line 76, in as_signed_urls_async
    return await asyncio.gather(*self.tasks)
  File "/venv/lib/python3.9/site-packages/servicex/query.py", line 521, in as_signed_urls_async
    return await self.submit_and_download(
  File "/venv/lib/python3.9/site-packages/servicex/query.py", line 210, in submit_and_download
    self.cache.get_transform_by_hash(sx_request.compute_hash())
  File "/venv/lib/python3.9/site-packages/servicex/query_cache.py", line 84, in get_transform_by_hash
    raise CacheException("Multiple records found in db for hash")
servicex.query_cache.CacheException: Multiple records found in db for hash

gordonwatts commented 2 months ago

Indeed: if you look in db.json, you'll find two entries with the same hash.
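
A quick way to check for this is a short script that counts how often each hash appears. This is a minimal sketch with two assumptions: that db.json is laid out as table name, then record id, then record (a TinyDB-style layout), and that each record stores its hash under a `hash` key. The sample records and the `request_id` field are made up for illustration.

```python
import json
from collections import Counter


def load_db(path):
    """Load the cache file as a plain dict (path is supplied by the caller)."""
    with open(path) as f:
        return json.load(f)


def find_duplicate_hashes(db, key="hash"):
    """Return {hash: count} for every hash stored more than once.

    Assumes a table -> record id -> record layout and a `hash` field;
    both are assumptions, so adjust them to match the real file.
    """
    counts = Counter()
    for table in db.values():
        for record in table.values():
            if key in record:
                counts[record[key]] += 1
    return {h: n for h, n in counts.items() if n > 1}


# Made-up sample data standing in for the contents of db.json.
sample = {
    "_default": {
        "1": {"hash": "abc", "request_id": "r1"},
        "2": {"hash": "abc", "request_id": "r2"},
        "3": {"hash": "def", "request_id": "r3"},
    }
}

duplicates = find_duplicate_hashes(sample)
print(duplicates)
```

To run it against a real cache, pass the result of `load_db` with the path to your own db.json instead of `sample`.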