illinois-ceesd / mirgecom

MIRGE-Com is the workhorse simulation application for the Center for Exascale-Enabled Scramjet Design at the University of Illinois.

Unify caches #1042

Open matthiasdiener opened 1 month ago

matthiasdiener commented 1 month ago

Questions for the review:

matthiasdiener commented 1 month ago

This sometimes fails with the following error (reported by @MTCam):

  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/loopy/loopy/schedule/__init__.py", line 2427, in get_one_linearized_kernel
    schedule_cache.store_if_not_present(sched_cache_key, result)
  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/miniforge3/envs/nozzle.lazy.timing.env/lib/python3.11/site-packages/pytools/persistent_dict.py", line 568, in store_if_not_present
    self.store(key, value, _skip_if_present=True, _stacklevel=1 + _stacklevel)
  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/miniforge3/envs/nozzle.lazy.timing.env/lib/python3.11/site-packages/pytools/persistent_dict.py", line 725, in store
    LockManager(cleanup_m, self._lock_file(hexdigest_key),
  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/miniforge3/envs/nozzle.lazy.timing.env/lib/python3.11/site-packages/pytools/persistent_dict.py", line 150, in __init__
    raise RuntimeError("waited more than one minute "
RuntimeError: waited more than one minute on the lock file '/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/timing-run-caches/xdg-cache/pytools/pdict-v4-loopy-schedule-cache-v4-2024.1-islpy2023.2.5-cgen2020.1-cc795034cc40d49deed34705ca594c96f407a70f-v1-py3.11.8.final.0/ce4930838a82de39199da84f1d4aff9b216cfa3993e844defbaec7a6563dedae.lock' -- something is wrong
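
For readers unfamiliar with this kind of failure, here is a minimal sketch of a lock-file-with-timeout pattern that could produce an error like the one above. It is not the pytools `LockManager` code; the names `acquire_lock`, `release_lock`, and `LockTimeoutError`, and the default timeout and poll interval, are hypothetical and chosen only for illustration. The sketch assumes the lock is taken by creating the file exclusively and released by deleting it.

```python
import os
import time


class LockTimeoutError(RuntimeError):
    """Raised when the lock file could not be created before the deadline."""


def acquire_lock(lock_path, timeout=60.0, poll_interval=0.1):
    """Create lock_path exclusively, retrying until `timeout` seconds pass.

    Returns the open file descriptor of the lock file.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes creation fail if the file already
            # exists, so only one caller can hold the lock at a time.
            return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockTimeoutError(
                    f"waited more than {timeout} s on the lock file "
                    f"'{lock_path}'"
                )
            time.sleep(poll_interval)


def release_lock(fd, lock_path):
    """Close the descriptor and delete the lock file."""
    os.close(fd)
    os.unlink(lock_path)
```

In a scheme like this, a lock file that is never deleted (for example, because the process holding it was killed) makes every later caller wait for the full timeout and then fail. Whether that is what happened in the run above is not established by the traceback alone.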

matthiasdiener commented 4 weeks ago

Nightly tests on Lassen seem to run fine with this PR.