riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

4caf842 breaks backward compatibility with old workflows #163

Closed by lmoureaux 1 year ago

lmoureaux commented 1 year ago

Bug description

I have many existing trainings lying around that were produced with older versions of Law. Recently Law started crashing when checking them, with the following error:

ERROR: luigi-interface - [pid 36292] Worker Worker(salt=7733293301, workers=20, host=max-display008.desy.de, username=mourelou, pid=33179) failed    TrainShiftedClassifierAllKLFolds(effective_workflow=slurm, branch=-1, downsample=100.0, sr_low=2.725, sr_high=3.331, signal=YtoHH_Htott_Y3000_H400_TuneCP5_13TeV-madgraph-pythia8_TIMBER.h5, lumi=137.59, xsec=34.0, systematic=JES_down_0, efficiency=0.01, workflow=slurm)
Traceback (most recent call last):
  File "/afs/desy.de/user/r/riegerma/public/law_sw/luigi_3/luigi/worker.py", line 203, in run
    new_deps = self._run_get_new_deps()
  File "/afs/desy.de/user/r/riegerma/public/law_sw/luigi_3/luigi/worker.py", line 138, in _run_get_new_deps
    task_gen = self.task.run()
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/workflow/remote.py", line 626, in run
    self.poll()
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/workflow/remote.py", line 1163, in poll
    self._print_status_errors({
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/workflow/remote.py", line 424, in _print_status_errors
    status_pairs = self._status_error_pairs(job_num, data)
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/workflow/remote.py", line 408, in _status_error_pairs
    ("log", job_data["extra"].get("log_file", no_value)),
KeyError: 'extra'

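Commit 4caf842 seems to be the culprit: job data stored by older versions has no "extra" entry, so the lookup in _status_error_pairs fails.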

riga commented 1 year ago

Thanks for opening the issue!

I added a backward-compatible lookup in d067e02. Can you check whether that's the only place that requires a change? (It's already updated in the lxplus/naf public dirs.)
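
For readers hitting the same traceback, here is a minimal sketch of what a backward-compatible lookup of this kind could look like. It is not the actual content of d067e02. The keys "extra" and "log_file" and the name no_value come from the traceback above; the helper name get_log_file, the placeholder value, and the sample job dictionaries are hypothetical and only serve the illustration.

```python
# Hypothetical placeholder shown when no log file is known.
NO_VALUE = "-"


def get_log_file(job_data, no_value=NO_VALUE):
    """Return the log file of a job, tolerating job data without "extra".

    Job data written by older versions may lack the "extra" key, so a direct
    job_data["extra"] access raises KeyError. Falling back to an empty dict
    keeps the lookup working for both old and new data.
    """
    extra = job_data.get("extra") or {}
    return extra.get("log_file", no_value)


# Hypothetical job data as it might have been stored by an older version.
old_job = {"job_id": "123", "status": "failed"}

# Hypothetical job data including the newer "extra" entry.
new_job = {
    "job_id": "456",
    "status": "failed",
    "extra": {"log_file": "/path/to/job.log"},
}

print(get_log_file(old_job))
print(get_log_file(new_job))
```

The `or {}` fallback also covers the case where "extra" exists but is None, which a plain `.get("extra", {})` would not handle.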

lmoureaux commented 1 year ago

It has been running smoothly for a few days :+1: