riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Slurm workflows crash on polling #151

Closed by lmoureaux 1 year ago

lmoureaux commented 1 year ago
ERROR: luigi-interface - [pid 125826] Worker Worker(salt=8022718941, workers=5, host=max-display005.desy.de, username=mourelou, pid=106607) failed    TrainClassifierAllKLFolds(workflow=slurm, branch=-1, downsample=100, sr_low=2.725, sr_high=3.331, signal=XToYYprimeTo4Q_MX3000_MY170_MYprime170_narrow_TuneCP5_13TeV-madgraph-pythia8_TIMBER.h5, lumi=26.81, xsec=24.0)
Traceback (most recent call last):
  File "/afs/desy.de/user/r/riegerma/public/law_sw/luigi_3/luigi/worker.py", line 203, in run
    new_deps = self._run_get_new_deps()
  File "/afs/desy.de/user/r/riegerma/public/law_sw/luigi_3/luigi/worker.py", line 138, in _run_get_new_deps
    task_gen = self.task.run()
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/workflow/remote.py", line 562, in run
    self.poll()
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/workflow/remote.py", line 923, in poll
    query_data = self.job_manager.query_batch(job_ids, **query_kwargs)
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/job/base.py", line 411, in query_batch
    return self._apply_batch(
  File "/afs/desy.de/user/r/riegerma/public/law_sw/law/law/job/base.py", line 302, in _apply_batch
    result_data[job_obj] = data if isinstance(data, Exception) else data[job_obj]
TypeError: unhashable type: 'list'

I see that the crashing line was changed recently, so it's most likely related: https://github.com/riga/law/blame/94d9132be9f8a46032524b8b06a33fc043748877/law/job/base.py#L302
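For illustration, here is a minimal sketch of how this kind of `TypeError` can arise. It is not law's actual code: the function names, the chunked job id lists, and the fake query callback are all hypothetical, and it only assumes that `job_obj` in the failing line was a list (for example a chunk of job ids) rather than a single hashable id.

```python
# Hypothetical stand-ins, not law's real implementation.

def apply_batch_buggy(chunks, query):
    """Uses each whole chunk (a list) as a dict key, which is not hashable."""
    result_data = {}
    for chunk in chunks:
        data = query(chunk)
        # mirrors the shape of the failing line: a list ends up as the key
        result_data[chunk] = data if isinstance(data, Exception) else data[chunk]
    return result_data


def apply_batch_fixed(chunks, query):
    """Stores one entry per job id, so every key is a hashable string."""
    result_data = {}
    for chunk in chunks:
        data = query(chunk)
        for job_id in chunk:
            result_data[job_id] = data if isinstance(data, Exception) else data[job_id]
    return result_data


def fake_query(chunk):
    # assumed query result: a dict mapping each job id to a status string
    return {job_id: "running" for job_id in chunk}


chunks = [["1001", "1002"], ["1003"]]

try:
    apply_batch_buggy(chunks, fake_query)
except TypeError as exc:
    print("buggy variant failed:", exc)

print(apply_batch_fixed(chunks, fake_query))
```

Whether the actual fix took this form is not something the traceback alone can tell; the sketch only shows why a list in the key position produces the error above.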

riga commented 1 year ago

Hey @lmoureaux ! Indeed, my bad. Fixing right away.

Thanks for reporting :+1:

riga commented 1 year ago

Should be fixed by 07ba75d8. Could you give it a try? (It's already checked out at the public NAF AFS location.)

lmoureaux commented 1 year ago

Works, thanks for the quick fix!