Closed tiborsimko closed 3 years ago
FYI, the example does not work:
$ reana-client logs -w hello-yad-hpc
...
2021-02-18 14:21:17,881 | reana-workflow-engine-yadage | MainThread | INFO | Finalizing the progress tracking for: <yadage.wflow.YadageWorkflow object at 0x7f88e5bce130>
2021-02-18 14:21:17,886 | yadage.steering_api | MainThread | INFO | done. dumping workflow to disk.
2021-02-18 14:21:17,889 | reana-workflow-engine-yadage | MainThread | ERROR | Workflow failed: workflow finished but failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/reana_workflow_engine_yadage/cli.py", line 156, in run_yadage_workflow
    ys.adage_argument(
  File "/usr/local/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.8/site-packages/yadage/steering_api.py", line 110, in steering_ctx
    execute_steering(
  File "/usr/local/lib/python3.8/site-packages/yadage/steering_api.py", line 60, in execute_steering
    ys.run_adage(backend)
  File "/usr/local/lib/python3.8/site-packages/yadage/steering_object.py", line 100, in run_adage
    adage.rundag(controller=self.controller, **self.adage_kwargs)
  File "/usr/local/lib/python3.8/site-packages/adage/__init__.py", line 137, in rundag
    run_polling_workflow(controller, coroutine, update_interval, trackerlist, maxsteps)
  File "/usr/local/lib/python3.8/site-packages/adage/__init__.py", line 51, in run_polling_workflow
    for stepnum, controller in enumerate(coroutine):
  File "/usr/local/lib/python3.8/site-packages/adage/pollingexec.py", line 89, in adage_coroutine
    raise RuntimeError('workflow finished but failed')
RuntimeError: workflow finished but failed
2021-02-18 14:21:17,890 | root | MainThread | ERROR | Error while publishing channel disconnected
....
==> Job logs
==> Step: helloworld
...
==> Status: failed
==> Logs:
Auks API request failed : krb5 cred : unable to read credential cache
INFO: Converting OCI blobs to SIF format
srun: error: hpc009: task 0: Exited with exit code 255
srun: Terminating job step 951650.0
FATAL: Unable to handle docker://python:2.7-slim uri: while building SIF from layers: unable to create new build: while searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH
This is similar to (but different from) the RooFit example troubles (see https://github.com/reanahub/reana-demo-root6-roofit/pull/44), indicating reana-workflow-engine-yadage issues with Slurm integration.
Let's check our Singularity wrapper in the reana-job-controller component. The current version on the gate is 3.7.1-1.el7.
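As a starting point for that check, here is a minimal sketch of a pre-flight test that could run on the Slurm node before the container is built. It is not taken from reana-job-controller: the function name `find_missing_tools` and the tool list are hypothetical, and it assumes the container runtime is invoked as `singularity`. The only name taken from the log above is `mksquashfs`.

```python
import shutil

# Hypothetical list of executables the wrapper depends on.
# "mksquashfs" comes from the FATAL message in the job log;
# "singularity" is an assumed name for the runtime binary.
REQUIRED_TOOLS = ("singularity", "mksquashfs")


def find_missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the names in `tools` that cannot be found on the search path.

    When `path` is None, shutil.which falls back to the PATH
    environment variable of the current process.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from $PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

Running this in the same environment as the `srun` step would show whether `mksquashfs` is truly absent on the node or only missing from the `$PATH` that the job inherits.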