reanahub / reana-workflow-engine-serial

REANA Workflow Engine Serial
http://reana-workflow-engine-serial.readthedocs.org
MIT License

logs: report unsupported compute backends back to the user #103

Closed: tiborsimko closed this issue 4 years ago

tiborsimko commented 4 years ago

Current behaviour

When REANA is not compiled with HTC/HPC support, the workflow fails, but the message "Backend htcondorcern is not supported" is not propagated back to the user. It appears only in the pod logs, which the user cannot access.

$ reana-client logs -w htc.2
==> Workflow engine logs
2020-02-21 08:34:29,616 | root | MainThread | INFO | Workflow a8cc44a1-f02e-4425-8d44-6f9555c7beac failed. Files available at /var/reana/users/00000000-0000-0000-0000-000000000000/workflows/a8cc44a1-f02e-4425-8d44-6f9555c7beac.

$ reana-client ls -w htc.2
NAME             SIZE   LAST-MODIFIED
code/fitdata.C   1648   2020-02-21T08:34:16
code/gendata.C   1937   2020-02-21T08:34:15

$ kubectl logs reana-batch-serial-a8cc44a1-f02e-4425-8d44-6f9555c7beac-fmjn4 job-controller
 * Serving Flask app "reana_job_controller/app.py"
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
/usr/local/lib/python3.6/site-packages/htcondor/__init__.py:25: UserWarning:
Using a null condor_config.
Neither the environment variable CONDOR_CONFIG, /etc/condor/,
/usr/local/etc/, nor ~/condor/ contain a condor_config source.
  warnings.warn(message)
2020-02-21 08:34:20,611 | werkzeug | MainThread | INFO |  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2020-02-21 08:34:29,512 | werkzeug | Thread-1 | INFO | 127.0.0.1 - - [21/Feb/2020 08:34:29] "GET /jobs HTTP/1.1" 200 -
2020-02-21 08:34:29,580 | root | Thread-2 | ERROR | Job submission failed. Backend htcondorcern is not supported.
NoneType: None
2020-02-21 08:34:29,582 | root | Thread-2 | INFO | Storing workflow logs: a8cc44a1-f02e-4425-8d44-6f9555c7beac
2020-02-21 08:34:29,614 | werkzeug | Thread-2 | INFO | 127.0.0.1 - - [21/Feb/2020 08:34:29] "POST /jobs HTTP/1.1" 500 -

Expected behaviour

The REANA cluster should verify early on whether the compute backends requested by the user are supported. If they are not, it should report the reason for the failure back to the user, so that it appears in the output of the reana-client logs command.
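
A minimal sketch of the kind of early check described above, written in Python. It is not the actual REANA implementation: the names SUPPORTED_BACKENDS, UnsupportedBackendError, find_unsupported_backends, validate_backends and run_workflow are hypothetical, and so is the assumption that each step carries its backend under a "compute_backend" key with "kubernetes" as the default. The idea is to validate all requested backends before any job is submitted and to write the reason into the logs that the user can read.

```python
# Hypothetical sketch: none of these names are taken from the REANA code base.

# Assumption: the set of backends this deployment was built with.
SUPPORTED_BACKENDS = {"kubernetes"}


class UnsupportedBackendError(ValueError):
    """Raised when a workflow requests a backend the cluster does not support."""


def find_unsupported_backends(steps, supported=SUPPORTED_BACKENDS):
    """Return a sorted list of requested backends that are not supported."""
    # Assumption: a step without an explicit backend runs on "kubernetes".
    requested = {step.get("compute_backend", "kubernetes") for step in steps}
    return sorted(requested - set(supported))


def validate_backends(steps, supported=SUPPORTED_BACKENDS):
    """Fail fast, before submitting any job, if a backend is unsupported."""
    unsupported = find_unsupported_backends(steps, supported)
    if unsupported:
        raise UnsupportedBackendError(
            "Backend(s) not supported: {}. Supported backends: {}.".format(
                ", ".join(unsupported), ", ".join(sorted(supported))
            )
        )


def run_workflow(steps, user_logs):
    """Validate first; on failure, record the reason where the user can see it."""
    try:
        validate_backends(steps)
    except UnsupportedBackendError as exc:
        # user_logs stands in for whatever store backs the user-visible logs.
        user_logs.append(str(exc))
        return "failed"
    return "running"
```

With a check like this, a workflow that requests htcondorcern on a cluster without HTCondor support would fail before any job submission, and the user-visible logs would name the unsupported backend instead of only saying that the workflow failed.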