reanahub / reana-workflow-engine-serial

REANA Workflow Engine Serial
http://reana-workflow-engine-serial.readthedocs.org
MIT License

performance: optimise job progress tracking: #151

Open tiborsimko opened 3 years ago


Similar to the Yadage job progress tracking issue https://github.com/reanahub/reana-workflow-engine-yadage/issues/204, the Serial engine seems to be doing unnecessary work when tracking job progress.

RooFit Serial example:

$ kubectl logs reana-run-batch-0e439059-6141-4918-8c97-e571fee45678-wt2lh workflow-engine | wc -l
3
$ kubectl logs reana-run-batch-0e439059-6141-4918-8c97-e571fee45678-wt2lh job-controller | wc -l
390
$ kubectl logs reana-run-batch-0e439059-6141-4918-8c97-e571fee45678-wt2lh job-controller | grep werkz
2021-10-06 08:39:40,276 | werkzeug | MainThread | INFO |  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2021-10-06 08:39:49,420 | werkzeug | Thread-1 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:49] "GET /jobs HTTP/1.1" 200 -
2021-10-06 08:39:49,770 | werkzeug | Thread-2 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:49] "POST /jobs HTTP/1.1" 201 -
2021-10-06 08:39:49,774 | werkzeug | Thread-3 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:49] "GET /jobs/05fe18c4-924b-4506-88c1-176a2571968e HTTP/1.1" 200 -
2021-10-06 08:39:49,780 | werkzeug | Thread-4 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:49] "GET /jobs/05fe18c4-924b-4506-88c1-176a2571968e HTTP/1.1" 200 -
2021-10-06 08:39:52,788 | werkzeug | Thread-5 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:52] "GET /jobs/05fe18c4-924b-4506-88c1-176a2571968e HTTP/1.1" 200 -
2021-10-06 08:39:55,859 | werkzeug | Thread-6 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:55] "POST /jobs HTTP/1.1" 201 -
2021-10-06 08:39:55,862 | werkzeug | Thread-7 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:55] "GET /jobs/2a71e77b-23af-40fa-8f56-0fa810bd52d8 HTTP/1.1" 200 -
2021-10-06 08:39:55,865 | werkzeug | Thread-8 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:55] "GET /jobs/2a71e77b-23af-40fa-8f56-0fa810bd52d8 HTTP/1.1" 200 -
2021-10-06 08:39:58,872 | werkzeug | Thread-9 | INFO | 127.0.0.1 - - [06/Oct/2021 08:39:58] "GET /jobs/2a71e77b-23af-40fa-8f56-0fa810bd52d8 HTTP/1.1" 200 -

Each job's status is queried three times, and the first two queries arrive within the same second, only a few milliseconds apart.
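To check this pattern on longer runs, one could count the status queries per job directly from the job-controller log. This is a minimal sketch, assuming the werkzeug access-log line format shown above; `status_queries_per_job` and `gaps_in_seconds` are hypothetical helpers, not part of REANA:

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches werkzeug access-log lines like the ones above, capturing the
# timestamp prefix and the job ID of each "GET /jobs/<id>" status query.
# "GET /jobs" (listing) and "POST /jobs" (submission) lines are ignored.
STATUS_QUERY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \| werkzeug .*"
    r'"GET /jobs/(?P<job_id>[0-9a-f-]+) HTTP/1\.1"'
)


def status_queries_per_job(lines):
    """Return {job_id: [timestamps]} for every GET /jobs/<id> log line."""
    queries = defaultdict(list)
    for line in lines:
        match = STATUS_QUERY.match(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
            queries[match["job_id"]].append(ts)
    return dict(queries)


def gaps_in_seconds(timestamps):
    """Seconds elapsed between consecutive status queries for one job."""
    return [
        round((later - earlier).total_seconds(), 3)
        for earlier, later in zip(timestamps, timestamps[1:])
    ]
```

Applied to the first job in the excerpt above, this reports three queries with gaps of 0.006 s and 3.008 s.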

Looking at job execution times:

$ reana-client logs -w 0e439059-6141-4918-8c97-e571fee45678 | grep 2021-10
==> Started: 2021-10-06T08:39:49
==> Finished: 2021-10-06T08:39:55
==> Started: 2021-10-06T08:39:55
==> Finished: 2021-10-06T08:40:01
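Pairing each start timestamp with the following finish timestamp gives the run time of each job. A small sketch, assuming the `==> Started:` / `==> Finished:` line format printed by `reana-client logs` above; `job_durations` is a hypothetical helper:

```python
from datetime import datetime


def job_durations(log_lines):
    """Pair each '==> Started:' line with the next '==> Finished:' line.

    Returns the duration of each job in seconds, in log order.
    """
    durations = []
    started = None
    for line in log_lines:
        if line.startswith("==> Started:"):
            started = datetime.fromisoformat(line.split(": ", 1)[1].strip())
        elif line.startswith("==> Finished:") and started is not None:
            finished = datetime.fromisoformat(line.split(": ", 1)[1].strip())
            durations.append((finished - started).total_seconds())
            started = None
    return durations
```

For the four lines above, both jobs come out at 6.0 seconds.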

Since each job runs for about six seconds, the first "double call" (two status queries issued a few milliseconds after submission) does not seem necessary.

We may want to reduce the number of status queries sent from the workflow-engine container to the job-controller container.
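One possible approach is to wait briefly after submission, then issue exactly one status query per polling interval, backing off for long-running jobs. The sketch below is illustrative only and is not the engine's current code: `wait_for_job`, the injected `fetch_status` callable (standing in for a GET /jobs/<id> request to the job controller), and the terminal status names in `FINISHED_STATES` are all assumptions.

```python
import time
from typing import Callable, Tuple

# Assumed terminal job statuses; the real job-controller values may differ.
FINISHED_STATES = {"finished", "failed", "stopped"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], str],
    initial_delay: float = 1.0,
    interval: float = 3.0,
    max_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Poll a job until it reaches a terminal state.

    Issues exactly one status query per wait, so there are no back-to-back
    queries right after submission. Returns (final status, number of queries).
    """
    # A freshly submitted job is unlikely to have finished already.
    sleep(initial_delay)
    queries = 0
    while True:
        status = fetch_status(job_id)  # hypothetical GET /jobs/<id> wrapper
        queries += 1
        if status in FINISHED_STATES:
            return status, queries
        sleep(interval)
        # Exponential back-off, capped, for long-running jobs.
        interval = min(interval * 2, max_interval)
```

The back-off trades some latency in noticing that a job has finished for fewer requests; `initial_delay`, `interval` and `max_interval` would need tuning against typical job durations such as the roughly six-second jobs above.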