reanahub / reana-job-controller

REANA Job Controller
http://reana-job-controller.readthedocs.io/
MIT License

RFC job cache #44

Open tiborsimko opened 7 years ago

tiborsimko commented 7 years ago

It'll be useful (someday) to have a central server-side job cache that could speed up the rerunning of workflows. (And for sharing results among workflows that start with the same initial steps.)

Each job execution run could optionally store results under:

containing, say, SHA1 information about the input file and parameters, the container environment used, and the steps used; any desired output files of the step command would be stored there.

If another job comes and uses the same environment image and the same input file and parameters, then the job execution could quickly return the pre-cached job output.
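A minimal sketch of the idea, assuming the cache is a plain directory tree keyed by a single SHA1 digest. The names `file_sha1`, `job_cache_key`, `lookup_cached_output` and the `cache_root` layout are hypothetical and not part of REANA:

```python
import hashlib
import json
from pathlib import Path


def file_sha1(path):
    """Return the SHA1 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def job_cache_key(image, command, parameters, input_files):
    """Combine environment image, step command, parameters and input
    file contents into one SHA1 key (hypothetical key layout)."""
    spec = {
        "image": image,
        "command": command,
        "parameters": parameters,
        # Hash file contents, not file names, so renamed copies still match.
        "inputs": [file_sha1(p) for p in input_files],
    }
    payload = json.dumps(spec, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def lookup_cached_output(cache_root, key):
    """Return the cached output directory for this key, or None on a miss."""
    candidate = Path(cache_root) / key
    return candidate if candidate.is_dir() else None
```

On a hit the job execution could return the contents of that directory instead of running the step; on a miss it would run the step and then copy the desired outputs into `cache_root/<key>`.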

This could be implemented at the level of the workflow.

lukasheinrich commented 7 years ago

yadage already has infrastructure in place to cache jobs, with pluggable mechanisms on how to validate/update the cache, so we could develop a custom cache plugin for reana. The main issue is how to deal with changing absolute paths to e.g. input files, etc.
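One possible way to approach the absolute-path issue, offered as a sketch and not as a description of what yadage does: rewrite any value that points inside the run's workspace to a fixed placeholder before hashing, so that the same job in a different workspace yields the same key. The helper names, the `{workspace}` placeholder and the job spec shape are assumptions made for illustration:

```python
import hashlib
import json
import os


def normalise_paths(value, workspace, placeholder="{workspace}"):
    """Recursively replace the absolute workspace prefix in string values.

    Only strings that are themselves paths under the workspace are
    rewritten; paths embedded inside longer command strings are not.
    """
    root = os.path.abspath(workspace)
    if isinstance(value, str):
        if value == root or value.startswith(root + os.sep):
            return placeholder + value[len(root):]
        return value
    if isinstance(value, dict):
        return {k: normalise_paths(v, workspace, placeholder)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalise_paths(v, workspace, placeholder) for v in value]
    return value


def relocatable_key(job_spec, workspace):
    """SHA1 over the job spec with workspace-specific paths normalised."""
    normalised = normalise_paths(job_spec, workspace)
    payload = json.dumps(normalised, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()
```

With this, a spec whose input is `<workspace-a>/data/in.root` and an otherwise identical spec whose input is `<workspace-b>/data/in.root` hash to the same key, as long as each is normalised against its own workspace.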

lukasheinrich commented 5 years ago

@tiborsimko should we try to spec out what an engine-independent cache would look like?

tiborsimko commented 5 years ago

@lukasheinrich Yes, let's! Ideally after the v0.5.0 release is done.