In some workflows, ISI & collaborators use Docker images to provide always-on services (such as a DB). By default, when Pegasus launches a Docker container as part of a job, the container only survives for the length of that job; if the container needs to persist until (potentially) all documents have been processed, the default behavior will not work. Instead, a Bash script can be launched as a Pegasus job to start the Docker container, and a second script can shut it down at the end of the workflow.
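A minimal sketch of what those start/stop wrappers might do (written in Python here to match the other examples in this issue; the real scripts would be Bash based on @elizlee's examples). The image name, container name, and mount paths are placeholders, not decisions:

```python
#!/usr/bin/env python3
"""Start a long-running service container; a companion call stops it."""
import subprocess

CONTAINER_NAME = "pegasus-db-service"      # hypothetical container name
IMAGE = "postgres:15"                      # hypothetical service image
HOST_MOUNT = "/nas/workflow/dockermount"   # hypothetical shared staging dir
CONTAINER_MOUNT = "/data"

def start_service():
    # Run detached so the container outlives the Pegasus job that starts it.
    subprocess.run(
        [
            "docker", "run", "-d",
            "--name", CONTAINER_NAME,
            "-v", f"{HOST_MOUNT}:{CONTAINER_MOUNT}",
            IMAGE,
        ],
        check=True,
    )

def stop_service():
    # The shutdown job at the end of the workflow would call this instead.
    subprocess.run(["docker", "stop", CONTAINER_NAME], check=True)
    subprocess.run(["docker", "rm", CONTAINER_NAME], check=True)

if __name__ == "__main__":
    start_service()
```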
To enable this approach, a few modifications are needed:
[ ] Scripts to start and stop the service container (based on examples from @elizlee; see the sketch above)
[ ] Support for staging files into and out of the Docker mount
[ ] Friction-free submission of jobs running in the container (see the sketch after this list):
Bash or Python jobs use a base transformation (e.g. /usr/bin/python3), with the job arguments invoking the script that needs to run
Automatically generate the required stage-in and stage-out jobs so the user does not have to implement them by hand
[ ] Change the default data configuration from "sharedfs" when a container is being used (see the properties sketch below)
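For the "friction-free submission" items, here is a rough sketch, assuming the Pegasus 5.x Python API, of a job that uses /usr/bin/python3 as the base transformation, passes the user's script as an argument, and declares its files so Pegasus can generate the stage-in/stage-out jobs. Script name, file names, and the site name are placeholders:

```python
from Pegasus.api import File, Job, Transformation, TransformationCatalog, Workflow

# Base transformation: the system Python interpreter, not the user's script.
python3 = Transformation(
    "python3",
    site="condorpool",          # placeholder site name
    pfn="/usr/bin/python3",
    is_stageable=False,
)
tc = TransformationCatalog().add_transformations(python3)

# Placeholder input/output files; declaring them is what lets Pegasus add the
# stage-in/stage-out jobs instead of the user wiring them up by hand.
in_file = File("doc.json")
out_file = File("doc.processed.json")

job = (
    Job(python3)
    .add_args("process_doc.py", "--input", "doc.json", "--output", "doc.processed.json")
    .add_inputs(File("process_doc.py"), in_file)
    .add_outputs(out_file)
)

wf = Workflow("container-service-example")
wf.add_transformation_catalog(tc)
wf.add_jobs(job)
wf.write()  # emits the workflow file consumed by pegasus-plan
```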
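For the data configuration item, a hedged sketch using the Pegasus 5.x Properties API. The exact value ("condorio" vs. "nonsharedfs") depends on the site setup and is an assumption here:

```python
from Pegasus.api import Properties

# Assumption: when a service container is in use, the wrapper switches the
# data configuration away from the "sharedfs" default. "condorio" is one
# option; "nonsharedfs" is the other, depending on the site setup.
props = Properties()
props["pegasus.data.configuration"] = "condorio"
props.write()  # writes pegasus.properties (default filename)
```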
We also don't want to entirely remove the ability to launch a Docker container for a single, one-time job, so care should be taken to configure the settings appropriately.
These changes are easier if file input/output management can be ignored on the wrapper backend for now, since we currently don't have a good way to handle files that are tracked in parameter files for Python jobs. @elizlee -- would your desired use case for this addition need to input/output files from the active container? Alternatively, would it be acceptable if I only extend support for manual file handling for now? Automatic file handling could then be deferred until the wrapper is revamped to support files as inputs and outputs via Pegasus, rather than our built-in assumption of running on a shared NAS.