joestubbs opened this issue 2 years ago (Open)
While I believe it's possible to run Docker within Docker, that relies on Docker running with root privileges, and Singularity seems to purposefully disallow this kind of nesting by design.
Another idea, which doesn't require implementing a workflow language, is to let the user run consecutive jobs that share the same scratch directory. That is, run a job and leave the scratch directory and all of its files in place; when running a second job, specify the same scratch directory so that the second job has access to all of the files left over from the first job; and so forth.
I haven't looked deeply enough into v3/jobs to see if this is currently possible. I suppose the user could specify the scratch directory to be used instead of it being automatically determined by Tapis.
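As a rough sketch of that idea, consecutive submissions could simply point at the same directory. The request attribute names used below (execSystemId, execSystemExecDir) are my best guess at the v3 Jobs request schema and should be verified:

```bash
# Sketch only: submit two jobs that share one scratch/exec directory.
# Attribute names are assumptions about the Tapis v3 Jobs request schema
# and should be checked against the Jobs documentation.
SHARED_DIR="/scratch/${USER}/pipeline-run-001"

cat > job1.json <<EOF
{
  "name": "step-1",
  "appId": "my-step1-app",
  "appVersion": "0.1",
  "execSystemId": "my-hpc-system",
  "execSystemExecDir": "${SHARED_DIR}"
}
EOF

cat > job2.json <<EOF
{
  "name": "step-2",
  "appId": "my-step2-app",
  "appVersion": "0.1",
  "execSystemId": "my-hpc-system",
  "execSystemExecDir": "${SHARED_DIR}"
}
EOF

# Submit job1.json, wait for it to finish, then submit job2.json; the second
# job sees whatever files the first one left in ${SHARED_DIR}.
```

It would also need to be confirmed that Jobs does not clean up or reassign that directory between runs.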
@richcar58 : Can you add a pointer here to the zip runtime proposal/implementation documentation?
Yes, it's possible to arrange for the output files of one Tapis v3 job to be used as input to one or more subsequent jobs. The trick will be settling on a job launch strategy: when do jobs later in a workflow get launched? Those later jobs, for example, could be launched upon receipt of a Tapis notification that a previous job has completed. This orchestrator approach requires workflow knowledge to be externalized and really amounts to a rudimentary workflow manager.
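As a sketch of that orchestrator variant, a job request could carry a notification subscription along these lines; the field names (eventCategoryFilter, deliveryTargets, and so on) are from memory and should be treated as assumptions, not a spec:

```bash
# Sketch only: ask Jobs to notify an external orchestrator when this job
# changes status. Field names below are assumptions about the Tapis v3
# Notifications integration; verify against the current schema.
cat > job-with-subscription.json <<EOF
{
  "name": "step-1",
  "appId": "my-step1-app",
  "appVersion": "0.1",
  "subscriptions": [
    {
      "description": "Tell the orchestrator when step-1 changes state",
      "enabled": true,
      "eventCategoryFilter": "JOB_NEW_STATUS",
      "deliveryTargets": [
        { "deliveryMethod": "WEBHOOK",
          "deliveryAddress": "https://my-orchestrator.example.org/hooks/step-1" }
      ]
    }
  ]
}
EOF

# The orchestrator's webhook handler would submit the next job in the
# workflow once it sees a terminal status for step-1.
```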
Another approach would be to bake into the jobs themselves a monitoring or polling capability. A simple monitoring approach would have jobs wait for particular files to appear (or disappear, become unlocked, etc.) before proceeding. In this scenario, all jobs can be launched simultaneously and each would only start executing when its inputs became available.
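A minimal sketch of that polling approach, inside a job's entry-point script (the sentinel file name and timeout are purely illustrative, not a Tapis convention):

```bash
#!/bin/bash
# Sketch only: block until a previous step has produced its output, then run.
SENTINEL="step1-output/DONE"
TIMEOUT=3600   # give up after an hour
elapsed=0

until [ -f "$SENTINEL" ]; do
  if [ "$elapsed" -ge "$TIMEOUT" ]; then
    echo "Timed out waiting for $SENTINEL" >&2
    exit 1
  fi
  sleep 30
  elapsed=$((elapsed + 30))
done

# Inputs are in place; run this step's actual work.
./run_step2.sh
```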
Currently under development is support for a new Tapis runtime called the ZIP Runtime. This support allows zip or tar.gz archive files to be treated as a type of "image" in application definitions and in the Jobs service. Jobs will stage the archive file, unpack it, execute a specified executable, and monitor that executable until it reaches a terminal state. Input file staging and output file archiving work the same as in all other Tapis-supported runtimes.
The idea is that users will have complete freedom to include whatever they need in their archives and can run whatever commands their host account permits. Certain conventions need to be observed to interact successfully with Tapis, but other than that, workflows can be encapsulated in an archive. Executions can be made reproducible by versioning and documenting the archive files.
The ZIP Runtime looks good, and somewhat like V2, which took a directory tree from a storage system and zipped and versioned it when an app was published. If I understand the design correctly, the app entry point will be a BASH script that runs outside of a Singularity container?
By default, `tapisjob_app.sh` will be run, and it typically would be a BASH script. Otherwise, the `tapisjob.manifest` would contain the pathname of the executable to be run (any executable will do).
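To make that concrete, a minimal archive might be built along the lines below. The default entry-point name and the manifest key naming an alternate executable reflect my reading of the ZIP Runtime design and should be checked against the final documentation:

```bash
# Sketch only: build a ZIP Runtime archive whose workflow is driven by a
# top-level BASH script. Step scripts and names are placeholders.
mkdir -p myapp/bin
for s in step1 step2; do
  printf '#!/bin/bash\necho "running %s"\n' "$s" > "myapp/bin/${s}.sh"
  chmod +x "myapp/bin/${s}.sh"
done

cat > myapp/tapisjob_app.sh <<'EOF'
#!/bin/bash
# Entry point: run the workflow steps in order.
set -e
cd "$(dirname "$0")"
./bin/step1.sh
./bin/step2.sh
EOF
chmod +x myapp/tapisjob_app.sh

# Optional alternative: a tapisjob.manifest naming a different executable.
# (The key name below is an assumption about the manifest format.)
# echo "tapisjob_executable=bin/alternate_entrypoint.sh" > myapp/tapisjob.manifest

# Package and version the archive for reproducibility.
tar czf myapp-v0.1.tar.gz -C myapp .
```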
Some research workloads are composed of multiple, individual steps submitted as a single HPC batch job. In the simplest cases, the entry point to the job is a parent script that starts up the individual steps as threads or processes. These individual steps could run sequentially or in parallel, on the same compute node or across additional nodes.
For various reasons, it can be more computationally efficient to submit the workload as a single batch job -- for example, if the individual steps share memory or files that need to be staged to/from the compute environment. Additionally, some individual steps do not make sense to run "standalone", that is, without having executed the previous steps. For these reasons, it would be ideal for the "multi-step" application to be registered as a single Tapis application.
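For reference, the parent-script pattern described above often looks something like this inside a single batch job (the scheduler directives and step scripts are placeholders):

```bash
#!/bin/bash
#SBATCH -N 2
#SBATCH -t 02:00:00
# Sketch only: one batch job that runs several workflow steps itself.
set -e

# Step 1 must finish before the rest can start.
./preprocess.sh

# Steps 2a and 2b are independent, so run them in parallel.
./analyze_part_a.sh &
./analyze_part_b.sh &
wait

# Final step consumes the outputs of 2a and 2b.
./postprocess.sh
```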
There are many tools that help developers build and execute multi-step applications. Here are just a few:
There are at least two challenges for making applications like the above work with Tapis apps.
Even if executing containers from within running containers can be made to work, it is not clear how that approach would help with cases like Launcher.