fcasson closed 4 years ago
The difficulties with this are:
One way around this is to make use of HTCondor's leave_in_queue. I did a little test, submitting a job where the submit script contained:
+ProminenceOutputTransferred = False
leave_in_queue = JobStatus == 4 && ProminenceOutputTransferred =?= False
Once the job had completed it stayed in the completed state, i.e.
$ condor_q -nobatch
-- Schedd: htcondor-ssl-certs.novalocal : <192.168.251.34:9618?... @ 02/24/20 15:56:24
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
174.0 alahiff 2/24 15:56 0+00:00:01 C 0 0.0 date
condor_qedit could then be used to change the value of ProminenceOutputTransferred, i.e.
condor_qedit 174 ProminenceOutputTransferred=True
Once this was done the job exited the queue:
$ condor_history -m 1 174
ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
174.0 alahiff 2/24 15:56 0+00:00:01 C 2/24 15:56 /usr/bin/date
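The manual condor_qedit step above could be wrapped in a small script. This is only a minimal sketch: the helper names are hypothetical, the attribute name and argument form are copied from the test above, and it assumes condor_qedit is on the PATH of the machine running it.

```python
import subprocess

# Attribute name taken from the test above; the helper names are hypothetical.
ATTRIBUTE = "ProminenceOutputTransferred"


def build_qedit_command(job_id):
    """Return the condor_qedit argument list, in the form used in the test above."""
    return ["condor_qedit", str(job_id), f"{ATTRIBUTE}=True"]


def mark_output_transferred(job_id):
    """Run condor_qedit so the completed job can leave the queue.

    Raises subprocess.CalledProcessError if condor_qedit exits non-zero.
    """
    subprocess.run(build_qedit_command(job_id), check=True)
```

Calling mark_output_transferred(174) would run the same command as the manual test.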
If I was to implement this in Prominence I would make it optional. The question becomes then - how to trigger this? At the moment the CLI internally does the equivalent of prominence describe to get the pre-signed URL (amongst other things), and then downloads the file from S3 using this URL.
Maybe the simplest option would be to add:
prominence list rather than prominence list --completed)
That way it's possible to write a script to download the output and then trigger the removal of the job, and we avoid the problem of not knowing whether the user has successfully downloaded the output.
There probably should be something to automatically remove such jobs from the queue so that users don't just permanently leave them there (e.g. after 1 week or 2 weeks, ...)
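One possible way to get such a time limit would be to add an age check to the leave_in_queue expression itself. The sketch below just builds that expression as a string. It is an assumption, not something tested in this thread: it relies on the ClassAd time() function and the CompletionDate job attribute, and the helper name is hypothetical.

```python
def leave_in_queue_expression(max_days):
    """Build a leave_in_queue expression that stops holding the job after max_days.

    Assumes the HTCondor ClassAd function time() and the CompletionDate job
    attribute; this combination has not been tested here.
    """
    max_seconds = max_days * 24 * 60 * 60
    return (
        "JobStatus == 4 && ProminenceOutputTransferred =?= False"
        f" && (time() - CompletionDate) < {max_seconds}"
    )
```

With max_days set to 7, the job would stay in the queue until either the attribute is changed or one week has passed since completion, whichever comes first.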
Implemented in https://github.com/prominence-eosc/prominence/commit/12be618728e055d994483d10e70c7eba79fad892, https://github.com/prominence-eosc/prominence/commit/4404513d3c7761264702c612df7def874b8deeac, https://github.com/prominence-eosc/prominence/commit/151f76c2676ec4a59358c4b9d35f7e2a076fad19
If a user wants a completed job to stay in the queue until they explicitly remove it, add the following to the JSON job description:
"policies":{
"leaveInQueue": true
}
They then need to do an HTTP PUT to /prominence/v1/jobs/<job id>/remove, which will remove the job from the queue. Note that this can also be done for a running or idle job, in which case it will automatically leave the queue when it finishes.
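As a rough illustration of how a script might use this, the sketch below adds the policy to a job description and builds the PUT request. Only the policies block and the /prominence/v1/jobs/<job id>/remove path come from this thread. The base URL, the bearer-token Authorization header and the function names are assumptions for illustration.

```python
import urllib.request


def with_leave_in_queue(job):
    """Return a copy of a job description dict with leaveInQueue enabled."""
    updated = dict(job)
    policies = dict(updated.get("policies", {}))
    policies["leaveInQueue"] = True
    updated["policies"] = policies
    return updated


def build_remove_request(base_url, job_id, token):
    """Build the HTTP PUT request for the remove endpoint.

    The bearer-token header is an assumption about how the server authenticates.
    """
    url = f"{base_url.rstrip('/')}/prominence/v1/jobs/{job_id}/remove"
    return urllib.request.Request(
        url, method="PUT", headers={"Authorization": f"Bearer {token}"}
    )


def remove_from_queue(base_url, job_id, token):
    """Send the PUT and return the HTTP status code; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_remove_request(base_url, job_id, token)) as response:
        return response.status
```

A download script could call remove_from_queue only after it has verified that the output file was saved, so the job leaves the queue only once the output is safely on disk.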
Not yet tested or deployed.
Deployed and successfully tested.
Note that jobs don't instantly vanish as soon as the PUT is done - there may be a short delay.
Would the easiest way to do this be to change the status to downloaded once the output has been downloaded for the first time?