prominence-eosc / prominence

PROMINENCE server
Apache License 2.0

Differentiate completed and downloaded job status #82

Closed: fcasson closed this issue 4 years ago

fcasson commented 5 years ago

Would the easiest way to do this be to change the status to "downloaded" once the output has been downloaded for the first time?

alahiff commented 4 years ago

The difficulties with this are:

  1. I'm not sure how to reliably determine if a user has successfully downloaded the output
  2. Once a job has completed, it's no longer possible to edit it.

One way around this is to make use of HTCondor's leave_in_queue submit command. I did a quick test, submitting a job whose submit file contained:

+ProminenceOutputTransferred = False
leave_in_queue = JobStatus == 4 && ProminenceOutputTransferred =?= False
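To make the policy concrete, here is a minimal Python model of how this leave_in_queue expression evaluates. It is an illustration of the ClassAd semantics, not how HTCondor evaluates expressions internally: it assumes JobStatus code 4 means Completed and models the ClassAd UNDEFINED value with a sentinel object. The point of =?= (rather than ==) is that it always yields True or False, even when ProminenceOutputTransferred is not defined on the job, whereas == would yield UNDEFINED.

```python
# Minimal model of the ClassAd expression
#   JobStatus == 4 && ProminenceOutputTransferred =?= False
# Illustration only; UNDEFINED is modelled with a sentinel object.

UNDEFINED = object()  # stands in for the ClassAd UNDEFINED value
COMPLETED = 4         # HTCondor JobStatus code for "Completed"


def meta_equal(a, b):
    """ClassAd '=?=': true only if both operands have the same type and value.

    Unlike '==', it never evaluates to UNDEFINED.
    """
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def leave_in_queue(job_ad):
    """Return True if the schedd should keep the completed job in the queue."""
    status = job_ad.get("JobStatus", UNDEFINED)
    transferred = job_ad.get("ProminenceOutputTransferred", UNDEFINED)
    return status == COMPLETED and meta_equal(transferred, False)


print(leave_in_queue({"JobStatus": 4, "ProminenceOutputTransferred": False}))  # True: retained
print(leave_in_queue({"JobStatus": 4, "ProminenceOutputTransferred": True}))   # False: leaves queue
print(leave_in_queue({"JobStatus": 4}))                                         # False: attribute undefined
```

In this model, flipping the attribute to True after completion is exactly what lets the job leave, which matches the behaviour of the test.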

Once the job had completed, it stayed in the queue in the completed (C) state:

$ condor_q -nobatch

-- Schedd: htcondor-ssl-certs.novalocal : <192.168.251.34:9618?... @ 02/24/20 15:56:24
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 174.0   alahiff         2/24 15:56   0+00:00:01 C  0    0.0 date

condor_qedit could then be used to change the value of ProminenceOutputTransferred:

condor_qedit 174 ProminenceOutputTransferred=True

Once this was done, the job left the queue:

$ condor_history -m 1 174
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD            
 174.0   alahiff         2/24 15:56   0+00:00:01 C   2/24 15:56 /usr/bin/date

If I were to implement this in Prominence, I would make it optional. The question then becomes how to trigger it. At the moment the CLI internally does the equivalent of prominence describe to get the pre-signed URL (amongst other things), and then downloads the file from S3 using this URL.
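The download step itself needs nothing beyond plain HTTP, since a pre-signed S3 URL carries its own authorisation in its query string. Below is a minimal sketch using only Python's standard library; the helper name download_presigned is hypothetical, and how the URL is extracted from the describe output is not shown.

```python
import shutil
import urllib.request


def download_presigned(url: str, dest: str, timeout: float = 60.0) -> None:
    """Stream the object at a pre-signed URL to a local file (hypothetical helper).

    No extra credentials are needed: the signature is part of the URL.
    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, e.g. 403
    when the signature has expired.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks instead of reading it all into memory
```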

Maybe the simplest option would be to add:

That way it's possible to write a script that downloads the output and then triggers removal of the job, and we avoid the problem of not knowing whether the user has successfully downloaded the output.

There should probably also be something that automatically removes such jobs from the queue after some period (e.g. 1 or 2 weeks), so that users don't leave them there permanently.
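The automatic clean-up suggested here reduces to an age check against each retained job's completion time. A minimal sketch, assuming completion times are available as Unix timestamps (HTCondor records one in the CompletionDate job attribute) and using a hypothetical one-week default; the function names are illustrative only.

```python
import time
from typing import Dict, Iterator, Optional

ONE_WEEK = 7 * 24 * 3600  # seconds; hypothetical default retention period


def should_expire(completion_time: float, max_age: float = ONE_WEEK,
                  now: Optional[float] = None) -> bool:
    """Return True if a retained completed job is older than max_age seconds."""
    if now is None:
        now = time.time()
    return now - completion_time > max_age


def expired_jobs(jobs: Dict[str, float], max_age: float = ONE_WEEK,
                 now: Optional[float] = None) -> Iterator[str]:
    """Yield IDs of retained jobs that a periodic clean-up task could remove.

    jobs maps job ID -> completion time (Unix seconds).
    """
    if now is None:
        now = time.time()  # use one reference time for the whole pass
    for job_id, completed_at in jobs.items():
        if should_expire(completed_at, max_age, now):
            yield job_id
```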

alahiff commented 4 years ago

Implemented in https://github.com/prominence-eosc/prominence/commit/12be618728e055d994483d10e70c7eba79fad892, https://github.com/prominence-eosc/prominence/commit/4404513d3c7761264702c612df7def874b8deeac, https://github.com/prominence-eosc/prominence/commit/151f76c2676ec4a59358c4b9d35f7e2a076fad19

If a user wants a completed job to stay in the queue until they explicitly remove it, they should add the following to the JSON job description:

"policies":{
  "leaveInQueue": true
}
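When job descriptions are generated programmatically, the option can be merged into an existing description without disturbing other entries under "policies". A minimal Python sketch: the helper name and the "name" field in the example are hypothetical; only the policies/leaveInQueue keys come from the text above.

```python
import json


def with_leave_in_queue(job: dict) -> dict:
    """Return a copy of a job description with policies.leaveInQueue set to true.

    Existing entries under "policies" are preserved. (Hypothetical helper.)
    """
    updated = dict(job)  # shallow copy so the caller's dict is left untouched
    updated["policies"] = {**job.get("policies", {}), "leaveInQueue": True}
    return updated


# Hypothetical minimal description; real job descriptions carry more fields.
job = {"name": "example"}
print(json.dumps(with_leave_in_queue(job)))
# {"name": "example", "policies": {"leaveInQueue": true}}
```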

They then need to do an HTTP PUT to /prominence/v1/jobs/<job id>/remove, which removes the job from the queue. This can also be done for a running or idle job, in which case the job will leave the queue automatically when it finishes.
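A script that downloads the output and then releases the job could issue the PUT as follows. This is a sketch using Python's standard library: the base URL and the bearer-token Authorization header are assumptions (substitute whatever the deployment actually uses), and the helper names are hypothetical; only the /prominence/v1/jobs/<job id>/remove path comes from the description above.

```python
import urllib.request


def build_remove_request(base_url: str, job_id: int, token: str) -> urllib.request.Request:
    """Build the HTTP PUT asking Prominence to release a job from the queue.

    base_url and the bearer-token header are assumptions for illustration.
    """
    url = f"{base_url.rstrip('/')}/prominence/v1/jobs/{job_id}/remove"
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )


def remove_job(base_url: str, job_id: int, token: str, timeout: float = 30.0) -> int:
    """Send the PUT and return the HTTP status code (raises HTTPError on 4xx/5xx)."""
    req = build_remove_request(base_url, job_id, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling remove_job only after the download has succeeded keeps the decision about when the output is safe on the client side, which is what this design is meant to achieve.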

Not yet tested or deployed.

alahiff commented 4 years ago

Deployed and successfully tested.

Note that jobs don't vanish from the queue immediately after the PUT; there may be a short delay.
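Because removal is not instantaneous, a script that needs to know when the job has actually gone should poll rather than check once. A generic sketch: job_exists is a hypothetical callable (for example, one that queries the job via the REST API and reports whether it is still listed), and the timeout and interval defaults are arbitrary. The sleep and clock parameters are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_gone(job_exists: Callable[[], bool], timeout: float = 300.0,
                    interval: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll job_exists() until it returns False or timeout seconds pass.

    Returns True if the job disappeared, False if the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        if not job_exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```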