waikato-ufdl / ufdl-backend

User-Friendly Deep Learning (UFDL) - backend system.
Apache License 2.0

Job launchers to release previous jobs on startup #85

Closed fracpete closed 4 years ago

fracpete commented 4 years ago
csterling commented 4 years ago

If a node crashes before it finishes a job, it still owns that job, so on startup it can query for its current job and then finalise or reset it.

fracpete commented 4 years ago

That could work. I presume I would use job.list(...) with an appropriate filter to locate the jobs for the node and then call job.reset_job(...) on each. What would the filter expression look like?

csterling commented 4 years ago

Do an exact filter on the node field, using the pk of the node as the value, i.e.:

{
    "expressions": [
        {
            "type": "exact",
            "field": "node",
            "value": 1
        }
    ]
}
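
For illustration, the startup step discussed in this thread could be sketched roughly as below. This is only a sketch under assumptions, not the actual launcher code: `list_jobs` and `reset_job` are hypothetical callables standing in for job.list(...) and job.reset_job(...), whose real signatures are not shown here, and job records are assumed to be dicts carrying a "pk" key.

```python
from typing import Any, Callable, Dict, Iterable, List


def node_filter(node_pk: int) -> Dict[str, Any]:
    """Build the exact-match filter on the `node` field shown above."""
    return {
        "expressions": [
            {"type": "exact", "field": "node", "value": node_pk}
        ]
    }


def release_jobs(
    node_pk: int,
    list_jobs: Callable[[Dict[str, Any]], Iterable[Dict[str, Any]]],
    reset_job: Callable[[int], Any],
) -> List[int]:
    """Reset every job still owned by the node; return the pks that were reset.

    `list_jobs` and `reset_job` are hypothetical stand-ins for the client
    calls mentioned in the thread. Each job is assumed to be a dict with
    a "pk" entry.
    """
    released: List[int] = []
    for job in list_jobs(node_filter(node_pk)):
        reset_job(job["pk"])
        released.append(job["pk"])
    return released
```

Passing the two calls in as parameters keeps the sketch independent of the real client API, so a launcher could wire in the actual functions on startup while a test supplies in-memory fakes.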
fracpete commented 4 years ago

Implemented.