jataware / dojo


Register Dataset: Transform Jobs use different endpoint to await results #200

Open ccjoel opened 11 months ago

ccjoel commented 11 months ago

Summary

During Dataset Registration:

Handling job status and errors has been problematic on the Dataset Transformation step.

Reason:

The UI fetches the status/result of the Dataset Transform jobs using a different endpoint than the rest of the jobs (e.g. file_conversion, run_elwood, etc), which limits error handling. The Dojo UI uses the /job/fetch/{id} call to wait for these jobs to finish, instead of the /job/{dataset_id}/{job_name} endpoint used by the rest of the jobs in the flow.

The /job/fetch endpoint is intended to fetch the final result, but does not contain data on the job status (is it stuck in queued? started? finished? errored?).

We're not sure whether this was an oversight, or whether there was a limitation in the previous patterns. We can look into this, with the possibility of increasing stability on this page if we replace the endpoints/handling on this step.

Proposed Solution

Use the new GET /job/{uuid}/{job_name} endpoint, added in #201.

We should migrate to the new endpoints instead of the /job/fetch ones, in order to follow the job status alongside the result. These new side-effect-free GET endpoints on the Dojo API should allow us to better handle errors on the Dataset Transform page.

Changing the endpoint called from the UI is trivial, but changing how we handle the result requires additional code changes: the result won't be the main response body, but nested within it; the status property will indicate whether the job is queued, started, or finished, and will contain the error if it has failed.
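To make the handling change concrete, here is a minimal sketch of status-aware handling for a response from the new endpoint. The response shape used here ({"status": ..., "result": ..., "job_error": ...}) is an assumption for illustration; the actual fields are defined by #201.

```python
# Hypothetical handler for a GET /job/{uuid}/{job_name} response body.
# The field names ("status", "result", "job_error") are assumptions,
# not the confirmed shape from the Dojo API.

def handle_job_response(body: dict):
    """Return ("pending" | "done" | "error", payload) for a job response body."""
    status = body.get("status")
    if status in ("queued", "started"):
        return ("pending", None)  # job not finished yet; keep polling
    if status == "failed":
        return ("error", body.get("job_error"))
    if status == "finished":
        # Unlike /job/fetch, the result is nested, not the top-level body.
        return ("done", body.get("result"))
    return ("error", f"unknown status: {status!r}")
```

The point is that the UI branches on the status field first, rather than treating the whole response body as the result.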

More Details | Noise

On the Dataset Register page, we have multiple long-running tasks, which we enqueue and then poll until they finish.

For all the register-related jobs, we call this endpoint to enqueue them:

POST /job/id/task_name.fn_name

Example:

POST /api/dojo/job/c4a295cc-4639-4180-928a-11bba2d71ac7/file_processors.file_conversion

To await the result of this job, we then poll the same URL:

POST /api/dojo/job/c4a295cc-4639-4180-928a-11bba2d71ac7/file_processors.file_conversion

For the dataset transformation page, the latter job-await endpoint is replaced with the following pattern:

POST /job/fetch/c4a295cc-4639-4180-928a-11bba2d71ac7_resolution_processors.calculate_geographical_resolution

Note the /fetch/ in the path, as well as the underscore _ that merges the dataset uuid and job_name into a single URI path segment.
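The two URL patterns above can be contrasted with a pair of small helpers. These are hypothetical illustrations of the patterns described in this issue, not real Dojo code:

```python
# Hypothetical helpers contrasting the two URL patterns described above.

def job_status_url(dataset_id: str, job_name: str) -> str:
    # Pattern used by most register jobs: id and job name
    # as separate path segments.
    return f"/api/dojo/job/{dataset_id}/{job_name}"

def job_fetch_url(dataset_id: str, job_name: str) -> str:
    # Pattern used on the transform step: /fetch/, with the id and
    # job name merged by an underscore into a single segment.
    return f"/api/dojo/job/fetch/{dataset_id}_{job_name}"
```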

adamgilg commented 9 months ago

Re https://github.com/jataware/dojo/issues/202 and this: I explored using GET /job/{uuid}/{job_name} instead of GET /job/fetch for the Data Modeling processing step. What I found was that the current job statuses (I believe these come from RQ) are not consistent enough to make this helpful. For example, when the flowcast job returns an error, it uses the status finished, but the results object is populated with { error: '', message... }.

If we want to make a wholesale switch to using this endpoint (which I'm in support of, as I think it will make things more standardized and clearer), we should also put in some effort to make upstream changes to have consistent statuses returned by the various jobs. This may also involve some work on the Flowcast and Elwood jobs to raise errors appropriately and then respond with the correct statuses, as well as making sure these stay consistent throughout the entire flow.