Describe in detail what your merge request does and why.
Fixes #513
Adds retry and error rescue to jobs that connect to ga4gh wes. Details below.
Fixes a race condition bug where a job that a user cancelled sometimes finishes and completes(writes to samples) instead of cancelling.
Created a new set of helper functions ActiveJobTestHelpers to allow mocking Faraday based API integrations for active job tests.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other pull requests.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
You will need to have Sapporo running and connected to test
The following 3 jobs reach out to ga4gh wes.
WorkflowExecutionStatusJob
WorkflowExecutionSubmissionJob
WorkflowExecutionCancelationJob
If ga4gh wes is unreachable or returns an error code, it must be handled so that the workflow execution doesn't get stuck in whatever state it was in when the error occurred.
For ConnectionError type
This error type should always be reattempted.
You can simply turn off your Sapporo instance at the different stages of the workflow execution.
Verify that the job is reattempted continually.
For APIExceptionError types
These error types should be attempted 5 times before being put into the error state
You can make workflow_execution services throwto test the job re-queuing by adding a line to throw the exception. For example:
WorkflowExecutionCancelationJob is a special case because ga4gh wes can return 401 and/or 403 errors when a cancel request is sent to a completed run (Sapporo always replies with 200).
In this case, we want to mark those runs as canceled and not move on to completion steps. Other errors should still behave as above.
PR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
What does this PR do and why?
Describe in detail what your merge request does and why. Fixes #513
Adds retry and error rescue to jobs that connect to ga4gh wes. Details below.
Fixes a race condition bug where a job that a user cancelled sometimes finishes and completes(writes to samples) instead of cancelling.
Created a new set of helper functions
ActiveJobTestHelpers
to allow mocking Faraday based API integrations for active job tests.Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other pull requests.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
You will need to have
Sapporo
running and connected to testThe following 3 jobs reach out to ga4gh wes.
If ga4gh wes is unreachable or returns an error code, it must be handled so that the workflow execution doesn't get stuck in whatever state it was in when the error occurred.
For ConnectionError type
This error type should always be reattempted.
You can simply turn off your Sapporo instance at the different stages of the workflow execution.
Verify that the job is reattempted continually.
For APIExceptionError types
These error types should be attempted 5 times before being put into the error state
You can make workflow_execution services throwto test the job re-queuing by adding a line to throw the exception. For example:
WorkflowExecutionCancelationJob
WorkflowExecutionCancelationJob is a special case because ga4gh wes can return 401 and/or 403 errors when a cancel request is sent to a completed run (Sapporo always replies with 200).
In this case, we want to mark those runs as
canceled
and not move on to completion steps. Other errors should still behave as above.PR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.