A cross-tenant, metadata-driven processing framework for Azure Data Factory and Azure Synapse Analytics, achieved by coupling orchestration pipelines with a SQL database and a set of Azure Functions.
Describe the bug
When the 04-Infant pipeline sets the pipeline result, if there is a connectivity issue and the stored procedure call that sets the result times out, the PipelineStatus in the CurrentExecution table remains 'Running'. When the next stage starts, the pipeline still marked as 'Running' does not block execution of the pipelines that depend on it.
Affected services
Data Factory/Synapse
SQL Database
To Reproduce
This is difficult to reproduce: whenever we have seen it, the cause was an Azure connectivity error. It can be simulated by changing the stored procedure to return early for a specific worker pipeline.
Expected behaviour
If a pipeline status cannot be set successfully, it should be cleaned up before the next stage starts. The status should be set to error, because the exact outcome of the worker pipeline cannot be determined. The 04-Infant pipeline should also retry setting the result: the cause is most likely a brief connectivity blip, since other worker pipelines finishing within a few seconds of it did not suffer the same issue.
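To make the expected clean-up concrete, here is a minimal Python sketch of the rule, not the framework's actual T-SQL. The `Execution` record, the `fail_stale_running` helper and the literal `"Error"` status string are hypothetical stand-ins for rows in CurrentExecution; only the rule itself (anything from a prior stage still marked 'Running' is treated as an error before the next stage starts) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """Hypothetical stand-in for one row of the CurrentExecution table."""
    stage_id: int
    pipeline_name: str
    status: str


def fail_stale_running(rows, next_stage_id, error_status="Error"):
    """Mark pipelines from earlier stages that are still 'Running' as errored.

    Returns the names of the pipelines that were changed, so the caller can
    log them or raise an error for each one.
    """
    changed = []
    for row in rows:
        if row.stage_id < next_stage_id and row.status == "Running":
            # The real outcome is unknown, so the safe assumption is failure.
            row.status = error_status
            changed.append(row.pipeline_name)
    return changed


if __name__ == "__main__":
    rows = [
        Execution(1, "Worker A", "Success"),
        Execution(1, "Worker B", "Running"),  # result was never recorded
        Execution(2, "Worker C", "Running"),  # legitimately running now
    ]
    print(fail_stale_running(rows, next_stage_id=2))
```

Pipelines in the stage that is about to start are left alone, so only results that were never recorded get overwritten.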
Screenshots
Additional context
I have fixed this by reducing the timeout (it is a simple stored procedure call that should not block for long), allowing 2 retries, and setting the interval between retries to 5 seconds. I have also changed the procedure procfwk.CheckForBlockedPipelines to check for pipelines in prior stages that still have a status of 'Running' and raise errors for them (which also sets their status). The normal blocked-pipeline logic then runs, and the framework continues according to the error handling mode.
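For anyone who wants to reason about the retry settings outside of Data Factory, the behaviour can be sketched in Python. This is an illustration only: the fix itself is made in the pipeline activity settings, and `call_with_retry` and its parameters are hypothetical names. The defaults mirror the values above (2 retries, 5 seconds apart), and the call being retried is assumed to signal a timeout by raising `TimeoutError`.

```python
import time


def call_with_retry(action, retries=2, delay_seconds=5.0, sleep=time.sleep):
    """Run `action`, retrying on TimeoutError up to `retries` extra times.

    `sleep` is injectable so the wait can be stubbed out in tests.
    """
    attempts = retries + 1  # the first try plus the allowed retries
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TimeoutError:
            if attempt == attempts:
                # Out of retries: let the caller see the failure.
                raise
            sleep(delay_seconds)


if __name__ == "__main__":
    calls = {"count": 0}

    def set_pipeline_result():
        # Hypothetical call that times out once, then succeeds.
        calls["count"] += 1
        if calls["count"] == 1:
            raise TimeoutError("simulated connectivity blip")
        return "Success"

    print(call_with_retry(set_pipeline_result, delay_seconds=0.1))
```

If every attempt times out, the exception still propagates, which is why the check for 'Running' pipelines in prior stages is needed as a backstop.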