Closed brandon-segal closed 1 year ago
hey @brandon-segal! Could you share a Flyte execution with this error so we can check the exact Flyte error?
@andresgomezfrr
flytectl get execution -p dataplatform-insights-pipelines -d production xxxxxxxxxxxxxxx -o yaml
closure:
  createdAt: "2023-05-28T02:32:45.280650787Z"
  duration: 293.582359335s
  error:
    code: RetriesExhausted|USER:NotReady
    kind: USER
    message: |-
      [1/1] currentAttempt done. Last Error: USER::Traceback (most recent call last):
        File "/usr/src/app/.venv/lib/python3.8/site-packages/flytekit/exceptions/scopes.py", line 203, in user_entry_point
          return wrapped(*args, **kwargs)
        File "/usr/src/app/.venv/lib/python3.8/site-packages/spotify_dbt_flytekit/tasks/dbt_task.py", line 42, in wrapper
          handle_dbt_flyte_errors(out)
        File "/usr/src/app/.venv/lib/python3.8/site-packages/spotify_dbt_flytekit/clients/dbt_cli/handle_errors.py", line 22, in handle_dbt_flyte_errors
          raise FlyteMissingDependencyException(
      Message:
      ('Missing Dependency in DBT Script', 'model.dataplatform_insights.stg_cp__ui_components', 'error', 'Compilation Error in model stg_cp__ui_components (models/staging/client-platform/stg_cp__ui_components.sql)\n 404 Error: partition not found for:`client-platform-insights-1`.`ui`.`ui_components` for partition 2023-05-26 00:00:00\n \n > in macro check_dependencies (macros/dependencies/check_dependencies.sql)\n > called by macro run_hooks (macros/materializations/hooks.sql)\n > called by macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)\n > called by macro materialization_view_bigquery (macros/materializations/view.sql)\n > called by model stg_cp__ui_components (models/staging/client-platform/stg_cp__ui_components.sql)')
      User error.
  phase: FAILED
  startedAt: "2023-05-28T02:32:50.396389232Z"
  stateChangeDetails:
    occurredAt: "2023-05-28T02:32:45.280650787Z"
  updatedAt: "2023-05-28T02:37:43.978748335Z"
  workflowId:
    domain: production
    name: dataplatform_insights_pipelines.workflows.dynamic_dbt_build.dynamic_dbt_build
    project: dataplatform-insights-pipelines
    resourceType: WORKFLOW
    version: 30286103-da38-417b-b774-2498293066a8
id:
  domain: production
  name: kgdjf7mhot5bglibbpwr
  project: dataplatform-insights-pipelines
spec:
  annotations:
    values:
      STYX_COMPONENT_ID: dataplatform-insights-pipelines
      STYX_EXECUTION_ID: styx-run-30d25fb0-ee9f-4265-a021-7da312b45a59
      STYX_PARAMETER: "2023-05-26"
      STYX_TRIGGER_ID: natural-trigger
      STYX_TRIGGER_TYPE: natural
      STYX_WORKFLOW_ID: dataplatform-insights-pipelines.production.dataplatform_insights_dbt
      styx-execution-id: styx-run-30d25fb0-ee9f-4265-a021-7da312b45a59
      styx-workflow-instance: dataplatform-insights-pipelines#dataplatform-insights-pipelines.production.dataplatform_insights_dbt#2023-05-26
  labels:
    values:
      STYX_COMPONENT_ID: dataplatform-insights-pipelines
      STYX_EXECUTION_ID: styx-run-30d25fb0-ee9f-4265-a021-7da312b45a59
      STYX_PARAMETER: "2023-05-26"
      STYX_TRIGGER_ID: natural-trigger
      STYX_TRIGGER_TYPE: natural
      STYX_WORKFLOW_ID: dataplatform-insights-pipelinesproductiondataplatform_in6f5f0c2
      declarative-project-namespace: dataplatform-insights
      ghe-org: dataplatform-insights
      ghe-repo: dataplatform-insights-pipelines
  launchPlan:
    domain: production
    name: dataplatform_insights_dbt_lp
    project: dataplatform-insights-pipelines
    resourceType: LAUNCH_PLAN
    version: 30286103-da38-417b-b774-2498293066a8
  metadata:
    mode: SCHEDULED
    systemMetadata:
      executionCluster: flyte-production-regional
  securityContext:
    runAs: {}
Description
While working on the deployment of the dynamic workflow, it was found that the workflow would return the error code RetriesExhausted|User:NotReady when a dependency was missing, instead of User:NotReady, which is the typical error code for a missing dependency. Styx uses the error codes returned by Flyte to determine the status of the Styx workflow instance; if the code is User:NotReady, the system returns error code 20 for a missing dependency (relevant code). With the Flyte team's help, the issue was traced to a set of locations in the Flyte propeller code.
Ideal Behavior
Styx returns a Missing Dependencies error code when the error code contains User:NotReady.
Current Behavior
Styx returns an unknown-error code when the error code is not exactly User:NotReady.
Possible Cause
Within the Flyte propeller code base, it was found that dynamic workflows raise a RetryableFailure status if any dynamically generated node fails (relevant code). Once this status is raised for the dynamic workflow, Flyte propeller prepends RetriesExhausted| to the dynamic node's original error code (relevant code).
The impact is that a dynamic workflow cannot raise User:NotReady in a way Styx can identify. Workflows are then erroneously labeled as having unknown errors, even when the team raises error codes known to the Styx service, because the prepended RetriesExhausted| string prevents them from being recognized.
Suggested Remediation
A possible remediation that would allow dynamic workflows to raise specific Styx errors is to remove the RetriesExhausted| string before matching the error code against the known error codes.
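The remediation could look roughly like the Python sketch below. It is only an illustration of the prefix-stripping idea, not Styx's actual implementation: the function names and the KNOWN_EXIT_CODES table are hypothetical, and the case-insensitive comparison is an assumption made because the execution output above shows USER:NotReady while this description writes User:NotReady. Only the value 20 for a missing dependency comes from the description.

```python
from typing import Optional

# Prefix that Flyte propeller prepends, as seen in the execution output.
RETRIES_EXHAUSTED_PREFIX = "RetriesExhausted|"

# Hypothetical lookup table. Only the value 20 (missing dependency) is taken
# from the issue text; keys are lowercased for case-insensitive matching.
KNOWN_EXIT_CODES = {"user:notready": 20}


def normalize_error_code(code: str) -> str:
    """Return the error code without a leading RetriesExhausted| prefix."""
    if code.startswith(RETRIES_EXHAUSTED_PREFIX):
        return code[len(RETRIES_EXHAUSTED_PREFIX):]
    return code


def styx_exit_code(flyte_error_code: str) -> Optional[int]:
    """Map a Flyte error code to a known exit code, or None if unrecognized."""
    normalized = normalize_error_code(flyte_error_code).lower()
    return KNOWN_EXIT_CODES.get(normalized)
```

With this approach, both RetriesExhausted|USER:NotReady and a bare USER:NotReady resolve to the same missing-dependency code, while codes that are not in the table still fall through as unknown.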