The create-table permission failures were caused by a child role with a deny statement on table creation, which propagated upward. For now, that permission setting has been removed to unblock the dbt runs.
The most recent dbt build output is:
Finished running 209 table models, 1651 tests, 10 view models in 0 hours 41 minutes and 52.53 seconds (2512.53s).
Completed with 4 errors and 0 warnings:
Database Error in model stg__micromasters__app__postgres__grades_combinedcoursegrade (models/staging/micromasters/stg__micromasters__app__postgres__grades_combinedcoursegrade.sql)
TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Failed communicating with server: https://mitol.galaxy.starburst.io/api/v1/galaxy/security/trino/entity/table/c-4004614063/ol_warehouse_production_staging/stg__micromasters__app__postgres__grades_combinedcoursegrade__dbt_tmp/:create", query_id=20230508_172400_77476_6xy4u)
compiled Code at target/run/open_learning/models/staging/micromasters/stg__micromasters__app__postgres__grades_combinedcoursegrade.sql
Database Error in model int__mitxpro__b2becommerce_b2border (models/intermediate/mitxpro/int__mitxpro__b2becommerce_b2border.sql)
TrinoUserError(type=USER_ERROR, name=TABLE_NOT_FOUND, message="line 20:19: Table 'ol_data_lake_production.ol_warehouse_production_intermediate.int__salesforce__opportunity' does not exist", query_id=20230508_174508_00203_6xy4u)
compiled Code at target/run/open_learning/models/intermediate/mitxpro/int__mitxpro__b2becommerce_b2border.sql
Failure in test not_null_int__combined__courserun_enrollments_courserun_id (models/intermediate/combined/_combined_models.yml)
Got 12600270 results, configured to fail if != 0
compiled Code at target/compiled/open_learning/models/intermediate/combined/_combined_models.yml/not_null_int__combined__courserun_enrollments_courserun_id.sql
Failure in test not_null_int__combined__courserun_enrollments_courserun_title (models/intermediate/combined/_combined_models.yml)
Got 85666 results, configured to fail if != 0
compiled Code at target/compiled/open_learning/models/intermediate/combined/_combined_models.yml/not_null_int__combined__courserun_enrollments_courserun_title.sql
Done. PASS=1769 WARN=0 ERROR=4 SKIP=97 TOTAL=1870
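As a quick consistency check on the summary above, the node counts in the "Finished running" line (209 table models, 1651 tests, 10 view models) sum to 1870, which matches TOTAL in the final "Done." line, and PASS + WARN + ERROR + SKIP also sum to 1870. Below is a minimal Python sketch of that check. It assumes the two lines are copied verbatim from the log, and the helpers `count_nodes` and `parse_summary` are hypothetical, not part of dbt.

```python
import re

# Lines copied verbatim from the dbt build output above.
FINISHED = ("Finished running 209 table models, 1651 tests, 10 view models "
            "in 0 hours 41 minutes and 52.53 seconds (2512.53s).")
DONE = "Done. PASS=1769 WARN=0 ERROR=4 SKIP=97 TOTAL=1870"


def count_nodes(finished_line: str) -> int:
    """Sum the node counts that appear before the ' in <duration>' part."""
    head = finished_line.split(" in ")[0]
    return sum(int(n) for n in re.findall(r"\d+", head))


def parse_summary(done_line: str) -> dict[str, int]:
    """Parse 'Done. PASS=.. WARN=.. ERROR=.. SKIP=.. TOTAL=..' into a dict."""
    return {key: int(value) for key, value in re.findall(r"([A-Z]+)=(\d+)", done_line)}


summary = parse_summary(DONE)
nodes = count_nodes(FINISHED)
outcomes = summary["PASS"] + summary["WARN"] + summary["ERROR"] + summary["SKIP"]
print(nodes, outcomes, summary["TOTAL"])  # prints: 1870 1870 1870
```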
The most recent production deploy of the Dagster/dbt code was on May 4th. I'm deploying the latest code now and will re-run afterward to see what errors remain.
https://github.com/mitodl/ol-data-platform/pull/688 is merged, which should fix the first 3 errors in https://pipelines.odl.mit.edu/runs/c822b251-e5c7-4243-a561-7aff967f60ea?logFileKey=xslqhdla
The airbyte_asset_sync job ran successfully last night, and the dbt model test failures are all resolved. I will continue reviewing the job log in Dagster, but this issue can be closed.
The job has run successfully 4 days in a row, but today's run has Trino internal errors:
Database Error in model __micromasters_course_certificates_dedp_from_micromasters (models/intermediate/micromasters/subqueries/__micromasters_course_certificates_dedp_from_micromasters.sql)
TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Failed communicating with server: https://mitol.galaxy.starburst.io/api/v1/galaxy/security/trino/entity/table/c-4004614063/ol_warehouse_production_intermediate/__micromasters_course_certificates_dedp_from_micromasters__dbt_tmp/:create", query_id=20230516_090255_15910_3thya)
compiled Code at target/run/open_learning/models/intermediate/micromasters/subqueries/__micromasters_course_certificates_dedp_from_micromasters.sql
I also encountered a similar error when testing locally:
13:32:03 Runtime Error in model stg__micromasters__app__postgres__courses_program (models/staging/micromasters/stg__micromasters__app__postgres__courses_program.sql)
13:32:03 Runtime Error
13:32:03 TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Unexpected response status (Bad Gateway) performing operation: entity renamed
13:32:03 502 Bad Gateway
13:32:03 Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared
13:32:03 ", query_id=20230516_131344_85991_3thya)
13:32:03
13:32:03 Done. PASS=32 WARN=0 ERROR=1 SKIP=8 TOTAL=41
This doesn't look like a data or dbt error. @blarghmatey @quazi-h, do you have any idea what it might be?
The job ran fine and the models materialized this morning. @blarghmatey, do we need to investigate the intermittent GENERIC_INTERNAL_ERROR?
I upgraded the production cluster to a paid tier running in fault-tolerant mode, which seems to have resolved these errors.
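The errors in this thread fall into two groups. TABLE_NOT_FOUND is a USER_ERROR caused by the model itself (a missing upstream table), while the GENERIC_INTERNAL_ERROR messages ("Failed communicating with server", "502 Bad Gateway") point at the Galaxy service rather than at the data or dbt code. Below is a minimal sketch of how a log-triage script might separate the two. It assumes the error text is available as a string in the TrinoQueryError/TrinoUserError repr format shown above; `classify` and `TRANSIENT_MARKERS` are hypothetical names, not part of dbt, Dagster, or the Trino client.

```python
import re

# Substrings seen in this thread's GENERIC_INTERNAL_ERROR messages that point to
# the Galaxy service or its proxy rather than to the SQL being run (assumed list).
TRANSIENT_MARKERS = ("Failed communicating with server", "Bad Gateway")


def classify(error_text: str) -> str:
    """Return 'transient', 'user', or 'unknown' for a Trino error repr string."""
    match = re.search(r"type=(\w+), name=(\w+)", error_text)
    if match is None:
        return "unknown"
    err_type, err_name = match.groups()
    if err_type == "USER_ERROR":
        # e.g. TABLE_NOT_FOUND: fix the model or its upstream dependency.
        return "user"
    if err_name == "GENERIC_INTERNAL_ERROR" and any(
        marker in error_text for marker in TRANSIENT_MARKERS
    ):
        # Server-side or communication failure: a candidate for an automatic retry.
        return "transient"
    return "unknown"


samples = [
    'TrinoUserError(type=USER_ERROR, name=TABLE_NOT_FOUND, message="line 20:19: '
    "Table 'ol_data_lake_production.ol_warehouse_production_intermediate."
    "int__salesforce__opportunity' does not exist\")",
    'TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, '
    'message="Unexpected response status (Bad Gateway) performing operation: '
    'entity renamed")',
]
for sample in samples:
    print(classify(sample))  # prints: user, then transient
```

A check like this only helps decide which failures are worth retrying versus fixing in the model code; in this thread the intermittent errors were resolved on the cluster side.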
Description/Context
The dbt runs in production are producing a handful of errors that should be cleaned up. The error log is:
Acceptance Criteria