open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

DBT Cloud time format mismatch #17905

Closed aldwyn closed 4 days ago

aldwyn commented 1 week ago

Affected module

DBT Cloud connector

Describe the bug

We are running the DBT Cloud connector ingestion externally. The workflow YAML config is correct, but the workflow logs the following error:

Wild error ingesting pipeline status id=651752 name='snow' description='Run dbt models for ServiceNow data source' created_at='2024-06-06 03:10:17.334307+00:00' updated_at='2024-06-21 03:01:05.544477+00:00' state=1 job_type='other' schedule=DBTSchedule(cron='7 */12 * * 0,1,2,3,4,5,6') project_id=369331 - time data '2024-06-06 03:10:17.334307+00:00' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'

Stacktrace

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/metadata/ingestion/source/pipeline/dbtcloud/metadata.py", line 316, in yield_pipeline_status
    datetime.strptime(
  File "/usr/local/lib/python3.11/_strptime.py", line 567, in _strptime_datetime 
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "/usr/local/lib/python3.11/_strptime.py", line 349, in _strptime 
    raise ValueError("time data %r does not match format %r" % 
ValueError: time data '2024-06-06 03:04:20.716679+00:00' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'
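
The mismatch in the traceback above comes down to the separator: the format string requires a literal "T" between date and time, while the timestamp in the log uses a space. A minimal sketch of that behavior follows, using only the Python standard library. The helper names `parse_timestamp` and `strict_parse_fails` are hypothetical and are not part of the OpenMetadata code base, and this is not necessarily how the connector was fixed.

```python
from datetime import datetime

# Format string copied from the error message; it requires a literal "T"
# between the date and the time.
STRICT_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Timestamp copied from the log line in this issue; it uses a space, not "T".
SAMPLE = "2024-06-06 03:10:17.334307+00:00"


def parse_timestamp(value: str) -> datetime:
    """Hypothetical helper: parse an ISO 8601 timestamp with either separator."""
    # datetime.fromisoformat accepts any single character between the date
    # and the time, so both "T" and " " are handled.
    return datetime.fromisoformat(value)


def strict_parse_fails(value: str) -> bool:
    """Return True if the strict format from the traceback rejects the value."""
    try:
        datetime.strptime(value, STRICT_FORMAT)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(strict_parse_fails(SAMPLE))           # strict format vs. space separator
    print(parse_timestamp(SAMPLE).isoformat())  # tolerant parse of the same value
```

Using `datetime.fromisoformat` is one possible way to tolerate both separators; normalizing the string before calling `strptime` would be another.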

To Reproduce

  1. Create the following config yml as dbtcloud_metadata.yml:

      source:
        type: dbtcloud
        serviceName: DBTCloud-{{ env('DBT_CLOUD_ACCOUNT_ID') }}
        serviceConnection:
          config:
            type: DBTCloud
            host: "https://cloud.getdbt.com/"
            discoveryAPI: "https://metadata.cloud.getdbt.com/graphql"
            accountId: "{{ env('DBT_CLOUD_ACCOUNT_ID') }}"
            # jobId: "numeric_job_id"
            token: "{{ env('DBT_CLOUD_AUTH_TOKEN') }}"
        sourceConfig:
          config:
            type: PipelineMetadata
            lineageInformation:
              dbServiceNames: ["Snowflake-{{ env('SNOWFLAKE_ACCOUNT') }}"]

    Make sure to replace the environment variables with your own values.

  2. Run the OpenMetadata CLI: metadata ingest -c dbtcloud_metadata.yml

Expected behavior

The ingestion should run without errors, since the timestamp format is determined by DBT Cloud.

Version:


sushi30 commented 1 week ago

The issue is reproducible in sandbox beta (https://sandbox-beta.open-metadata.org/pipelineServices/dbt_cloud_test.2c659b21-006b-4aa4-bace-4264da38b4c6/logs)