open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Failure ingesting Airflow pipeline when the owner doesn't exist #16590

Closed ajsquared closed 2 months ago

ajsquared commented 3 months ago

Affected module Ingestion Framework

Describe the bug I see this error during Airflow pipeline ingestion when a DAG is owned by a user that does not exist in OpenMetadata:

[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest CreatePipelineRequest [redacted_dag_name] due to api request failure: Team of type Organization can't own entities. Only Team of type Group can own entities.
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 09:20:46 PDT] {status.py:76} WARNING - Failed to ingest Pipeline Status [Airflow.redacted_dag_name] due to api request failure: pipeline instance for Airflow.redacted_dag_name not found
[2024-06-07, 12:26:49 PDT] {topology_runner.py:231} DEBUG - Processing stage: type_=<class 'metadata.ingestion.models.ometa_classification.OMetaTagAndClassification'> processor='yield_tag' nullable=True must_return=False overwrite=True consumer=None context='tags' store_all_in_context=False clear_context=False store_fqn=False cache_entities=False use_cache=False
[2024-06-07, 12:26:49 PDT] {topology_runner.py:231} DEBUG - Processing stage: type_=<class 'metadata.generated.schema.entity.data.pipeline.Pipeline'> processor='yield_pipeline' nullable=False must_return=False overwrite=True consumer=['pipeline_service'] context='pipeline' store_all_in_context=False clear_context=False store_fqn=False cache_entities=False use_cache=True
[2024-06-07, 12:26:49 PDT] {metadata_rest.py:135} DEBUG - Processing Create request <class 'metadata.generated.schema.api.data.createPipeline.CreatePipelineRequest'>
[2024-06-07, 12:26:49 PDT] {status.py:76} WARNING - Failed to ingest CreatePipelineRequest [redacted_dag_name] due to api request failure: Team of type Organization can't own entities. Only Team of type Group can own entities.
[2024-06-07, 12:26:49 PDT] {status.py:77} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 219, in _one_request
    resp.raise_for_status()
  File "/home/airflow/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://openmetadata-prod.openmetadata.svc.cluster.local:8585/api/v1/pipelines
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/sink/metadata_rest.py", line 145, in _run
    return self._run_dispatch(record)
  File "/usr/local/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/sink/metadata_rest.py", line 136, in _run_dispatch
    return self.write_create_request(record)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/sink/metadata_rest.py", line 166, in write_create_request
    created = self.metadata.create_or_update(entity_request)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/ometa_api.py", line 276, in create_or_update
    return self._create(data=data, method="put")
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/ometa_api.py", line 267, in _create
    resp = fn(self.get_suffix(entity), data=data.json(encoder=show_secrets_encoder))
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/execution_time_tracker.py", line 195, in inner
    result = func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 298, in put
    return self._request("PUT", path, data)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 193, in _request
    return self._one_request(method, url, opts, retry)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 237, in _one_request
    raise APIError(error, http_error) from http_error
metadata.ingestion.ometa.client.APIError: Team of type Organization can't own entities. Only Team of type Group can own entities.

To Reproduce

Expected behavior The missing owner should not prevent ingesting the DAG entirely. Instead it should be ingested with no owner, or some default owner.
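
To make the expected behavior concrete, here is a minimal sketch of the fallback I have in mind. It is not the actual ingestion code: `OwnerRef`, `resolve_owner`, and the `lookup` callable are hypothetical names made up for illustration, and the only rule taken from the error above is that a team must be of type Group to own an entity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class OwnerRef:
    """Hypothetical stand-in for an owner reference (not the OpenMetadata model)."""
    name: str
    kind: str                        # "user" or "team"
    team_type: Optional[str] = None  # e.g. "Group" or "Organization" for teams


def resolve_owner(
    dag_owner: Optional[str],
    lookup: Callable[[str], Optional[OwnerRef]],
) -> Optional[OwnerRef]:
    """Return an owner that is allowed to own entities, or None.

    `lookup` is an assumed helper that maps a name to a known user/team,
    returning None when nothing matches.
    """
    if not dag_owner:
        return None  # DAG declares no owner
    candidate = lookup(dag_owner)
    if candidate is None:
        return None  # owner does not exist: ingest without an owner
    if candidate.kind == "team" and candidate.team_type != "Group":
        return None  # only Group teams can own entities, so drop the owner
    return candidate


# Hypothetical directory of known users and teams.
known = {
    "alice": OwnerRef("alice", "user"),
    "data-eng": OwnerRef("data-eng", "team", "Group"),
    "Organization": OwnerRef("Organization", "team", "Organization"),
}

owner = resolve_owner("missing_user", known.get)  # None, so no owner is sent
```

With something like this, the create request would be sent with the owner left unset whenever the lookup fails or resolves to a non-Group team, so the pipeline and its statuses would still be ingested.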

Version:

harshach commented 3 months ago

@ulixius9 we need to investigate why this defaults to Organization as the owner. Can you assign it to someone in the team?