open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0
5.62k stars 1.06k forks

Not correct owner for tables when running dbt ingestion and similar team/user names #17510

Open nicor88 opened 3 months ago

nicor88 commented 3 months ago

Affected module I believe this issue comes from the ingestion.

Describe the bug I have a few departments:

When I run the dbt ingestion, tables whose dbt owner is set to data-engineering get DataInsightsApplicationBot as their owner. I expect either that no owner is set or that one of the two departments is set.
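The expected behavior can be sketched as an exact-match check on the search results. This is only an illustration: `pick_owner` and the shape of the `hits` dictionaries are hypothetical and are not taken from the OpenMetadata ingestion code.

```python
from typing import Optional


def pick_owner(dbt_owner: str, candidates: list[dict]) -> Optional[dict]:
    """Return the candidate whose name equals dbt_owner exactly, else None.

    Hypothetical helper: `candidates` stands in for whatever the search
    returns; each entry is assumed to carry a "name" key.
    """
    wanted = dbt_owner.casefold()
    for candidate in candidates:
        if candidate.get("name", "").casefold() == wanted:
            return candidate
    # No exact match: leave the table without an owner.
    return None


# A fuzzy hit that only resembles the requested owner is rejected.
hits = [{"name": "DataInsightsApplicationBot", "type": "user"}]
print(pick_owner("data-engineering", hits))  # None
```

With a check like this, a result that is merely similar to the requested name would never be assigned as owner.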

To Reproduce

Expected behavior A clear and concise description of what you expected to happen.

Version:

Additional context After a chat with Pablo Takara, Sriharsha Chintalapani, and Onkar Ravgan, it looks like data-engineering cannot be assigned as an owner because it is a department. That is fine, but then no owner should be assigned at all. The issue seems to come from name_search_query_es, where fuzzy matching is performed, which leads to a match on DataInsightsApplicationBot (the closest name to data-engineering). I expect that:

nicor88 commented 2 months ago

Here are some findings regarding this issue. As I understand it, the problem is related to the - used in the dbt_owner: when name_search_query_es runs, the query is evaluated as a search for data without engineering, because (again, in my understanding) Opensearch/Elasticsearch treats - as an operator that excludes terms.

What is strange is that data.engineering, with a . separator, seems to behave the same way as -: the bot datainsightsapplicationbot is returned.
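If the - really is being parsed as an operator, one option would be to escape the reserved characters of the query_string syntax before the owner name is placed in the query. Below is a minimal sketch of that idea. `escape_query_string` is a hypothetical helper, not part of OpenMetadata, and I have not verified that name_search_query_es builds a query_string query or that escaping would change the result.

```python
import re

# Characters that the Elasticsearch/Opensearch query_string syntax reserves.
# "<" and ">" cannot be escaped at all, so they are removed instead.
_RESERVED = re.compile(r'(&&|\|\||[+\-=!(){}\[\]^"~*?:\\/])')


def escape_query_string(value: str) -> str:
    """Backslash-escape reserved query_string characters in `value`."""
    without_angles = value.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", without_angles)


print(escape_query_string("data-engineering"))  # data\-engineering
print(escape_query_string("data.engineering"))  # data.engineering
```

Note that . is not a reserved character, so escaping leaves data.engineering unchanged. That suggests the operator explanation may not cover the . case on its own.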

It is also worth mentioning that I tested this behavior with the dbt ingestion in 1.4.7 and 1.5.0.

One workaround that I'm adopting is to use names without a separator, since the display name of a team can still show what I want. In my case the team names will therefore be dataengineering or dataanalytics.
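The renaming can be sketched as a small normalization step. `strip_separators` is a hypothetical helper written for illustration, and it assumes team names are plain ASCII.

```python
import re


def strip_separators(name: str) -> str:
    """Lowercase `name` and drop everything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


print(strip_separators("data-engineering"))  # dataengineering
print(strip_separators("data.analytics"))    # dataanalytics
```

The same function would have to be applied both when creating the team and when setting the owner in dbt, so that the two names stay identical.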