open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

[Bigquery] Unable to profile partitioned table with policy tag #12878

Closed FelipeArruda closed 11 months ago

FelipeArruda commented 1 year ago

Affected module Ingestion

Describe the bug: When running the BigQuery profiler, the following error occurs.

[2023-08-15, 13:46:47 UTC] {profiler_interface.py:56} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 203, in _execute
    self._query_job.result()
  File "/home/airflow/.local/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1499, in result
    do_get_result()
  File "/home/airflow/.local/lib/python3.9/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/home/airflow/.local/lib/python3.9/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/home/airflow/.local/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1489, in do_get_result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "/home/airflow/.local/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py", line 728, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/google/api_core/future/polling.py", line 137, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Data masking cannot be applied to table "dataset.teste_datas_part2" on field "teste_datetime" as the field is used for partitioning or clustering.

To Reproduce

Create a table with a partition field, then add a policy tag to that field in the schema. Run the profiler; the error above occurs.
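For anyone trying to reproduce this, the table layout described above might look roughly like the following BigQuery REST `Table` resource. This is only a sketch: the project ID, the policy tag resource name, the DATETIME column type, and the `build_table_resource` helper are all assumptions for illustration, and the error presumably also requires a data masking rule to be attached to the policy tag.

```python
import json


def build_table_resource(project_id, dataset_id, table_id, partition_field, policy_tag):
    """Build a Table resource that is day-partitioned on a field carrying a policy tag.

    Hypothetical helper, not part of OpenMetadata or the BigQuery client library.
    """
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "schema": {
            "fields": [
                {
                    "name": partition_field,
                    "type": "DATETIME",  # assumed from the column name
                    "policyTags": {"names": [policy_tag]},
                }
            ]
        },
        # The same field is used for time partitioning, which is the
        # combination the error message complains about.
        "timePartitioning": {"type": "DAY", "field": partition_field},
    }


resource = build_table_resource(
    project_id="my-project",  # placeholder
    dataset_id="dataset",
    table_id="teste_datas_part2",
    partition_field="teste_datetime",
    # Placeholder policy tag resource name.
    policy_tag="projects/my-project/locations/us/taxonomies/123/policyTags/456",
)
print(json.dumps(resource, indent=2))
```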

Screenshots or steps to reproduce

Expected behavior: The profiler should be able to profile a table that has a partitioned field.

Version:

ayush-shah commented 11 months ago

@FelipeArruda, based on the error, BigQuery returns a 400 Bad Request when data masking is applied to a field that is used for partitioning or clustering. If data masking is crucial for your use case, consider restructuring your table so that the date component of teste_datetime is copied into a separate field that is not used for partitioning or clustering. You can then apply data masking to the new field and keep the original teste_datetime field, without the policy tag, for partitioning or clustering.
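As a rough illustration of the restructuring suggested above, the sketch below compares the original layout with a restructured one and flags any field that both carries a policy tag and is used for partitioning or clustering. The dictionaries follow the BigQuery REST `Table` resource shape; the `find_conflicting_fields` helper, the `teste_date` column name, the column types, and the policy tag name are assumptions for illustration. The check is deliberately conservative: it flags any policy tag, whether or not a masking rule is attached, and only looks at top-level fields.

```python
def find_conflicting_fields(table):
    """Return top-level fields that have a policy tag and are also used
    for partitioning or clustering. Hypothetical helper."""
    layout_fields = set()
    for key in ("timePartitioning", "rangePartitioning"):
        field = table.get(key, {}).get("field")
        if field:
            layout_fields.add(field)
    layout_fields.update(table.get("clustering", {}).get("fields", []))
    return [
        f["name"]
        for f in table.get("schema", {}).get("fields", [])
        if f["name"] in layout_fields and f.get("policyTags", {}).get("names")
    ]


# Placeholder policy tag resource name.
POLICY_TAG = "projects/my-project/locations/us/taxonomies/123/policyTags/456"

# Layout from the issue: the partition field itself carries the policy tag.
original = {
    "schema": {"fields": [
        {"name": "teste_datetime", "type": "DATETIME",
         "policyTags": {"names": [POLICY_TAG]}},
    ]},
    "timePartitioning": {"type": "DAY", "field": "teste_datetime"},
}

# Suggested layout: partition on the untagged field, tag a derived date field.
restructured = {
    "schema": {"fields": [
        {"name": "teste_datetime", "type": "DATETIME"},
        {"name": "teste_date", "type": "DATE",
         "policyTags": {"names": [POLICY_TAG]}},
    ]},
    "timePartitioning": {"type": "DAY", "field": "teste_datetime"},
}

print(find_conflicting_fields(original))      # ['teste_datetime']
print(find_conflicting_fields(restructured))  # []
```

Note that in the restructured layout teste_datetime itself is no longer masked, so this only helps if masking the derived field is enough for your use case.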