open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

MINOR - fix Data Quality 1.6 migration #17966

Closed: TeddyCr closed this pull request 3 days ago

TeddyCr commented 3 days ago

Describe your changes:

Fixes the Data Quality migrations in 1.6.

In the 1.6 migration, we make test case results implement the EntityTimeSeries interface and assign each result a UUID (plus a reference to the test case FQN), like so:

-- Add FQN and UUID to data_quality_data_time_series records
UPDATE openmetadata_db.data_quality_data_time_series dqdts
INNER JOIN openmetadata_db.test_case tc ON dqdts.entityFQNHash = tc.fqnHash
SET dqdts.json = JSON_SET(dqdts.json,
    '$.testCaseFQN', tc.json->'$.fullyQualifiedName',
    '$.id', (SELECT UUID())
);
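To make the join semantics concrete, here is a minimal in-memory sketch in Java of what the UPDATE above does. The class, record, and method names (ResultBackfillSketch, TestCaseRow, ResultRow, backfill) are hypothetical, the json column is modeled as a plain map, and generating one random UUID per updated row is an assumption about the intent of the UUID() call. This is an illustration, not OpenMetadata's migration code. As with the INNER JOIN, a result whose entityFQNHash matches no test case is skipped and ends up with no id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of the 1.6 backfill; not OpenMetadata's actual code.
final class ResultBackfillSketch {

  // Stand-in for a test_case row: only the columns the UPDATE reads.
  record TestCaseRow(String fqnHash, String fullyQualifiedName) {}

  // Stand-in for a data_quality_data_time_series row; the json column is
  // modeled as a mutable map of top-level fields.
  record ResultRow(String entityFQNHash, Map<String, Object> json) {}

  // Mirrors INNER JOIN ... SET json = JSON_SET(...): each result whose
  // entityFQNHash matches a test case gets testCaseFQN and a fresh id.
  // Results with no matching test case are skipped, as the inner join drops them.
  static int backfill(List<ResultRow> results, List<TestCaseRow> testCases) {
    Map<String, String> fqnByHash = new HashMap<>();
    for (TestCaseRow tc : testCases) {
      fqnByHash.put(tc.fqnHash(), tc.fullyQualifiedName());
    }
    int updated = 0;
    for (ResultRow row : results) {
      String fqn = fqnByHash.get(row.entityFQNHash());
      if (fqn == null) {
        continue; // orphaned result: no test case has this hash
      }
      row.json().put("testCaseFQN", fqn);
      row.json().put("id", UUID.randomUUID().toString());
      updated++;
    }
    return updated;
  }

  public static void main(String[] args) {
    // Hypothetical sample data.
    List<TestCaseRow> testCases = List.of(new TestCaseRow("h1", "svc.db.schema.table.myTest"));
    List<ResultRow> results = new ArrayList<>();
    results.add(new ResultRow("h1", new HashMap<>()));
    results.add(new ResultRow("h2", new HashMap<>())); // orphaned result
    System.out.println(backfill(results, testCases)); // prints 1
    System.out.println(results.get(1).json().containsKey("id")); // prints false
  }
}
```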

I believe that at one point we did not cascade the deletion of test case results when a test case (or one of its parents) was deleted. As you can see, we fetch the results by looking up the (hashed) test case FQN:

return JsonUtils.readValue(
        daoCollection
            .dataQualityDataTimeSeriesDao()
            .getLatestExtension(testCase.getFullyQualifiedName(), TESTCASE_RESULT_EXTENSION),
        TestCaseResult.class);
  }

Based on this, I believe it is safe to delete the test case results that have no matching test case, e.g. those whose hash is returned by:

SELECT DISTINCT dqdts.entityFQNHash
FROM openmetadata_db.data_quality_data_time_series dqdts
LEFT JOIN openmetadata_db.test_case tc ON dqdts.entityFQNHash = tc.fqnHash
WHERE tc.fqnHash IS NULL;
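The anti-join above can be sketched in memory the same way. The names below (OrphanResultCleanupSketch, orphanHashes, deleteOrphans) are hypothetical, and this only illustrates the LEFT JOIN ... IS NULL logic; it is not the migration shipped in this PR. In MySQL itself, the same join could drive a multi-table DELETE (DELETE dqdts FROM ... LEFT JOIN ... WHERE tc.fqnHash IS NULL), but whether the actual migration is written that way is not shown here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the orphan cleanup; not OpenMetadata's actual code.
final class OrphanResultCleanupSketch {

  // Stand-in for a data_quality_data_time_series row: only the join key matters here.
  record ResultRow(String entityFQNHash, String payload) {}

  // Equivalent of SELECT DISTINCT dqdts.entityFQNHash ... LEFT JOIN ... WHERE tc.fqnHash IS NULL:
  // hashes present in the results table but absent from test_case.
  static Set<String> orphanHashes(List<ResultRow> results, Set<String> testCaseHashes) {
    Set<String> orphans = new LinkedHashSet<>(); // DISTINCT, keeping first-seen order
    for (ResultRow row : results) {
      if (!testCaseHashes.contains(row.entityFQNHash())) {
        orphans.add(row.entityFQNHash());
      }
    }
    return orphans;
  }

  // Deletes every result whose hash has no matching test case; returns how many were removed.
  static int deleteOrphans(List<ResultRow> results, Set<String> testCaseHashes) {
    int before = results.size();
    results.removeIf(row -> !testCaseHashes.contains(row.entityFQNHash()));
    return before - results.size();
  }

  public static void main(String[] args) {
    // Hypothetical sample data: one live test case, two results orphaned under "h2".
    Set<String> testCaseHashes = new HashSet<>(List.of("h1"));
    List<ResultRow> results = new ArrayList<>(List.of(
        new ResultRow("h1", "r1"),
        new ResultRow("h2", "r2"),
        new ResultRow("h2", "r3")));
    System.out.println(orphanHashes(results, testCaseHashes)); // prints [h2]
    System.out.println(deleteOrphans(results, testCaseHashes)); // prints 2
    System.out.println(results.size()); // prints 1
  }
}
```

Under this model, because both the cleanup and the backfill key only on entityFQNHash, running the cleanup first means every remaining row matches a test case and therefore receives a testCaseFQN and an id.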


Type of change:


Checklist: