sweepai-dev / elementary

Open-source data observability for analytics engineers.
https://docs.elementary-data.com
Apache License 2.0
0 stars 0 forks source link

Sweep: [ELE-996] Defining `quote: true` on columns fails the column anomalies tests #4

Closed wwzeng1 closed 1 year ago

wwzeng1 commented 1 year ago

Describe the bug We have a check here in which we read the table's columns and validate that the column you're testing actually exists. We're making the comparison with a quoted column on one side and an unquoted one on the other and hence the issue.

To Reproduce Add quote: true to a column.

Solution We can solve this by doing one of two things:

  1. Remove this check. The test will crash regardless if the column doesn't exist, however it'll be less indicative that this is the issue.
  2. Unquote the columns upon testing.
image

Additional context

ELE-996

sweep-ai[bot] commented 1 year ago

Here's the PR! https://github.com/sweepai-dev/elementary/pull/5.

💎 Sweep Pro: I used GPT-4 to create this ticket. You have 6 GPT-4 tickets left.


Step 1: 🔍 Code Search

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at (click to expand). If some file is missing from here, you can mention the path in the ticket description. https://github.com/sweepai-dev/elementary/blob/1b8bc9005e9a4324b0c2464d515f14d29073d139/docs/tutorial/running-elementary.mdx#L5-L71 https://github.com/sweepai-dev/elementary/blob/1b8bc9005e9a4324b0c2464d515f14d29073d139/tests/e2e/report/test_report.py#L113-L359 https://github.com/sweepai-dev/elementary/blob/1b8bc9005e9a4324b0c2464d515f14d29073d139/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx#L1-L69 https://github.com/sweepai-dev/elementary/blob/1b8bc9005e9a4324b0c2464d515f14d29073d139/docs/guides/anomaly-detection-tests/column-anomalies.mdx#L1-L71 https://github.com/sweepai-dev/elementary/blob/1b8bc9005e9a4324b0c2464d515f14d29073d139/docs/guides/add-schema-tests.mdx#L1-L134

Step 2: 🧐 Snippet Analysis

From looking through the relevant snippets, I decided to make the following modifications:

File Path Proposed Changes
elementary/monitor/dbt_project/macros/tests/test_column_anomalies.sql Modify the check that validates if a column exists in the table. Specifically, unquote the column names before testing. This can be done by using the UNQUOTE_IDENTIFIER function provided by dbt. Replace the existing check with UNQUOTE_IDENTIFIER(column_name) IN (SELECT column_name FROM columns).

Step 3: 📝 Planning

I have created a plan for writing the pull request. I am now working my plan and coding the required changes to address this issue. Here is the planned pull request:

[ELE-996] Fix column anomalies test for quoted column names sweep/fix-column-anomalies-test

Description

This PR addresses issue ELE-996.

The issue occurs when the column names are quoted, causing a mismatch with the unquoted column names in the test_column_anomalies.sql file. This PR modifies the check in the test_column_anomalies.sql file to unquote the column names before testing. This ensures that the column names match regardless of whether they are quoted or not in the table.

Changes Made

  • Modified the check in test_column_anomalies.sql to unquote the column names before testing.
  • Replaced the existing check with UNQUOTE_IDENTIFIER(column_name) IN (SELECT column_name FROM columns).

Screenshots

N/A

Additional Context


Step 4: ⌨️ Coding

I have finished coding the issue. I am now reviewing it for completeness.


Step 5: 🔁 Code Review

Success! 🚀


I'm a bot that handles simple bugs and feature requests but I might make mistakes. Please be kind! Join Our Discord