sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

Columns not getting excluded Soda Snowflake #178

Open jairamurs opened 2 years ago

jairamurs commented 2 years ago

Although the columns are listed in the excluded_columns section, they still appear in the scan output. I can see them included in the query, and measurements and tests are still executed for them.

Below is the sample YAML file, followed by the query and the scan output:

table_name: SAMPLEDATA
metrics:
  - row_count
  - missing_count
  - missing_percentage
  - values_count
  - values_percentage
  - valid_count
  - valid_percentage
  - invalid_count
  - invalid_percentage
  - min_length
  - max_length
  - avg_length
  - min
  - max
  - avg
  - sum
  #- variance
  #- stddev
excluded_columns:
  - id
tests:
  - row_count > 0
columns:
  ID:
    tests:
      - max > 0

SELECT
  COUNT(*),
  COUNT(CASE WHEN NOT (ID IS NULL) THEN 1 END),
  COUNT(CASE WHEN NOT (ID IS NULL) THEN 1 END),
  MIN(ID),
  MAX(ID),
  AVG(ID),
  SUM(ID)
FROM SAMPLEDATA

**QUERY Measurements:**
| Query measurement: values_count(ID) = 3206228
| Query measurement: valid_count(ID) = 3206228
| Query measurement: min(ID) = -2016166185
| Query measurement: max(ID) = 269703432
| Query measurement: avg(ID) = -328444222.192943
| Query measurement: sum(ID) = -1053067061633234

Test Execution: | Test column(ID) test(max > 0) passed with measurements {"expression_result": 269703432, "max": 269703432}

vijaykiran commented 2 years ago

@jairamurs Sorry for the delay. In your YAML I see that id is excluded, but ID is still listed under columns with a test. Are you sure that is the exact YAML?


```yaml
excluded_columns:
  - id
tests:
  - row_count > 0
columns:
  ID:
    tests:
      - max > 0
```

jairamurs commented 2 years ago

@vijaykiran The test was included to show that it gets executed even though the column is excluded. However, even without the column-level test, the metrics configured at table level are still calculated for the column. You can remove the test on the ID column and you will still see the query measurements being calculated for ID, even though it is listed in excluded_columns.
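
One possible cause, not confirmed anywhere in this thread, is a case mismatch: the YAML excludes `id`, while Snowflake reports unquoted identifiers in uppercase (`ID`). The sketch below is a hypothetical illustration in Python of how a case-sensitive comparison would miss the exclusion and a case-insensitive one would catch it. It is not soda-sql's actual code; `is_excluded` and the extra column names `NAME` and `CREATED_AT` are assumptions made up for the example.

```python
from typing import Iterable


def is_excluded(column_name: str, excluded_columns: Iterable[str]) -> bool:
    """Return True if column_name matches an excluded entry, ignoring case.

    Hypothetical helper for illustration only; not part of soda-sql.
    """
    excluded = {name.casefold() for name in excluded_columns}
    return column_name.casefold() in excluded


# Column names as a warehouse that upper-cases unquoted identifiers would
# report them. NAME and CREATED_AT are invented for this example.
warehouse_columns = ["ID", "NAME", "CREATED_AT"]
excluded_columns = ["id"]  # as written in the scan YAML in this issue

# Case-sensitive filter: "ID" != "id", so the column is not excluded.
case_sensitive = [c for c in warehouse_columns if c not in excluded_columns]

# Case-insensitive filter: "ID" matches "id" and is dropped.
case_insensitive = [
    c for c in warehouse_columns if not is_excluded(c, excluded_columns)
]

print(case_sensitive)    # ['ID', 'NAME', 'CREATED_AT']
print(case_insensitive)  # ['NAME', 'CREATED_AT']
```

If this is what is happening, writing the entry as `ID` in excluded_columns might be worth trying as a quick check, though that is untested here.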