sodadata / soda-core

:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
https://go.soda.io/core-docs
Apache License 2.0

Test contract with valid_values_reference_data #2155

Open aixulin opened 2 months ago

aixulin commented 2 months ago

When I use valid_values_reference_data, I run into two issues. The first issue: when some values in column country do not exist in column id, the check fails even though the must_be_less_than threshold is satisfied.

dataset: dim_employee
columns:
- name: country
  checks:
  - type: invalid_percent
    must_be_less_than: 3
    valid_values_reference_data: 
      dataset: table_b
      column: id

The log output looks like this:

Errors:
  error |  sodacl: Invalid reference check configuration key identity 
CheCK FAILED
  Expected invalid_count(country) < 3
  Actual invalid_count(country) was 2

The second issue: when every value in column country exists in column id, the check passes, but the contract result logs still contain an error like this:

Errors:
  error |  SodaCL: Invalid reference check configuration key identity 

The error appears to come from this check in sodacl_parser.py:

            for configuration_key in check_configurations:
                if configuration_key not in [NAME, WARN, FAIL, ATTRIBUTES]:
                    self.logs.error(f"Invalid freshness configuration key {configuration_key}", location=self.location)

Here the configuration_key is identity.
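To illustrate why the error shows up whether or not the check itself passes, here is a minimal standalone sketch of an allow-list validation like the one quoted above. This is not the soda-core implementation: the constant values, the `find_invalid_keys` helper, and the idea that the contract layer adds an `identity` key to the generated check are all assumptions made for illustration.

```python
# Hypothetical reconstruction for illustration only; not soda-core code.
# The constant values below are assumed, mirroring the names in the snippet above.
NAME, WARN, FAIL, ATTRIBUTES = "name", "warn", "fail", "attributes"
ALLOWED_KEYS = [NAME, WARN, FAIL, ATTRIBUTES]


def find_invalid_keys(check_configurations: dict) -> list[str]:
    """Return one error message per configuration key outside the allow-list."""
    return [
        f"Invalid reference check configuration key {key}"
        for key in check_configurations
        if key not in ALLOWED_KEYS
    ]


# Assumption: the generated reference check carries an extra "identity" key.
errors = find_invalid_keys({"name": "country reference", "identity": "abc123"})
print(errors)
```

Under these assumptions, the key validation runs at parse time and is independent of the data, so any key missing from the allow-list is reported as an error even when every value in country is found in id.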

Will this issue be resolved in a future version?

tools-soda commented 2 months ago

CLOUD-8294