sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0
61 stars 17 forks source link

test error on scanning mysql table #353

Open mrkoloev opened 2 years ago

mrkoloev commented 2 years ago

Describe the bug I run test against mysql table and error occurs:

[TestResult(test=Test(id='{"expression":"row_count > 0"}', title='test(row_count > 0)', expression='row_count > 0', metrics=['row_count'], column=None, source='soda-sql'), passed=False, skipped=False, values=None, error=NameError("name 'row_count' is not defined"), group_values=None), TestResult(test=Test(id='{"column":"bus_no","expression":"invalid_percentage <= 20"}', title='column(bus_no) test(invalid_percentage <= 20)', expression='invalid_percentage <= 20', metrics=['invalid_percentage'], column='bus_no', source='soda-sql'), passed=False, skipped=False, values=None, error=NameError("name 'invalid_percentage' is not defined"), group_values=None), TestResult(test=Test(id='{"column":"incident_number","expression":"invalid_percentage == 0"}', title='column(incident_number) test(invalid_percentage == 0)', expression='invalid_percentage == 0', metrics=['invalid_percentage'], column='incident_number', source='soda-sql'), passed=False, skipped=False, values=None, error=NameError("name 'invalid_percentage' is not defined"), group_values=None), TestResult(test=Test(id='{"column":"incident_number","expression":"missing_count == 0"}', title='column(incident_number) test(missing_count == 0)', expression='missing_count == 0', metrics=['missing_count'], column='incident_number', source='soda-sql'), passed=False, skipped=False, values=None, error=NameError("name 'missing_count' is not defined"), group_values=None), TestResult(test=Test(id='{"column":"school_year","expression":"invalid_percentage == 0"}', title='column(school_year) test(invalid_percentage == 0)', expression='invalid_percentage == 0', metrics=['invalid_percentage'], column='school_year', source='soda-sql'), passed=False, skipped=False, values=None, error=NameError("name 'invalid_percentage' is not defined"), group_values=None), TestResult(test=Test(id='{"column":"schools_serviced","expression":"invalid_percentage <= 15"}', title='column(schools_serviced) test(invalid_percentage <= 15)', expression='invalid_percentage <= 15', metrics=['invalid_percentage'], column='schools_serviced', source='soda-sql'), passed=False, skipped=False, values=None, error=NameError("name 'invalid_percentage' is not defined"), group_values=None)]
Test error for "row_count > 0": name 'row_count' is not defined
Test error for "invalid_percentage <= 20": name 'invalid_percentage' is not defined
Test error for "invalid_percentage == 0": name 'invalid_percentage' is not defined
Test error for "missing_count == 0": name 'missing_count' is not defined
Test error for "invalid_percentage == 0": name 'invalid_percentage' is not defined
Test error for "invalid_percentage <= 15": name 'invalid_percentage' is not defined

To Reproduce I run this python code:

from sodasql.scan.scan_builder import ScanBuilder

scan_builder = ScanBuilder()
scan_builder.scan_yml_file = 'tables/breakdowns.yml'
scan_builder.warehouse_yml_file = 'warehouse.yml'
scan = scan_builder.build()
scan_result = scan.execute()

print(scan_result.test_results)

And if i run this command:

 soda scan warehouse.yml tables/breakdowns.yml

I get this:

 | 2.2.1
  | Scanning tables/breakdowns.yml ...
  | There is no value specified for valid_values for column incident_number
  | There is no value specified for valid_min for column incident_number
  | There is no value specified for valid_max for column incident_number
  | There is no value specified for valid_values for column school_year
  | There is no value specified for valid_min for column school_year
  | There is no value specified for valid_max for column school_year
  | There is no value specified for valid_values for column bus_no
  | There is no value specified for valid_min for column bus_no
  | There is no value specified for valid_max for column bus_no
  | There is no value specified for valid_values for column schools_serviced
  | There is no value specified for valid_min for column schools_serviced
  | There is no value specified for valid_max for column schools_serviced
  | # _do_auth(): user: mysql
  | # _do_auth(): self._auth_plugin: 
  | # _do_auth(): user: mysql
  | # _do_auth(): password: mysql
  | new_auth_plugin: caching_sha2_password
  | No Soda Cloud account configured
  | Executing SQL query: 
SELECT column_name, data_type, is_nullable 
FROM information_schema.columns 
WHERE lower(table_name) = 'breakdowns' 
  AND table_schema = 'new_york'
  | SQL took 0:00:00.007393
  |   boro (text) 
  |   breakdown_or_running_late (text) 
  |   bus_company_name (text) 
  |   bus_no (text) 
  |   busbreakdown_id (int) 
  |   created_on (text) 
  |   has_contractor_notified_parents (text) 
  |   has_contractor_notified_schools (text) 
  |   have_you_alerted_opt (text) 
  |   how_long_delayed (text) 
  |   incident_number (int) 
  |   informed_on (text) 
  |   last_updated_on (text) 
  |   number_of_students_on_the_bus (int) 
  |   occured_on (text) 
  |   reason (text) 
  |   route_number (text) 
  |   run_type (text) 
  |   school_age_or_prek (text) 
  |   school_year (text) 
  |   schools_serviced (text) 
  | 21 columns:
  | Query measurement: schema = [{'name': 'boro', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'breakdown_or_running_late', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'bus_company_name', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'bus_no', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'number', 'semanticType': 'number'}, {'name': 'busbreakdown_id', 'type': 'int', 'dataType': 'int', 'nullable': True, 'logicalType': 'number', 'semanticType': 'number'}, {'name': 'created_on', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'has_contractor_notified_parents', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'has_contractor_notified_schools', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'have_you_alerted_opt', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'how_long_delayed', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'incident_number', 'type': 'int', 'dataType': 'int', 'nullable': True, 'logicalType': 'number', 'semanticType': 'number'}, {'name': 'informed_on', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'last_updated_on', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'number_of_students_on_the_bus', 'type': 'int', 'dataType': 'int', 'nullable': True, 'logicalType': 'number', 'semanticType': 'number'}, {'name': 'occured_on', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'reason', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'route_number', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'run_type', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'school_age_or_prek', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'school_year', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'time', 'semanticType': 'time'}, {'name': 'schools_serviced', 'type': 'text', 'dataType': 'text', 'nullable': True, 'logicalType': 'number', 'semanticType': 'number'}]
  | Exception during aggregation query
Traceback (most recent call last):
  File "/Users/mkoloev/Desktop/soda_projects/soda_mysql_2/venv/lib/python3.9/site-packages/sodasql/scan/scan.py", line 254, in _query_aggregations
    fields.append(dialect.sql_expr_count_conditional(scan_column.non_missing_condition, column_name))
TypeError: sql_expr_count_conditional() takes 2 positional arguments but 3 were given
  | Test error for "row_count > 0": name 'row_count' is not defined
  | Test error for "invalid_percentage <= 20": name 'invalid_percentage' is not defined
  | Test error for "invalid_percentage == 0": name 'invalid_percentage' is not defined
  | Test error for "missing_count == 0": name 'missing_count' is not defined
  | Test error for "invalid_percentage == 0": name 'invalid_percentage' is not defined
  | Test error for "invalid_percentage <= 15": name 'invalid_percentage' is not defined
  | Executed 1 queries in 0:00:00.021183
  | Scan summary ------
  | 1 measurements computed
  | 6 tests executed
  | 6 of 6 tests failed:
  |   Test test(row_count > 0) failed with measurements null
  |   Test column(bus_no) test(invalid_percentage <= 20) failed with measurements null
  |   Test column(incident_number) test(invalid_percentage == 0) failed with measurements null
  |   Test column(incident_number) test(missing_count == 0) failed with measurements null
  |   Test column(school_year) test(invalid_percentage == 0) failed with measurements null
  |   Test column(schools_serviced) test(invalid_percentage <= 15) failed with measurements null
  | Errors occurred!
  |   [error] Exception during aggregation query
  |   [test_execution_error] Test "row_count > 0" failed
  |   [test_execution_error] Test "invalid_percentage <= 20" failed
  |   [test_execution_error] Test "invalid_percentage == 0" failed
  |   [test_execution_error] Test "missing_count == 0" failed
  |   [test_execution_error] Test "invalid_percentage == 0" failed
  |   [test_execution_error] Test "invalid_percentage <= 15" failed
  | Exiting with code 1
  | Starting new HTTPS connection (1): collect.soda.io:443
  | https://collect.soda.io:443 "POST /v1/traces HTTP/1.1" 200 0

Context warehouse.yml:

name: mysql
connection:
  type: mysql
  host: localhost
  port: 3306
  username: mysql
  password: mysql
  database: new_york

test: breakdowns.yml:

table_name: breakdowns
metrics:
  - row_count
  - missing_count
  - missing_percentage
  - values_count
  - values_percentage
  - invalid_count
  - invalid_percentage
  - valid_count
  - valid_percentage
  - avg_length
  - max_length
  - min_length
  - avg
  - sum
  - max
  - min
  - stddev
  - variance
tests:
  - row_count > 0
columns:
  incident_number:
    valid_format: number_whole
    tests:
      - invalid_percentage == 0
      - missing_count == 0
  school_year:
    valid_format: date_inverse
    tests:
      - invalid_percentage == 0
  bus_no:
    valid_format: number_whole
    tests:
      - invalid_percentage <= 20
  schools_serviced:
    valid_format: number_whole
    tests:
      - invalid_percentage <= 15

OS: macOS Monterey 12.2.1 Python Version: 3.9.5 Soda SQL Version: 2.2.1 Warehouse Type: mysql

However it works fine with postgres

vijaykiran commented 2 years ago

Hi @mrkoloev Soda SQL is now currently deprecated, we recommend migrating to https://github.com/sodadata/soda-core , we just released MySQL support for it.