sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0
59 stars 16 forks source link

[Feature] Add a verbose option for `scan` #149

Open HansBambel opened 2 years ago

HansBambel commented 2 years ago

Is your feature request related to a problem? Please describe. Having a descriptive output about what is happening is useful, but it would be nice to have an option that reduces the output to the major findings.

Describe the solution you'd like An option such as -verbose for the soda scan command would be nice to have. Running the command with for example verbose -1 would result in only output whether the tests passed or whether they failed and if so, which ones and why.

So basically removing:

Additional context One could assume that the user knows that when using this command he/she knows the information is limited.

Soda SQL Version:

vijaykiran commented 2 years ago

Thank you @HansBambel I think currently logs are too chatty :) Do you have any thoughts on verbose levels and what it should print ?

HansBambel commented 2 years ago

I'd propose something along those lines:

verbose -1: Only output failed tests:

  | Scanning tables/scan.yml ...
  | 7 tests executed
  | 1 of 7 tests failed:
  |   Test column(my_column) test(2500 < avg) failed with measurements 2100.20
  | Errors occurred!
  |   [test_execution_error] Test "2500 < avg" failed
  | Exiting with code 1

verbose 0 (default): Passed and failed tests:

  | Scanning tables/report_damages/report.yml ...
  | No Soda Cloud account configured
  | Test sqlmetric(0) test(my_column1 > 0.0) passed with measurements {"expression_result": 0.141612164498951, "my_column1": 0.141612164498951}
  | Test test(row_count > 0) passed with measurements {"expression_result": 727494, "row_count": 727494}
  | Test column(my_column2) test(avg > 1000) passed with measurements {"expression_result": 1294.54153513079, "avg": 1294.54153513079}
  | Test column(my_column2) test(avg < 1600) passed with measurements {"expression_result": 1294.54153513079, "avg": 1294.54153513079}
  | Test column(my_column3) test(avg > 400) passed with measurements {"expression_result": 505.662826908393, "avg": 505.662826908393}
  | Test column(my_column3) test(avg < 400) failed with measurements {"expression_result": 505.662826908393, "avg": 505.662826908393}
  | Test column(my_column4) test(avg > 2500) passed with measurements {"expression_result": 2872.11595225701, "avg": 2872.11595225701}
  | Test column(my_column4) test(avg < 3100) passed with measurements {"expression_result": 2872.11595225701, "avg": 2872.11595225701}
  | 8 tests executed
  | 1 of 8 tests failed:
  |   Test column(my_column3) test(avg < 400) failed with measurements {"expression_result": 505.662826908393, "avg": 505.662826908393}
  | Exiting with code 1

verbose 1: Include the Query measurements and Derived measurements

  | Query measurement: row_count = 123456
  | Query measurement: values_count(id) = 123456
  | Query measurement: values_count(my_column1) = 122345
  | Query measurement: values_count(my_column2) = 87654
  | Query measurement: values_count(my_column3) = 87654
  | Query measurement: values_count(my_column4) = 87654
  | Derived measurement: missing_percentage(id) = 0.0
  | Derived measurement: missing_count(id) = 0
  | Derived measurement: values_percentage(id) = 100.0
  | Derived measurement: missing_percentage(my_column1) = 55.45
  | Derived measurement: missing_count(my_column1) = 55000
...
  | Test sqlmetric(0) test(my_column1 > 0.0) passed with measurements {"expression_result": 0.141612164498951, "my_column1": 0.141612164498951}
  | Test test(row_count > 0) passed with measurements {"expression_result": 727494, "row_count": 727494}
  | Test column(my_column2) test(avg > 1000) passed with measurements {"expression_result": 1294.54153513079, "avg": 1294.54153513079}
  | Test column(my_column2) test(avg < 1600) passed with measurements {"expression_result": 1294.54153513079, "avg": 1294.54153513079}
  | Test column(my_column3) test(avg > 400) passed with measurements {"expression_result": 505.662826908393, "avg": 505.662826908393}
  | Test column(my_column3) test(avg < 400) failed with measurements {"expression_result": 505.662826908393, "avg": 505.662826908393}
  | Test column(my_column4) test(avg > 2500) passed with measurements {"expression_result": 2872.11595225701, "avg": 2872.11595225701}
  | Test column(my_column4) test(avg < 3100) passed with measurements {"expression_result": 2872.11595225701, "avg": 2872.11595225701}
  | 8 tests executed
  | 1 of 8 tests failed:
  |   Test column(my_column3) test(avg < 400) failed with measurements {"expression_result": 505.662826908393, "avg": 505.662826908393}
  | Exiting with code 1

And finally verbose 2: including the SQL statements as well.