sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

Notifying when untracked tables are found #152

Open JCZuurmond opened 2 years ago

JCZuurmond commented 2 years ago

Is your feature request related to a problem? Please describe. I would like to get a notification if some tables are not tracked by Soda.

Describe the solution you'd like

What I am thinking of, roughly:

  1. List all tables from the warehouse
  2. Compare this with the tables defined in the tables/ folder
  3. List which tables are in the warehouse and are not in the tables/ folder.

Optionally with an ignore_list if someone would like to exclude certain tables.
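
The three steps above, plus the optional ignore_list, could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Soda SQL implements anything: it assumes each tracked table has a scan file named tables/<table_name>.yml, that the warehouse table names have already been fetched by some other means, and that ignore_list holds shell-style patterns. The function names tracked_tables and find_untracked_tables are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterable, List, Set


def tracked_tables(tables_dir: Path) -> Set[str]:
    """Return table names derived from scan file names.

    Assumption: one YAML file per table, named tables/<table_name>.yml.
    """
    return {path.stem.lower() for path in tables_dir.glob("*.yml")}


def find_untracked_tables(
    warehouse_tables: Iterable[str],
    tables_dir: Path,
    ignore_list: Iterable[str] = (),
) -> List[str]:
    """List warehouse tables that have no scan file and are not ignored.

    warehouse_tables is supplied by the caller (step 1); how it is
    fetched from the warehouse is outside this sketch.
    """
    tracked = tracked_tables(Path(tables_dir))  # step 2
    patterns = [pattern.lower() for pattern in ignore_list]

    untracked = []
    for name in warehouse_tables:  # step 3
        lowered = name.lower()
        if lowered in tracked:
            continue
        # Shell-style patterns, e.g. "tmp_*", let users exclude tables.
        if any(fnmatchcase(lowered, pattern) for pattern in patterns):
            continue
        untracked.append(name)
    return sorted(untracked)
```

Called with the names returned by the warehouse, the tables/ folder, and something like ignore_list=["tmp_*"], this returns the tables that would trigger a notification. Table names are compared case-insensitively here, which may or may not suit a given warehouse.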

Additional context When we restructure our data pipelines or change names of the tables, it happens quite often that certain tables become stale. I would like to detect these somehow.

lucas-houles commented 2 years ago

Hi @JCZuurmond @vijaykiran, I thought about this issue, and as I understand it there is no need to add this feature, because today the soda scan command can handle only one table at a time. Diffing the tables in the warehouse against those defined in the tables folder on every run would make the command more cumbersome (imagine the case of a huge warehouse). Moreover, the analyze command already lets us detect this: if a table is renamed, for example, the new table will appear in the tables folder and the user will have to delete the old one manually. One idea could be to add a new command to tackle this, but I don't think adding this behavior to the scan command is a good thing.

JCZuurmond commented 2 years ago

OK, clear. Let's close this issue in that case, if @vijaykiran agrees too.