tokern / piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
https://tokern.io/piicatcher/
Apache License 2.0
280 stars, 96 forks

Switches for views, external schemas, remote DBs, etc. #214

Open mateuszboryn opened 1 year ago

mateuszboryn commented 1 year ago

It would be good to have switches in command line options (and in API of course):

nicolepng commented 1 year ago

Hi @mateuszboryn, piicatcher currently uses the amundsen package to retrieve data from the databases, and its SQL queries operate at the table level (see https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/postgres_metadata_extractor.py). Hence, we are unable to filter out database views or create a switch for that.

vrajat commented 1 year ago

Another option is to use include/exclude lists: https://docs.tokern.io/piicatcher/include_exclude_lists
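
To illustrate the idea behind include/exclude lists, here is a minimal sketch of regex-based name filtering in plain Python. It is not piicatcher's implementation or API: the function `filter_names`, its parameters, the choice of full-string matching, and the naming convention in the usage example (views ending in `_view`, external tables starting with `ext_`) are all assumptions made for illustration. See the linked docs for the actual options.

```python
import re
from typing import Iterable, List, Optional, Sequence


def filter_names(
    names: Iterable[str],
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: keep names allowed by include/exclude regex lists.

    A name is kept when it fully matches at least one include pattern
    (or no include patterns are given) and matches no exclude pattern.
    """
    include_res = [re.compile(p) for p in (include or [])]
    exclude_res = [re.compile(p) for p in (exclude or [])]

    kept = []
    for name in names:
        # With no include list, every name is a candidate.
        included = not include_res or any(r.fullmatch(name) for r in include_res)
        excluded = any(r.fullmatch(name) for r in exclude_res)
        if included and not excluded:
            kept.append(name)
    return kept


# Usage: skip objects that follow an (assumed) naming convention.
tables = ["orders", "orders_view", "ext_sales", "customers"]
print(filter_names(tables, exclude=[r".*_view", r"ext_.*"]))
```

This only helps when views or external tables can be told apart by name, since the filter sees names and not object types.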