Columns names and data are identified incorrectly pii

tokern / piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub

Apache License 2.0

281 stars, 96 forks

Hi @denisbnet! :)

For the DatumSpacyDetector, we currently use the commonregex-improved Python library to run the regex matching that detects PII types. To fix the problems raised for DatumSpacyDetector, we could write different regular expressions or switch to a different method that improves accuracy. Feel free to open PRs or share suggestions :)
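As one illustration of a "different method", a regex hit can be treated as a candidate only and then confirmed by a validation step. The sketch below uses only the standard library and is not taken from piicatcher or commonregex-improved; the pattern and the names CARD_CANDIDATE, luhn_valid and find_card_numbers are hypothetical. It checks credit-card-like digit runs against the Luhn checksum so that arbitrary 16-digit IDs are less likely to be flagged.

```python
import re

# Hypothetical candidate pattern (not the one used by commonregex-improved):
# 13 to 16 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit runs that match the pattern AND pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Both numbers in a string such as "card 4111 1111 1111 1111 and order 1234 5678 9012 3456" match the regex, but only the first passes the checksum. The same "regex plus validator" idea could apply to other PII types that have a checksum or a fixed structure.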

For ColumnNameRegexDetector, you can update the column-matching regex by changing it in scanner.py. If you would like to create a new detector, refer to the documentation in detectors.py.
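To show why tightening a column-name regex can reduce false positives, here is a small standalone sketch. The patterns are made up for illustration and are not the ones in scanner.py; LOOSE, STRICT and matches are hypothetical names. The stricter pattern only accepts known PII words when they are delimited by underscores or the ends of the column name.

```python
import re

# Hypothetical patterns for illustration only; not copied from scanner.py.
# LOOSE matches any column name containing "name" or "mail" anywhere.
LOOSE = re.compile(r".*(name|mail).*", re.IGNORECASE)

# STRICT requires a known PII word as a whole underscore-delimited token.
STRICT = re.compile(
    r"(?:^|_)(first_?name|last_?name|full_?name|e?mail(?:_address)?)(?:_|$)",
    re.IGNORECASE,
)


def matches(pattern: re.Pattern, column: str) -> bool:
    """Return True if the pattern is found anywhere in the column name."""
    return pattern.search(column) is not None


if __name__ == "__main__":
    for column in ["first_name", "user_email", "filename", "hostname", "mailbox_size"]:
        print(column, matches(LOOSE, column), matches(STRICT, column))
```

With these patterns, the loose regex flags filename, hostname and mailbox_size, while the strict one flags only first_name and user_email. The trade-off is that a strict list has to be extended by hand for naming schemes it does not anticipate.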

Columns names and data are identified incorrectly pii #216