denisbnet opened this issue 1 year ago
Hi @denisbnet! :)
For `DatumSpacyDetector`, we currently use the `commonregex-improved` Python library to run regex matching that detects PII types. To fix the problems raised for `DatumSpacyDetector`, we could write different regular expressions or switch to a different method to improve its accuracy. Feel free to open PRs or share suggestions :)
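To illustrate the regex-plus-validation idea, here is a minimal sketch that assumes a detector receives sample values from a column and maps each PII type to one regular expression. It uses Python's standard `re` module rather than the `commonregex-improved` API, and `PATTERNS`, `VALIDATORS`, `luhn_valid`, and `detect_pii_types` are hypothetical names. The Luhn checksum is one example of a different method layered on top of regex matching: a long digit run only counts as a card number if its check digit is valid, which should cut false positives on IDs and other numeric strings.

```python
import re
from typing import Callable, Dict, Iterable, Set

# Hypothetical patterns for illustration; not the ones shipped in commonregex-improved.
PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # 13 to 19 digits, optionally separated by spaces or hyphens.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


# Optional second-stage checks per PII type (hypothetical); types without one
# are accepted on a regex match alone.
VALIDATORS: Dict[str, Callable[[str], bool]] = {"credit_card": luhn_valid}


def detect_pii_types(values: Iterable[str]) -> Set[str]:
    """Return the PII types found in any of the sample values."""
    found: Set[str] = set()
    for value in values:
        for pii_type, pattern in PATTERNS.items():
            for match in pattern.finditer(value):
                check = VALIDATORS.get(pii_type)
                if check is None or check(match.group()):
                    found.add(pii_type)
                    break  # one confirmed match per type and value is enough
    return found
```

For example, `detect_pii_types(["4111 1111 1111 1111"])` returns `{"credit_card"}`, while `"1234 5678 9012 3456"` matches the pattern but fails the checksum and is dropped.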
For `ColumnNameRegexDetector`, you can update the regex used for column-name matching by changing it in `scanner.py`. If you would like to create a new detector, refer to the documentation in `detectors.py`.
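A rough sketch of column-name matching and of plugging in an extra detector, assuming a detector only needs the column name to decide. The `Detector` base class, `ColumnNameRegexSketch`, `IpAddressColumnDetector`, and `scan_columns` are hypothetical and are not the interface documented in `detectors.py`; the patterns are illustrative, not the ones in `scanner.py`.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Detector(ABC):
    """Hypothetical base class: maps a column name to a PII type, or None."""

    @abstractmethod
    def detect(self, column_name: str) -> Optional[str]:
        ...


class ColumnNameRegexSketch(Detector):
    # Illustrative column-name patterns; editing entries here is analogous to
    # changing the regex in scanner.py. Dict order decides which type wins.
    PATTERNS: Dict[str, re.Pattern] = {
        "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
        "phone": re.compile(r"phone|mobile", re.IGNORECASE),
        "person": re.compile(r"^(first|last|full)?[-_]?name$", re.IGNORECASE),
    }

    def detect(self, column_name: str) -> Optional[str]:
        for pii_type, pattern in self.PATTERNS.items():
            if pattern.search(column_name):
                return pii_type
        return None


class IpAddressColumnDetector(Detector):
    """Example of a new, hypothetical detector: flags IP address columns."""

    # "ip" as a whole underscore-delimited token, or "ipaddress"/"ip_address".
    PATTERN = re.compile(r"(^|_)ip(_|$)|ip_?address", re.IGNORECASE)

    def detect(self, column_name: str) -> Optional[str]:
        return "ip_address" if self.PATTERN.search(column_name) else None


def scan_columns(columns: List[str], detectors: List[Detector]) -> Dict[str, str]:
    """Return {column: pii_type}, using the first detector that matches."""
    results: Dict[str, str] = {}
    for column in columns:
        for detector in detectors:
            pii_type = detector.detect(column)
            if pii_type is not None:
                results[column] = pii_type
                break
    return results
```

With `scan_columns(["user_email", "first_name", "client_ip", "zip", "created_at"], [ColumnNameRegexSketch(), IpAddressColumnDetector()])`, the result is `{"user_email": "email", "first_name": "person", "client_ip": "ip_address"}`. Anchoring the name pattern keeps columns such as `username` from being flagged, at the cost of missing names like `customer_name`; that trade-off is the kind of thing worth tuning in the real regex.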
DatumSpacyDetector:
- spaCy version: 3.5.2
- Platform: Linux-5.15.0-70-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Pipelines: en_core_web_md (3.5.0), en_core_web_sm (3.5.0)
ColumnNameRegexDetector: