mantiumai / chirps

Discover sensitive/confidential information stored in a vector database
GNU General Public License v3.0

Enhancement of Vector Database Scanner with NER for Improved PII Detection #208

Open JustEmrick opened 10 months ago

JustEmrick commented 10 months ago

Description:

The current implementation of our PII policies relies predominantly on regular expressions (regex). While this approach has served us so far, it tends to produce a notable number of false positives (strings that merely look like PII) and false negatives (PII with no fixed format, such as personal names). To improve the accuracy and robustness of the scanner, we propose integrating Named Entity Recognition (NER), along with other relevant methods, into PII detection.
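To make the limitation concrete, here is a minimal sketch of how regex findings could be combined with an NER pass. Everything in it is an assumption for illustration: the `Finding` type, the `PATTERNS` table, the `scan` function, and the `toy_ner` stub are hypothetical names, not part of the chirps codebase, and `toy_ner` is only a stand-in for a real NER model.

```python
import re
from typing import Callable, Iterable, List, NamedTuple, Optional, Tuple


class Finding(NamedTuple):
    label: str   # kind of PII, e.g. "EMAIL" or "PERSON"
    text: str    # the matched span
    source: str  # which detector produced it: "regex" or "ner"


# Hypothetical patterns for illustration; not the rules chirps ships with.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# An NER backend is modelled as any callable returning (span, label) pairs.
NerFn = Callable[[str], Iterable[Tuple[str, str]]]


def regex_findings(text: str) -> List[Finding]:
    """Return every pattern match in the text, in pattern order."""
    return [
        Finding(label, match.group(), "regex")
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def scan(text: str, ner: Optional[NerFn] = None) -> List[Finding]:
    """Run the regex pass, then append entities from the optional NER pass."""
    findings = regex_findings(text)
    if ner is not None:
        findings.extend(Finding(label, span, "ner") for span, label in ner(text))
    return findings


def toy_ner(text: str) -> List[Tuple[str, str]]:
    """Stand-in for a real NER model: looks up a fixed list of names."""
    known_people = ("Jane Doe",)
    return [(name, "PERSON") for name in known_people if name in text]


sample = "Ticket 123-45-6789 was filed by Jane Doe (jane@example.com)."
regex_only = scan(sample)
with_ner = scan(sample, ner=toy_ner)
```

On `sample`, the regex-only pass flags the ticket number as a `US_SSN` (a false positive) and never sees the name `Jane Doe` (a miss), while the NER pass adds a `PERSON` finding. Recording the `source` of each finding leaves room for a later step to weigh or cross-check the two detectors, which is one way the false-positive rate could be reduced.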

Background:

Proposed Solution: