redhuntlabs / Octopii

An AI-powered Personally Identifiable Information (PII) scanner.
https://redhuntlabs.com/blog/octopii-an-opensource-pii-scanner-for-images.html

New PII-related regexes #20

Open umair9747 opened 1 year ago

umair9747 commented 1 year ago

Is your feature request related to a problem? Please describe.
I believe we can have more regexes for PII scanning. This can help expand the coverage of the tool.

Describe the solution you'd like
I discovered a website that has a good number of regexes that I believe can be useful for Octopii: https://docs.trellix.com/bundle/data-loss-prevention-11.10.x-classification-definitions-reference-guide/page/GUID-66B1F12A-E267-4EEB-A9A5-A4398A6AF8CD.html

Additional context
None

deniercounter commented 4 months ago

Unfortunately, the definition at https://docs.trellix.com/bundle/data-loss-prevention-11.10.x-classification-definitions-reference-guide/page/GUID-27F151A3-9CCA-40FF-99A0-35EEA1846AC8.html for the Austrian UID is not correct: this number starts with ATU, which may be followed by a blank, and then by 8 digits.

The first definition I looked up was already wrong. What a pity.
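
The format described in this comment (the prefix ATU, an optional blank, then 8 digits) could be written as a regular expression roughly as follows. This is only a minimal Python sketch of that description: the names AUSTRIAN_UID_PATTERN and find_austrian_uids are hypothetical, nothing here is taken from Octopii or Trellix, and no checksum validation is attempted.

```python
import re

# Assumption: the format is exactly as described in this thread:
# the literal prefix "ATU", an optional single blank, then 8 ASCII digits.
# The \b anchors stop the pattern from matching inside longer tokens.
AUSTRIAN_UID_PATTERN = re.compile(r"\bATU ?[0-9]{8}\b")


def find_austrian_uids(text):
    """Return every substring of text that looks like an Austrian UID."""
    return AUSTRIAN_UID_PATTERN.findall(text)
```

With this pattern, a string such as "ATU12345678" or "ATU 12345678" is reported, while a value with 7 or 9 digits after the prefix is not.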

umair9747 commented 4 months ago

Trellix is a third-party source which we have no control over. Rather, it is just a resource which we can use for adding definitions (after verifying them, of course). Also, I am not sure we have the Austrian UID definition within definitions.json, do we? 🤔
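
One way to do the verification step mentioned here is to run each candidate regex against samples that must match and samples that must not match before adding it. The sketch below is a hypothetical helper, not part of Octopii: the function name verify_pattern and the sample values are assumptions made for illustration.

```python
import re


def verify_pattern(pattern, should_match, should_not_match):
    """Check a candidate regex against known-good and known-bad samples.

    Returns a list of failure messages; an empty list means the
    pattern behaved as expected on every sample.
    """
    compiled = re.compile(pattern)
    failures = []
    for sample in should_match:
        # fullmatch requires the whole sample to fit the pattern.
        if compiled.fullmatch(sample) is None:
            failures.append(f"missed valid sample: {sample!r}")
    for sample in should_not_match:
        if compiled.fullmatch(sample) is not None:
            failures.append(f"matched invalid sample: {sample!r}")
    return failures


# Hypothetical samples based on the Austrian UID format described in this thread.
problems = verify_pattern(
    r"ATU ?[0-9]{8}",
    should_match=["ATU12345678", "ATU 12345678"],
    should_not_match=["AT12345678", "ATU1234567"],
)
```

A definition would only be added when the returned list is empty; any entries in it point at the samples the regex got wrong.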

deniercounter commented 4 months ago

@umair9747 Oh yes ... I am fully aware that this is a 3rd party.

umair9747 commented 4 months ago

Thanks for highlighting the false-positive case, however. It will surely help us to be extra careful when adding new definitions to the tool. Appreciate your comment 🙌 @deniercounter