nationalparkservice / DPchecker

DPchecker (Data Package checker) is a package with a series of functions for NPS data package authors and reviewers to check for internal consistency between data and metadata, and for compliance with the data package standards.
https://nationalparkservice.github.io/DPchecker/

Detect non-government emails (PII) in metadata or data #101

Closed: juddpatterson closed this issue 1 year ago

juddpatterson commented 1 year ago

Describe the solution you'd like: I wonder if it's worth adding a rudimentary sensitive PII scanner, specifically to warn folks if they include non-government email addresses. Detecting .edu, .com, etc. emails may be challenging, but a fairly simple implementation could look for gmail.com, yahoo.com, hotmail.com, and perhaps a few other common email domains. https://www.thewindowsclub.com/commonly-used-email-addresses

Additional context: This is definitely not a high priority, but it could catch data that I suspect is occasionally (accidentally) shared.
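
For illustration, the simple domain-list approach described above could be sketched roughly as follows. This is a Python sketch, not DPchecker code (DPchecker itself is an R package). The function name `find_personal_emails`, the domain list, and the regular expression are all assumptions made for this example, and the list is far from complete.

```python
import re

# Hypothetical, incomplete list of common personal email domains (assumption).
COMMON_PERSONAL_DOMAINS = {
    "gmail.com",
    "yahoo.com",
    "hotmail.com",
    "outlook.com",
    "aol.com",
}

# Deliberately loose pattern: good enough to raise a warning,
# not a full RFC 5322 address validator.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def find_personal_emails(text):
    """Return email addresses in `text` whose domain is on the common-domain list."""
    hits = []
    for match in EMAIL_PATTERN.finditer(text):
        domain = match.group(1).lower()  # compare domains case-insensitively
        if domain in COMMON_PERSONAL_DOMAINS:
            hits.append(match.group(0))
    return hits
```

A check like this would only warn about addresses on the list, so addresses at less common domains would pass unnoticed.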

RobLBaker commented 1 year ago

I don't think it should be too hard to look for emails and filter out .govs. This is a great idea!
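
As a rough sketch of that idea (find every email address, then drop the .gov ones), something like the following might work. It is written in Python purely for illustration and is not the DPchecker implementation. The function name `find_non_gov_emails` and the regular expression are assumptions, and the pattern is intentionally loose rather than a strict address validator.

```python
import re

# Loose email pattern (assumption): captures the domain in group 1.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def find_non_gov_emails(text):
    """Return every email address in `text` whose domain does not end in .gov."""
    non_gov = []
    for match in EMAIL_PATTERN.finditer(text):
        domain = match.group(1).lower()
        # Only domains ending in ".gov" are treated as government addresses here;
        # other government domains (for example ".gov.uk" or ".mil") get flagged.
        if not domain.endswith(".gov"):
            non_gov.append(match.group(0))
    return non_gov
```

Applied to the text of a metadata file or to the character columns of a data file, any non-empty result could be turned into a warning for the author or reviewer.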