Closed notslang closed 8 years ago
> What prevents someone from providing a "SHA -> email" table for every email on GitHub?
You're right, if someone is willing to compile and provide a "rainbow table" on the side, we can't stop them. However, that by itself is not an overriding argument against obfuscating said data.
I understand and am sympathetic to all of your points. In fact, I've raised all the same questions and points in the past. However, this is a sensitive area with wildly different opinions (e.g. some of the current and past discussions on https://github.com/ghtorrent/ghtorrent.org) and we have to find a balance that is acceptable to all sides. After talking about these issues with the GitHub folks, we arrived at the current strategy -- you may disagree with it, I understand that.
Closing, feel free to reopen if needed.
As of fd53d3a80fd07289581541cc99446d2dce36c770, the email field is dropped entirely, and more recently it's obfuscated with SHA1. Since you've decided to preemptively block conversation on c9ae11426e5bcc30fe15617d009dfc602697ecde, I guess I'll reply here...
What prevents someone from providing a "SHA -> email" table for every email on GitHub? Unless there's some implementation detail I'm missing, this sounds like a case of security through obscurity that is only going to make this dataset harder to use, while not preventing spam. Furthermore, email is a public identifier - it's written on every single commit and is designed to let people find and contact you. It's not a secret that's meant to be hidden.
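To make the "SHA -> email" table concrete, here is a minimal sketch of how such a lookup could work. It assumes the obfuscated field is the plain, unsalted SHA-1 hex digest of the UTF-8 email string; the dataset's actual scheme isn't spelled out in this thread, so treat that as an assumption. The addresses and the helper names `sha1_hex` and `build_lookup` are made up for illustration.

```python
import hashlib


def sha1_hex(email: str) -> str:
    """Hash an email the way the dataset is ASSUMED to (unsalted SHA-1, hex)."""
    return hashlib.sha1(email.encode("utf-8")).hexdigest()


def build_lookup(known_emails):
    """Precompute a digest -> email table from addresses gathered elsewhere."""
    return {sha1_hex(email): email for email in known_emails}


# Hypothetical addresses, e.g. scraped from commit metadata in public repos.
known_emails = ["alice@example.com", "bob@example.com"]
lookup = build_lookup(known_emails)

# An "obfuscated" value as it would appear in the dataset under this assumption.
obfuscated = sha1_hex("alice@example.com")

# Reversing it is a single dictionary lookup; unknown digests return None.
recovered = lookup.get(obfuscated)
print(recovered)
```

Anyone who already has a list of candidate addresses can build this table once and reuse it, since an unsalted hash maps the same input to the same digest every time.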
Unless GitHub is willing to break git entirely, people will just ignore GitHub's API & read the email from the patch generated from every commit: