racket / racket-pkg-website

A frontend for the Racket Package Catalog.

Redact or otherwise break apart email addresses #77

Closed williewillus closed 1 year ago

williewillus commented 3 years ago

A few months ago I used a fresh email address to publish a few packages on the package server, such as https://pkgs.racket-lang.org/package/cbor

I've already started receiving spam on that address, which I suspect is probably due to the package server listing email addresses in plain text and getting indexed by crawlers.

The only other thing I've used that address for is to comment on the AUR, but the AUR user system does not expose the email address of members, so it's probably the Racket package server.

Not sure what the optimal solution here is, but just raising the issue.

benknoble commented 1 year ago

Seconding; received crypto spam to an email used for only the package catalog and Racket Discourse.

spdegabrielle commented 1 year ago

I’d propose removing email addresses outright as users can contact the package owner via the source link

e.g. https://pkgs.racket-lang.org/package/r16 -> https://git.sr.ht/~williewillus/r16

The flaw in this proposal is that you lose the ability to search by author unless you already know the email address they use: https://pkgd.racket-lang.org/pkgn/search?tags=author%3Ahandle%40domain.name

I’d suggest that author search is rarely used and would not be missed, and that on balance protecting the privacy of contributors is more important.

spdegabrielle commented 1 year ago

I think changing this function would be sufficient: https://github.com/racket/racket-pkg-website/blob/b63f84beb193ae2af5415b9e68d0cf1a90b8ac5c/src/site.rkt#L589

greghendershott commented 1 year ago

Same here: I also just started to receive spam, also for a distinct email address.

@spdegabrielle As a WAG, 99% of the packages are hosted on "forge" sites like sourcehut, github, etc. In that case, authors would probably prefer the email addresses to be hidden, not only to avoid spam, but also to funnel communication into the issues "inbox" on the forge site. (At least that's my preference.)

OTOH Racket also supports packages hosted on non-forge sites... where the only way to contact the author, probably, is via email. That's probably the justification for showing email addresses.

  1. Personally I'd be OK ignoring the 1% and just hiding the address for all packages.

  2. I could imagine something slightly more clever, like showing the address only when the source domain isn't on some forge site list like '("git.sr.ht" "github.com" "gitlab.com" ...).

  3. I could imagine doing both 1 soon to "quickly plug the leak", then later looping back to do 2 or other refinement.
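A minimal sketch of option 2, written in Python purely for illustration (the site itself is written in Racket). The names `FORGE_HOSTS` and `should_show_email` are hypothetical, and the host list only repeats the examples given above. It matches exact host names, so a self-hosted instance of the same forge software under another domain would not be recognized.

```python
from urllib.parse import urlsplit

# Hypothetical forge list, repeating the example hosts from the comment above.
FORGE_HOSTS = frozenset({"git.sr.ht", "github.com", "gitlab.com"})


def should_show_email(source_url, forge_hosts=FORGE_HOSTS):
    """Return True when the package source is NOT hosted on a known forge.

    Assumption: when the source lives on a forge, readers can reach the
    author through the forge's issue tracker, so the address can stay hidden.
    """
    # urlsplit(...).hostname is None when the URL has no network location.
    host = (urlsplit(source_url).hostname or "").lower()
    return host not in forge_hosts
```

For instance, `should_show_email("https://git.sr.ht/~williewillus/r16")` is false, while a source on a personal domain would still show the address.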

benknoble commented 1 year ago

Re: (2), note that self-hosting is possible with many such sites.

An alternative is to use the semi-standard "foo [at] org [dot] com" to display emails (or some similarly obfuscated version: IIRC, the catalog already requires you to evaluate Racket code when creating an account, so "eval this form to get the email address" might be fun).
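The "[at] / [dot]" display form could look roughly like the following Python sketch (illustrative only; `obfuscate_email` is a hypothetical name, not a function of the package site). It rewrites only the "@" and the dots in the domain, and leaves dots in the local part alone, which is an arbitrary choice made here.

```python
def obfuscate_email(address):
    """Render 'foo@org.com' as 'foo [at] org [dot] com' for display."""
    # Split on the last "@" so an unusual local part is kept intact.
    local, sep, domain = address.rpartition("@")
    if not sep:
        # Not an email-shaped string: return it unchanged.
        return address
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


print(obfuscate_email("foo@org.com"))  # prints: foo [at] org [dot] com
```

This only raises the bar for naive scrapers; anything that understands the pattern can reverse it trivially.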

jryans commented 1 year ago

It's a bit unfortunate that currently package server accounts don't have a "username" concept like you see in most other accounts... If they did, then the username could be what is displayed and used for searching by author, while the email address is kept hidden.

greghendershott commented 1 year ago

> Re: (2), note that self-hosting is possible with many such sites.

Good point.

> An alternative is to use the semi-standard "foo [at] org [dot] com" to display emails (or some similarly obfuscated version: IIRC, the catalog already requires you to evaluate Racket code when creating an account, so "eval this form to get the email address" might be fun).

I think this is good to defeat simple scrapers. (Although it's such a popular technique, if I wrote a scraper, it would be the first enhancement I'd add. :smile:)

This doesn't really help people (like me) who would prefer that questions/issues go into a transparent "Issues" feature on a forge, as opposed to also getting private emails on the side.


Maybe a per-author "show email address?" boolean option would be better? Default off. People with non-forge packages could toggle on.

When false, presumably the package database has some other primary key (like an integer) that is the "real" user id, and which could be used in HTML links/searches in lieu of the email address?

jryans commented 1 year ago

> Maybe a per-author "show email address?" boolean option would be better? Default off. People with non-forge packages could toggle on.
>
> When false, presumably the package database has some other primary key (like an integer) that is the "real" user id, and which could be used in HTML links/searches in lieu of the email address?

I might be missing something, but currently it appears the "primary key" at the moment effectively is the email address. User info is stored in files with the email address as the file name. The only things known about a user are the email address and a hash of their password.

So it feels like a "proper" fix (without losing features such as author search) would need to introduce some additional concept for the primary key, such as an integer or a username.
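To make the "additional primary key" idea concrete, here is a small Python sketch (illustrative only; the real user store is file-based and written in Racket). `Author`, `AuthorIndex`, `user_id`, and `show_email` are all hypothetical names, and the in-memory dictionaries merely stand in for whatever storage would actually be used.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Author:
    user_id: int              # hypothetical opaque key, safe to put in URLs
    email: str                # stays private unless the author opts in
    show_email: bool = False  # "default off", as suggested in the thread


class AuthorIndex:
    """In-memory stand-in for the user store, keyed by an integer id."""

    def __init__(self):
        self._ids = count(1)
        self._by_id = {}
        self._id_by_email = {}

    def register(self, email, show_email=False):
        """Return the existing record for this email, or create a new one."""
        key = email.lower()
        if key in self._id_by_email:
            return self._by_id[self._id_by_email[key]]
        author = Author(next(self._ids), email, show_email)
        self._by_id[author.user_id] = author
        self._id_by_email[key] = author.user_id
        return author

    def public_label(self, user_id):
        """Text to render on a package page or in an author-search link."""
        author = self._by_id[user_id]
        return author.email if author.show_email else f"author #{author.user_id}"
```

With something like this, author search could link on `user_id` and the address would never need to appear in page markup for authors who leave `show_email` off.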

As others have mentioned, a quick fix could hide / obfuscate the email address and remove author search to attempt to stop the spam quickly.

greghendershott commented 1 year ago

> I might be missing something, but currently it appears the "primary key" at the moment effectively is the email address. User info is stored in files with the email address as the file name. The only things known about a user are the email address and a hash of their password.

Ah, OK. The authors aren't in a SQL table; instead they seem to be files managed by an infrastructure-userdb package.

LiberalArtist commented 1 year ago

How about adding a display-name field to infrastructure-userdb with a contract of (or/c #f (and/c #px"[^[:blank:]]" (not/c #px"@"))), where a string means "show this instead of my email address"?

(We wouldn't even need to worry about collisions, necessarily, though we might want to disambiguate them in the UI.)
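As a reading aid, the proposed contract can be approximated in Python as below (illustrative only; `is_valid_display_name` and `author_label` are hypothetical names). This assumes the usual Racket behavior where a regexp used as a contract accepts any string it matches somewhere, so the rule becomes: either no display name (Racket's `#f`, mapped to `None` here), or a string with at least one non-blank character and no "@".

```python
import re

# Assumed equivalent of #px"[^[:blank:]]": any character other than space or tab.
_NON_BLANK = re.compile(r"[^ \t]")


def is_valid_display_name(value):
    """True for None, or for a string with a non-blank character and no '@'."""
    if value is None:
        return True
    if not isinstance(value, str):
        return False
    return _NON_BLANK.search(value) is not None and "@" not in value


def author_label(email, display_name=None):
    """A string display name means 'show this instead of my email address'."""
    if not is_valid_display_name(display_name):
        raise ValueError("invalid display name")
    return email if display_name is None else display_name
```

Rejecting "@" keeps a display name from being mistaken for, or used to smuggle in, an email address.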

Personally, my philip@philipmcgrath.com email address is published everywhere from OpenPGP keyservers to my package documentation. (I gave up on passing #:obfuscate? #t to author+email a few years ago.) I have to fight the spam anyway, and I at least want people to know how to reach me.

That said, I do agree that users should be able to choose not to display their email address here.

jryans commented 1 year ago

@williewillus worked on https://github.com/racket/racket-pkg-website/pull/86 to hide the author emails, and this has now been deployed. The server is working on re-rendering each package page, but you can see from the ones that have updated (e.g. https://pkgs.racket-lang.org/package/binutils) that it looks correct.

I believe that covers the main concern here, so I'll close this issue. I filed https://github.com/racket/racket-pkg-website/issues/87 to track the longer-term goal of re-enabling author display in some other way that keeps emails private.