Open sgrif opened 5 years ago
This is intentional, and why we have gh_id
Except we don't link to users by gh_id, we do so by gh_login. This leads to cases where we display that a user is an owner of a crate, when they do not in fact own that crate. Clicking on the link takes you to a completely different user, which at best is confusing, and at worst is a vector for social engineering attacks.
To be clear, I think any solution to this requires marking that the user is no longer actually the owner of that gh_login somehow (e.g. the implementation of #1586), which makes it unaffected by #1109. The index we'd add would be UNIQUE ON gh_login WHERE NOT gh_deleted
, since we can have infinitely many deleted accounts for a single username (we have one username that has apparently deleted and recreated its github account 26 times).
But we do need some way to continue to identify that user after they've deleted their GH account. In #1586 I've proposed using deleted_{id}
as it seems like the simplest solution, but we could also do something like deleted_{gh_login}_{n}
where n
is SELECT ROW_NUMBER() OVER (ORDER BY id) FROM users WHERE gh_login = $1
if we want to include the old login or avoid coupling to the ID.
This issue is basically just an extension to #1585, and if #1586 is merged, the solution to this issue is just running UPDATE users SET gh_deleted = TRUE WHERE gh_login = $1 AND gh_id != $2
whenever a user is created or changes its gh_login
(this can be done in NewUser::create_or_update
or as a trigger on users
). If they renamed their account, rather than deleting it, it'll get marked as undeleted the next time they log in (when we learn what their new username is). We pretty much have to assume it's deleted until then since we have no way to tell if they deleted it or renamed it until they log back in.
For the record, this and #1109 were just mentioned as relevant to a /r/rust/ post named Fiasco with compromised Python packages may extend to Rust ecosystem.
(The TL;DR is that the person demonstrated an attack on PyPI and Composer and a bypass for GitHub's protection against reusing retired repository names and claims it's equally applicable to NPM and Crates.io but didn't actually demonstrate that.)
If anyone is able to demonstrate publishing a new version to an existing crate by claiming an abandoned username on GitHub, please alert security@rust-lang.org. As far as I know, this is not possible because we check the GitHub ID number, which changes every time a GitHub username is abandoned and re-registered.
Another possible implementation is to start requiring the crates.io account ID in every user URL, like /u/4825/pietroalbini
. That way there would still be independent account pages for users with the same GitHub usernames, and in case of renames we'd be able to just redirect to the right username (as we know the right user with the account ID).
I'm not sure requiring usernames to be unique is workable, especially if in the future we'll start supporting other OAuth providers.
Right now we do not have a unique index on gh_login. Multiple users can have the same value for that column if the following sequence of events occurs:
We currently have 23 duplicate gh_login records across 72 user records. We do have cases where both users are owners of separate crates. We will need to manually figure out on a case by case basis for the existing records. We should prevent this from occurring in the future by checking to see if an existing user has the same gh_login, and changing that record if it exists.
The question is what do we change it to? We currently use gh_login as the identifier for the /users route, so we can't just blank it out and call it a day. We need some way to continue to link to that user.