psf / gh-migration

This repo is used to manage the migration from bugs.python.org to GitHub.

Map bpo users to GitHub users #11

Closed ezio-melotti closed 2 years ago

ezio-melotti commented 2 years ago

On bpo, users can specify their GitHub username. If they do, their bpo issues and comments can be mapped to their GitHub accounts; however, this only works for users who belong to the "python" organization.

For users with a GitHub username who don't belong to the "python" org, and for users who haven't specified a GitHub username, a placeholder user (called a mannequin) with either their GitHub or bpo username will be created.
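As a rough illustration of the rule described above, here is a minimal sketch in Python. The `BpoUser` class, the `resolve_author` function, and the `org_members` set are hypothetical names made up for this example; they are not part of the GitHub import tool or of the migration scripts. The sketch also assumes org membership is checked case-insensitively.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class BpoUser:
    # Hypothetical record for a bpo account.
    bpo_username: str
    github_username: Optional[str] = None


def resolve_author(user: BpoUser, org_members: Set[str]) -> Tuple[str, bool]:
    """Return (display_name, is_mannequin) for a migrated issue or comment.

    org_members is assumed to hold lowercased GitHub logins of org members.
    """
    gh = user.github_username
    if gh and gh.lower() in org_members:
        # The user set a GitHub username and belongs to the org: real mapping.
        return gh, False
    # Otherwise a mannequin is shown with the GitHub username if one was
    # given, else with the bpo username.
    return (gh or user.bpo_username), True
```

With `org_members = {"alice"}`, a user who set "Alice" as their GitHub username is mapped, while a non-member or a user without a GitHub username becomes a mannequin.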

The mannequin will only show the username, so:

Mannequins can be manually reclaimed after the import, but this might still be impossible if the users don't belong to the org. A possible workaround is to create a new org, add all the bpo users that have a GitHub username to that org (possibly without sending out notifications), perform the import there so that all the users get mapped, then copy all the issues to python/cpython and remove the new org. This might preserve the user mapping even if the users don't belong to the "python" org.

gvanrossum commented 2 years ago

Do you know (roughly) how many users are in the python org and how many are not?

ezio-melotti commented 2 years ago

There are ~210 users and ~70 external contributors under the "python" org on GitHub. This should include all the core devs and possibly some staff, triagers, and people working on other projects under the org. On bpo there are ~33k users.

In other words, only core devs will be mapped, and all the other contributors won't be mapped unless we find a solution (GitHub is aware of the issue and looking into it).

gvanrossum commented 2 years ago

Thanks for indicating the scale. Creating 33k dummy GitHub users (while surely a drop in the bucket of so many millions of users) seems suboptimal, so I hope GitHub finds a solution. (If they don't, I hope that the dummy users at least link to or provide a copy of that user's public bpo metadata -- IIRC there's at least an optional "real name"? And how bad would it be if it linked to the stated GH username, even if we can't verify that?)

ezio-melotti commented 2 years ago

As far as I understand, it doesn't create an actual dummy GitHub user -- it just shows the username and a (mannequin) tag next to it. Clicking or hovering on the username neither opens a popup with the user info nor opens the user page. I also tried to set the full name for the mannequin user while importing the data, but the name doesn't show up anywhere.

This is what it looks like: (screenshot: 20211003--01)

gvanrossum commented 2 years ago

Good luck!

warsaw commented 2 years ago

On Oct 2, 2021, at 19:55, Ezio Melotti wrote:

In other words, only core devs will be mapped, and all the other contributors won't be mapped unless we find a solution (GitHub is aware of the issue and looking into it).

So GitHub considers this a bug and not, say, a security-related feature?

ezio-melotti commented 2 years ago

I believe this is by design, and that the migration tool was designed with self-hosted GitHub instances in mind. In that situation, all the users should belong to the organization, and users that are not in the org are likely former employees that left the org and should therefore be replaced by mannequins.

GitHub is checking if there's a way to map external users too. I've also been thinking about potential security concerns, and the only thing I can think of is that a bpo user could create issues or write comments on bpo and set their GitHub username on bpo to the username of e.g. a core dev. After the migration it will appear as if the core dev wrote them, but the bpo user won't be able to post as the core dev anymore, and the core dev should be able to edit/delete the old messages. This could also be mitigated by reviewing duplicated GitHub usernames on bpo (in some cases conflating two bpo accounts into one GitHub account might be useful).
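The review of duplicated GitHub usernames mentioned above could be sketched roughly as follows. This is only an illustration: `find_shared_github_usernames` and the shape of the `bpo_to_github` mapping are assumptions, not the actual bpo schema or migration code, and usernames are compared case-insensitively.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_shared_github_usernames(
    bpo_to_github: Dict[str, Optional[str]],
) -> Dict[str, List[str]]:
    """Group bpo accounts by the GitHub username they claim.

    Only GitHub usernames claimed by more than one bpo account are returned,
    so each entry can be reviewed by hand: it may be one person with two bpo
    accounts, or someone claiming another person's username.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for bpo_name, gh_name in bpo_to_github.items():
        if gh_name:  # skip accounts with no GitHub username set
            groups[gh_name.strip().lower()].append(bpo_name)
    return {gh: sorted(names) for gh, names in groups.items() if len(names) > 1}
```

Each group in the result still needs a human decision, since the sketch cannot tell a legitimate merge of two accounts from an impersonation attempt.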

gvanrossum commented 2 years ago

Yeah, I'm pretty sure I have two bpo accounts that I use interchangeably.

ezio-melotti commented 2 years ago

I discussed with the SC the approach suggested by GitHub, i.e. creating a dummy org, inviting people to join it, waiting 1-2 weeks, and then performing the import, but even though this should map the accounts properly, it sounds like it might not be very effective, so the idea has been abandoned.

A better approach might be:

  1. encourage people to add their GitHub username on bpo (by adding a banner on top of bpo, writing to python-dev, and possibly by mailing them directly)
  2. add to the migrated issues people that have their GitHub username set (either by directly mentioning them in their messages, or if possible, by adding them to the issue during the migration)
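The "directly mentioning them in their messages" option from step 2 could look roughly like this. It is a sketch under assumptions: `add_mention` and the footer wording are invented for illustration and are not the format used by the real migration.

```python
from typing import Optional


def add_mention(body: str, bpo_username: str,
                github_username: Optional[str] = None) -> str:
    """Append an attribution footer to a migrated message body.

    If the bpo user set a GitHub username, it is written as an @-mention;
    otherwise the plain bpo username is used and nobody is mentioned.
    """
    author = f"@{github_username}" if github_username else bpo_username
    return f"{body.rstrip()}\n\n*Originally posted on bpo by {author}*"
```

For example, `add_mention("Fix it.\n", "alice.bpo", "alice")` ends with a footer naming `@alice`, while a user without a GitHub username is shown by their bpo name only.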

Currently there are ~33k total users, ~8k have a GitHub username set (I have to double-check this), and the remaining ~25k don't. With this approach:

I'm still waiting to hear back from GitHub to see if there are other options available.

gvanrossum commented 2 years ago

That sounds acceptable. Is it impossible to get a join on email? (Even if GitHub doesn't reveal the email, it must have it, and possibly that could be used to validate the username claimed on bpo, if the bpo email is the same.)

I think direct-mailing the affected users would be worth it.

FFY00 commented 2 years ago

FWIW, GitHub seems to have rolled out an update that enables this kind of data claim. I believe the API is not publicly documented as they are still iterating on the design, so you should probably reach out to them (I believe you are already in contact, but I may be wrong :sweat_smile:).

(screenshot: github-data-claim)

ezio-melotti commented 2 years ago

There is a way to claim contributions, but afaik it only works for members of the organization. Are you a member of the llvm org?

FFY00 commented 2 years ago

Ah, alright. Yes, I am.

asl commented 2 years ago

A possible workaround is to create a new org,

There is no need for a new org. Just create a new team and invite users there. Our (LLVM) experience shows that mannequin resolution is a manual process (there is no API for this). So, it's a tedious click-enter-click-click process. You do not want this for 33k users :) It would be better to create such a team in advance and start sending invitations. The invitation expires in 7 days, so some users will certainly miss it, but you can send them several times, say, within a month. This will streamline the process heavily.

ezio-melotti commented 2 years ago

Eventually we settled on what I listed here: https://github.com/psf/gh-migration/issues/11#issuecomment-946111360

Of the ~35k bpo users, about 8.8k have a linked GitHub username. Their username will be listed in the body of the message, but it's not possible to automatically subscribe them. #12 proposes a solution, and it should be possible to automatically mention them with an action. Of the 190 users with the iscommitter bit, 131 have a linked GitHub username and 59 don't. The 59 that don't will still be able to reclaim their mannequin after the migration.