ronaldtse closed this issue 2 years ago.
Remaining: mailreplay, postconfirm, postfind, dmarc, emailcore, tsvwg. (Updated in OP)
@rjsparks could you help confirm the following?
- ietfdb repository: all Trac tickets that relate to components other than "MailArchive: *" should be migrated there (see screenshot: https://trac.ietf.org/trac/ietfdb/query).
- xml2rfc-website and xml2rfc-bibxml: how do we filter out the relevant tickets via components (see screenshot)? Or perhaps no tickets should be migrated into these split-off repositories?

I have added a new GitHub organization called ietf-svn-conversion (https://github.com/ietf-svn-conversion) for testing the issue migration. All users have full admin rights there, so you can delete and re-create the repos there when a re-run is necessary.
@ronaldtse What should the filters be for xml2rfc-website and xml2rfc-bibxml?
@HassanAkbar For both of these repos, we do not need to import tickets/issues. Thanks!
@ronaldtse Where can I get the email-to-Git-username mapping for all these Trac instances? Do we have a separate repository for that?
The Trac/SVN username to GitHub email mappings are provided in this repo, under this pattern: {reponame}/{reponame}.map.
@ronaldtse we need to provide a GitHub username to the GitHub API. The API does not accept a GitHub email as input.
Ah, then we need to do that manually. Maybe we should just define a YAML mapping?
I've provided a structure here: https://github.com/ietf-ribose/svn-github-convert/blob/main/github-users-email.yaml .
We have to do this manually because some emails are not public GitHub emails and therefore cannot be searched on GitHub, even though they show up in the Git commit history.
I've made a new issue for this: https://github.com/ietf-ribose/svn-github-convert/issues/35 . Tractive needs to utilize this new mapping format for GitHub email => username.
Sure thing. I will take this over once the GitHub workflow is finalized.
Just to clarify: are we changing the format of the user mapping provided to Tractive in the config.yaml file from this:

```yaml
adam@nostrum.com: adamroach
```

to this?

```yaml
mapping:
  - name: Adam Roach
    email: adam@nostrum.com
    username: adamroach
```
@ronaldtse I just wanted to know if there is a special reason for that, because a hash gives us O(1) retrieval while the new array format requires O(N) retrieval.
Yes, the reason is we want to keep the name, email and username together for information purposes.
There should be no performance difference here, because we read the entire file once at the beginning and can then convert it into a hash internally. A big-O difference only arises if we use this structure for searching, which we don't (i.e. the file's data representation is not the same as the data representation inside Tractive).
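A minimal sketch of that idea in Python, assuming the YAML file has already been parsed (for example with PyYAML's yaml.safe_load) into a list of dicts with name, email and username keys. The function names are illustrative and are not Tractive's actual code: one O(N) pass builds a dict keyed by email, after which each lookup is O(1).

```python
from typing import Dict, List, Optional


def build_email_index(entries: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Convert the list-of-mappings file format into an email-keyed dict.

    The file keeps name, email and username together for readability;
    this single O(N) pass makes every later lookup O(1).
    """
    index: Dict[str, Dict[str, str]] = {}
    for entry in entries:
        index[entry["email"]] = entry
    return index


def github_username(index: Dict[str, Dict[str, str]], email: str) -> Optional[str]:
    """Return the mapped GitHub username, or None if the email is unmapped."""
    entry = index.get(email)
    return entry.get("username") if entry else None


# Data shaped like the proposed YAML structure, already parsed.
entries = [
    {"name": "Adam Roach", "email": "adam@nostrum.com", "username": "adamroach"},
]
index = build_email_index(entries)
print(github_username(index, "adam@nostrum.com"))    # adamroach
print(github_username(index, "nobody@example.com"))  # None
```

The file format stays optimized for humans, while the in-memory structure is optimized for lookups.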
@ronaldtse I encountered a few issues while creating Trac issues on GitHub:
1. We cannot assign an issue to someone unless they are a collaborator in the repo or have added a comment on that issue (https://github.blog/2019-06-25-assign-issues-to-issue-commenters/).
2. tom111.taylor is listed in a commit, but there is no GitHub user with that username (see the image for reference). This is because we can use any username when making a commit to GitHub, but not when creating issues: we can only add valid GitHub usernames of users who are either collaborators or have commented on that issue.
It feels like reposurgeon is creating usernames for commits by stripping everything in the email address after the @ symbol. So for the above case, tom111.taylor@bell.net, reposurgeon will make a commit from the username tom111.taylor.
I think we will have to add these users as collaborators to our final repository if we want to assign issues to them.
P.S. I have tested this workflow: unless a user accepts the invitation to become a GitHub collaborator, they can't be added as an assignee.
What do you suggest we should do here?
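If assignments were migrated, the migration could first check whether GitHub will accept a given login as an assignee. A minimal sketch using GitHub's REST endpoint GET /repos/{owner}/{repo}/assignees/{assignee}, which answers 204 when the user can be assigned and 404 otherwise; the function names and the token handling are illustrative assumptions, not part of Tractive.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

API = "https://api.github.com"


def assignee_check_url(owner: str, repo: str, login: str) -> str:
    """URL of GitHub's "check if a user can be assigned" endpoint."""
    return f"{API}/repos/{quote(owner)}/{quote(repo)}/assignees/{quote(login)}"


def can_be_assigned(owner: str, repo: str, login: str, token: str) -> bool:
    """True on 204 (assignable), False on 404 (not assignable)."""
    req = urllib.request.Request(
        assignee_check_url(owner, repo, login),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 204
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Auth failures or rate limits should surface, not read as "not assignable".
        raise
```

Tickets whose owner fails this check would have to keep the owner as plain text instead of an assignee.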
@HassanAkbar there is a misunderstanding:
It feels like reposurgeon is creating usernames for commits by stripping everything in the email address after the @ symbol.
Perhaps the list in the screenshot is outdated? In the current repo, we no longer have the reposurgeon names. We have changed them all to real names:
https://github.com/ietf-ribose/svn-github-convert/search?q=tom111
We obtained all these emails from an IETF internal system, the emails are correct. However, the emails are not necessarily mapped to a GitHub-registered email, and that user may not have a GitHub account.
This is the list of emails not yet mapped to a GitHub account:
So in order to make the mapping happen we really need to finalise the SVN user to GitHub email/username mapping.
@ronaldtse Thanks for the clarification. You are right that the emails have been updated in the file. I need to finalize the SVN user -> GitHub email/username mapping. I am working on this.
I just wanted to highlight this 2nd issue:
We cannot assign an issue to someone unless they are a collaborator in the repo or have added a comment on that issue (https://github.blog/2019-06-25-assign-issues-to-issue-commenters/).
I think we will have to add these users as collaborators to our final repository if we want to assign issues to them. I have tested this workflow: unless a user accepts the invitation to become a GitHub collaborator, they can't be added as an assignee.
Thanks.
Indeed, I forgot about that. Perhaps we cannot migrate the assignments until all these users accept the invitation.
@rjsparks do we need to migrate the ticket assignments? If so, could we ask all users to accept the GitHub invitation in the final repository for migration?
No - it's fine (maybe a feature) if ticket assignment gets lost. @rpcross, @jennifer-richards, @kesara: heads up...
@rjsparks just to confirm: we won't migrate ticket assignments, but will keep them as text in the issue comments.
By any chance, is @rpcross the elusive Ryan Cross we've been trying to pin down? Would he be able to add his amsl.com email to his GitHub account to enable linking?
@ronaldtse Since we will be creating all issues from one GitHub account, we will need the GitHub personal access token of the user we want to use for the final migration.
@ronaldtse Do we need 2 separate repos for ietfdb (Mailarchive) and ietfdb (Datatracker)?
Yes, the mailarchive should end up in its own repository separate from the datatracker.
There are 3389 tickets in total in ietfdb. MailArchive has only 327 tickets and Datatracker only 205, which leaves 2857 tickets in other components. Do we need to migrate the remaining tickets to a separate repository?
Thanks for checking. I don’t remember what the other components are other than Datatracker. Do you have a list?
On https://trac.ietf.org/trac/ietfdb/query there are many components seemingly not related to the datatracker, e.g. Projects, noncom, drafts, etc.
@rjsparks to preserve history I suspect we may want to store those tickets somewhere. Where should those go?
@ronaldtse Here is the list of all the components.
P.S. Some tickets have an empty or null component in the database.
@HassanAkbar “MailArchive: *” tickets will go into the “mailarchive” repo.
Will let @rjsparks answer the question about where the remaining tickets should go.
I ran a test migration, and the issues are now migrated to the following repos:
All of the tickets in ietfdb that have a component that doesn't start with MailArchive belong to the datatracker (even the ones that have no declared component). The long list above (minus the two MailArchive components) consists of parts of the datatracker, or of datatracker-oriented projects.
@rjsparks got it. Then can we confirm that there should only be two target repositories for ietfdb, mailarchive and datatracker (i.e. the ietfdb name will go away)?
that is correct
Thanks @rjsparks. @HassanAkbar can you help update the config so we are exporting to those two repositories without the ietfdb- prefix? Thanks.
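The routing rule confirmed above can be sketched as a small Python function. The repository names come from this thread; the function itself is a hypothetical illustration, not the actual Tractive configuration.

```python
from typing import Optional


def target_repo(component: Optional[str]) -> str:
    """Pick the GitHub repo for an ietfdb Trac ticket based on its component.

    Per the thread: components starting with "MailArchive" go to mailarchive;
    every other component, including empty or null ones, goes to datatracker.
    Neither target keeps the old "ietfdb-" prefix.
    """
    if component and component.startswith("MailArchive"):
        return "mailarchive"
    return "datatracker"


print(target_repo("MailArchive: Web"))  # mailarchive
print(target_repo("Projects"))          # datatracker
print(target_repo(None))                # datatracker
```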
@ronaldtse In the tsvwg database there is an email, draft-ietf-tsvwg-source-quench@tools.ietf.org, which, when combined into a label like owner:draft-ietf-tsvwg-source-quench@tools.ietf.org, exceeds the maximum label length of 50 characters. What should we do in this case?
Right now I have changed the email from draft-ietf-tsvwg-source-quench@tools.ietf.org to draft-ietf-tsvwgsource-quench@tools.ietf.org to bypass the label length limitation.
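A small check like the following could flag over-long labels before the migration tries to create them. It is a sketch: the 50-character limit is the figure reported in this thread, and owner_label and label_fits are hypothetical helpers. The numbers show why dropping a single hyphen was enough.

```python
MAX_LABEL_LENGTH = 50  # limit reported in this thread for GitHub label names


def owner_label(value: str) -> str:
    """Build the owner:<value> label (an email or a GitHub username)."""
    return f"owner:{value}"


def label_fits(label: str, limit: int = MAX_LABEL_LENGTH) -> bool:
    """True if the label is within the allowed length."""
    return len(label) <= limit


original = owner_label("draft-ietf-tsvwg-source-quench@tools.ietf.org")
workaround = owner_label("draft-ietf-tsvwgsource-quench@tools.ietf.org")
print(len(original), label_fits(original))      # 51 False
print(len(workaround), label_fits(workaround))  # 50 True
```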
@ronaldtse I ran another round of test migrations; here are the results:
@ronaldtse Also, we should finalize the label colors before the final migration to make sure they are the same across all the repositories. Any suggestions are welcome.
@HassanAkbar The label colors look good now!
The current "owner" label is a bit awkward looking. But since we cannot expect everyone to (a) have a GitHub account; (b) be added to the repos; there is no better alternative.
On the other hand, when we migrate tickets, it is possible to convert the emails into GitHub usernames, given that we have the mapping (if there is no mapping we just show the email):
@rjsparks could we ask you for feedback on the migrated tickets (see above comment), and let us know if in the comments we should also tag the GitHub user?
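A minimal sketch of the email-to-username rendering described above, assuming a plain dict built from the email => username mapping; display_user is a hypothetical helper, not part of Tractive.

```python
from typing import Dict


def display_user(email: str, email_to_username: Dict[str, str]) -> str:
    """Render a Trac user in a migrated ticket.

    Mapped users become an @mention of their GitHub username;
    unmapped users are shown by their raw email address.
    """
    username = email_to_username.get(email)
    return f"@{username}" if username else email


mapping = {"adam@nostrum.com": "adamroach"}
print(display_user("adam@nostrum.com", mapping))        # @adamroach
print(display_user("tom111.taylor@bell.net", mapping))  # tom111.taylor@bell.net
```

One trade-off to keep in mind: an @mention in an issue body normally triggers a GitHub notification, so tagging users during a bulk migration may send many notifications.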
In the final migration, will "opened on date by Hassan Akbar" be replaced by "opened on date by Real Author" when the user's GitHub account is actually known?
I really hope so, but if not, how am I supposed to find all of the tickets opened by a given person?
If so, then we wouldn't need to tag Jay above; he'd just be the user that opened the ticket, and for users who don't have GitHub accounts that we know about, having it be by [...] is fine.
What's the current plan for when we will be able to check that references to Subversion commits in these tickets are mapped correctly to GitHub commits? (And that commit messages in SVN that contain references to Trac tickets end up with Git commit messages that reference the right issue?)
The [19412] in the last comment at https://github.com/ietf-svn-conversion/datatracker/issues/3424 will eventually need to be a link to some github commit.
Tell me more about the solution for component. I think it's just a bit of text marked as code in the first comment?
For us to continue to use the components to manage the project, that adds a bit of arcana that anyone working with the system would have to remember to add? Or do I misunderstand what's happening with that?
In the final migration, will "opened on date by Hassan Akbar" be replaced by "opened on date by Real Author" when the user's GitHub account is actually known? I really hope so, but if not, how am I supposed to find all of the tickets opened by a given person? If so, then we wouldn't need to tag Jay above; he'd just be the user that opened the ticket, and for users who don't have GitHub accounts that we know about, having it be by [...] is fine.
Unfortunately, the "opened by" field will not show the real author; instead, it will show the account used to perform the migration. This is a restriction of the GitHub API.
In the above case, I am using my own account to perform the migrations, which is why it says "opened on date by Hassan Akbar".
If the username is provided in the config file, a label in the format owner:<github username> is added, which can be used to search for tickets opened by a given person.
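For example, a link listing every issue that carries a given owner label could be generated like this. It is a sketch assuming GitHub's issue search syntax (is:issue and a quoted label: qualifier, quoted because the label name contains a colon); owner_search_url is a hypothetical helper.

```python
from urllib.parse import quote_plus


def owner_search_url(org: str, repo: str, owner_value: str) -> str:
    """Link to all issues in org/repo labelled owner:<owner_value>."""
    # Quote the label name so the colon inside it stays part of the label.
    query = f'is:issue label:"owner:{owner_value}"'
    return f"https://github.com/{org}/{repo}/issues?q={quote_plus(query)}"


print(owner_search_url("ietf-svn-conversion", "datatracker", "adamroach"))
```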
What's the current plan for when we will be able to check that references to Subversion commits in these tickets are mapped correctly to GitHub commits? (And that commit messages in SVN that contain references to Trac tickets end up with Git commit messages that reference the right issue?) The [19412] in the last comment at ietf-svn-conversion/datatracker#3424 will eventually need to be a link to some GitHub commit.
A PR is opened to fix this.
I will leave the last question for @ronaldtse to answer.
Tell me more about the solution for component - I think it's just a bit of text marked as code in the first comment?
Right now we map "component" to labels.
For us to continue to use the components to manage the project, that adds a bit of arcana that anyone working with the system would have to remember to add? Or do I misunderstand what's happening with that?
You are correct. The difference is that on GitHub the component label is not mandatory for creating an issue.
To aid this we could potentially name the component labels as "component:foobar" for a component "foobar" to make them clear. Would that help?
In the final migration, will "opened on date by Hassan Akbar" be replaced by "opened on date by Real Author" when the user's GitHub account is actually known?
This is an unfortunate fact that accompanies the migration: neither of GitHub's Issues APIs (bulk import and v3) permits creating issues on behalf of other users, and the security reasons for that are clear. At least we are able to set the original "date"!
That's why tagging the user's GitHub handle is important in the migrated issue.
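A minimal sketch of how a request body could keep the original dates while recording the real reporter as text. Two assumptions here: the field names (issue.title, issue.body, issue.created_at, comments[].created_at) follow GitHub's bulk Issue Import API as we understand it and should be checked against its documentation, and Trac timestamps are taken to be microseconds since the Unix epoch. The helper names and the "time" key on comments are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def trac_time_to_iso(microseconds: int) -> str:
    """Convert a Trac timestamp (assumed: microseconds since epoch) to ISO 8601 UTC."""
    dt = datetime.fromtimestamp(microseconds // 1_000_000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def import_payload(title: str, body: str, reporter: str, created_us: int,
                   comments: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build an issue-import body that preserves the original dates.

    The issue itself is created by the migration account, so the real
    reporter is recorded as text at the top of the body instead.
    """
    return {
        "issue": {
            "title": title,
            "body": f"Originally opened by {reporter}\n\n{body}",
            "created_at": trac_time_to_iso(created_us),
        },
        "comments": [
            {"body": c["body"], "created_at": trac_time_to_iso(c["time"])}
            for c in comments
        ],
    }
```

The reporter string could be an @mention when the email is mapped to a GitHub account, or the raw email otherwise.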
So then the final migration should be done by an identity that makes it very clear what happened. An account created for just this purpose with a name of 'ietfsvnmigration' or something that conveys the message but is shorter.
This is really sad in that it will not reflect all the contributions (in terms of issues) on users' activity graphs.
See @larseggert for example:
I'll need to explain that carefully and fully to the community.
mapping the components to labels "component:foobar" would be better - it would make it more intuitive for people reporting to do the right thing.
This is really sad in that it will not reflect all the contributions (in terms of issues) on users' activity graphs.
Indeed. At least the commits will show on their contribution log. This is a known caveat with migrating to GitHub, for now.
mapping the components to labels "component:foobar" would be better - it would make it more intuitive for people reporting to do the right thing.
Got it. @HassanAkbar can you help make the corresponding change? Thanks.
Sure thing.
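The agreed naming could be sketched as follows; component_labels is a hypothetical helper, and the choice to emit no label for an empty or null Trac component is an assumption, not something decided in this thread.

```python
from typing import List, Optional


def component_labels(component: Optional[str]) -> List[str]:
    """Map a Trac component to a "component:<name>" label.

    Empty or null components (which exist in the ietfdb database)
    produce no label at all.
    """
    if not component or not component.strip():
        return []
    return [f"component:{component.strip()}"]


print(component_labels("Projects"))  # ['component:Projects']
print(component_labels(None))        # []
```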
@ronaldtse
- To replace SVN revisions with Git SHAs in the commit messages, we need to create new commits, because commits are immutable in Git. The only way to change an existing commit message is to remove and redo the commit.
- A redone commit gets a new SHA (and so does every commit that descends from it), so we will have to update the revmap file after updating each commit message.
- We cannot do this inside reposurgeon, because it creates a fast-import stream for Git to create the repo. Before importing that file to GitHub, we don't have the SHA hashes, so we can only edit commit messages after the GitHub repo has been imported.

Let me know if you have any other thoughts here.
UPDATE:
Found an example in the documentation of git-filter-repo:

```shell
git-filter-repo --message-callback '
  if b"Signed-off-by:" not in message:
    message += b"\nSigned-off-by: Me My <self@and.eye>"
  return re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)'
```
We can do something like this and read the SHA hashes from the revmap file. However, the callbacks have to be written in Python. Should we go for this approach?
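The core of such a callback could look like the sketch below. It assumes a hypothetical two-column revmap format ("svn_rev git_sha" per line; reposurgeon's real output may differ) and handles only Trac's [19412] changeset syntax, not r19412 or changeset:19412. It works on bytes because git-filter-repo passes commit messages as bytes; the same function could also be applied to encoded Trac comment text before import.

```python
import re
from typing import Dict, Iterable

# Trac changeset references written as [19412].
SVN_REF = re.compile(rb"\[(\d+)\]")


def load_revmap(lines: Iterable[bytes]) -> Dict[int, bytes]:
    """Parse 'svn_rev git_sha' lines (hypothetical format) into a dict."""
    revmap: Dict[int, bytes] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            revmap[int(parts[0])] = parts[1]
    return revmap


def rewrite_svn_refs(message: bytes, revmap: Dict[int, bytes]) -> bytes:
    """Replace [N] with the mapped Git SHA; leave unknown revisions untouched."""
    def repl(match: "re.Match[bytes]") -> bytes:
        sha = revmap.get(int(match.group(1)))
        return sha if sha is not None else match.group(0)
    return SVN_REF.sub(repl, message)
```

Two caveats: any bracketed number that happens to match a known revision is rewritten, even if it was not meant as a changeset link; and because rewriting messages changes SHAs (as noted above), SHAs taken from a pre-rewrite revmap may no longer exist after the rewrite.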
@HassanAkbar thank you for researching the approach.
Can you help check step 1? Thanks!
xml2rfc
v3 vocabulary
xml2rfc-website
xml2rfc-bibxml
vocabulary_design_team_2013_2017
v3 vocabulary
The Trac tickets for the "vocabulary_design_team_2013_2017" repo can be seen here https://trac.ietf.org/trac/xml2rfc/query
(originally from https://github.com/ietf-ribose/svn-github-convert/issues/7#issuecomment-926992804)
ietfdb (Datatracker): Trac component filter "Datatracker: *" (see screenshot)
ietfdb (Mailarchive): Trac component filter "MailArchive: *" (see screenshot)
Mailreplay: no Trac, no need to migrate issues.
Postconfirm: no Trac, no need to migrate issues.
Postfind: no Trac, no need to migrate issues.
dmarc: Trac: trac-svn-db/trac/dmarc; SVN: none.
emailcore: Trac: trac-svn-db/trac/emailcore; SVN: none.
tsvwg: Trac: trac-svn-db/trac/tsvwg; SVN: none.