scala / scala-dev

Scala 2 team issues. Not for user-facing bugs or directly actionable user-facing improvements. For build/test/infra and for longer-term planning and idea tracking. Our bug tracker is at https://github.com/scala/bug/issues

Plan move from JIRA to GitHub issues. #267

Closed by adriaanm 7 years ago

adriaanm commented 7 years ago

Why/what?

Who?

Help wanted! If you'd like to hack on this, scala/scabot already has a JIRA api, and we can look into providing a JIRA dump to facilitate hacking on an import.

When?

Ideally early 2017, but it depends on who's available to work on this.

Potential pitfalls

How?

Non-goals

dwijnand commented 7 years ago

AFAIK Akka engaged with GitHub when they did their big bug import.

If you're interested in that (and not already aware) you might want to ask @ktoso about what they can do with their internal APIs.

ktoso commented 7 years ago

Hi there, indeed we talked with github support when we moved. What they can do is a "bulk import" if you can give them the data as csv or json, I think. Better ask them about the format before you prepare the data. Tags you'll need to apply yourself, though I think they can mark all imported issues with a tag if you ask them to.

Be aware that they can't "move issues between github projects". We recently did that semi-automatically as well (me calling a script many times over 2 days); if you need that, let me know and I can show you the status quo. There's also a hidden API, which we did not end up using. You could use that API yourself for the bulk import as well, though it would require coding against it (it's RESTful-ish).

Note that the bulk imports are much better for the community, because they won't trigger emails to people who are watching the repository.

adamvoss commented 7 years ago

I am trying to help out with this. There are a few places where I have included a "straw man proposal." I do not feel it is really my place to say how these things should be done; however, I felt coming back with a lot of information and open-ended questions was perhaps not the most helpful I could be. Please take them only as "straw man proposals." While my biases, opinions, and guesses are present, I include them only to try to advance the discussion and hopefully save work, not because I am trying to push for things to be done that way.

Steps for implementation (high-level, unordered)

Discussions/Decisions

I have split my write-up on the above points into multiple comments to make it easier to link specific topics/issues if needed. I can edit this comment as things are decided or new things are identified.

adamvoss commented 7 years ago

Combatting spam

Spam was raised as a potential concern. There is no way to eliminate this risk, but some actions can be taken to try to mitigate this and encourage users to utilize Issues as intended, see tools below.

Tools

Community Places

An aspect of limiting "spam" or other unwanted GitHub issues that I think is important is to give direction on where to go for various purposes. The scala-lang community page could probably be updated to reflect this as well (IMO, the page needs an overhaul as is). Depending on the decisions made, the "Scala hacker guide" is another page that may be beneficial to update.

Choices I can think of:

Straw man proposal

I have started seeing more and more users/repos treating GitHub issues as a forum for general questions of the variety "what am I doing wrong?" or "how do I?". It seems to work for a lot of projects; though, personally, other than when it points to a documentation error, possible bug, or major usability issue I still think a place other than GH is more appropriate.

adamvoss commented 7 years ago

The case for importing into scala/scala

Update: because scala/scala has been actively used for pull requests, it is not possible to import the issues into scala/scala and preserve the original numbering used in SI. If there were still a desire to keep the code and issues in the same repository, it would likely mean abandoning the existing PR numbering, which would not be desirable because those numbers are prevalent in commit messages.


I think JIRA SI issues should be imported into "scala/scala" rather than a separate repository. There is a short-term advantage to a separate repository: scala/scala would stay clean, have a low open issue count, and seem more manageable. I would argue, however, that on a project the size of Scala that state is not likely to be maintained long term as issues are created. For reference, dotnet/roslyn (the new C#/VB.NET compilers) currently has 3,303 open issues and Microsoft/TypeScript has 1,587.

Downside

Upside

adamvoss commented 7 years ago

Issue Migration

JIRA has 4 different projects, each numbered independently: INF, SI, SUGGEST, TEST. Are all to be migrated? If so, for numbering consistency they would each need their own GH repo. If numbering consistency is only needed for SI, then it could be imported first and the others after. JIRA reports 9,989 issues in SI; however, the latest issue is SI-10088. I have not programmatically confirmed this yet, but that at least strongly suggests there are 99 "missing" SI issues. Unless GitHub can support importing the gaps (we will need to check with them, I imagine), we may need to create "dummy" issues to fill the gaps and preserve ordering. What follows below only looks at the logistical aspect of moving the data and does not consider the technical implementation.
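To make the gap-filling idea concrete, here is a minimal sketch of how the "missing" numbers could be found and turned into an import plan. The function names and the toy data are made up for illustration; how the list of existing SI numbers is obtained from JIRA is not covered.

```python
from typing import Iterable, List


def missing_issue_numbers(existing: Iterable[int], highest: int) -> List[int]:
    """Return every number in 1..highest that has no issue attached."""
    present = set(existing)
    return [n for n in range(1, highest + 1) if n not in present]


def import_plan(existing: Iterable[int], highest: int) -> List[str]:
    """List, in numeric order, whether each slot gets a real or a dummy issue."""
    present = set(existing)
    return [
        f"SI-{n}: import" if n in present else f"SI-{n}: dummy placeholder"
        for n in range(1, highest + 1)
    ]


# Toy data: numbers 3 and 6 were never used (or were deleted) in JIRA.
print(missing_issue_numbers([1, 2, 4, 5, 7], highest=7))
print(import_plan([1, 3], highest=3))
```

Creating the issues strictly in the order of the plan is what would keep the GitHub number equal to the SI number, provided nothing else creates an issue in the repository in between.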

JIRA issue members

GitHub issue members

Straw man mapping (JIRA => GitHub)

It was proposed that potentially only some of a Ticket's information be imported and scabot could be used to import more upon request. This would allow you to delay implementing the mechanism to import the other details. Other than that, I am not sure what is sought by this. It does not seem that it would save any effort pertaining to shuttling data from JIRA to GitHub.

adamvoss commented 7 years ago

Labels, Milestone, Assignees

scabot

Not Considered

adriaanm commented 7 years ago

Thanks for the great overview, @vossad01! You've identified a lot of good issues/features. I'll find some more time soon (next week) to add feedback, but, in the meantime, here's some earlier work I did on exporting from JIRA to GitHub: https://github.com/scala/scabot/compare/master...adriaanm:jira

Regarding which projects to import from JIRA, I think it's safe to only do the main one (SI). I'm also strongly for a simple mapping from current Jira URL to future github one. At the same time, it would be nice to have bugs and PRs in the same scala/scala repo, so we can mention simply as #NNNNN (we'll need to add a leading digit). I could also be for a mention such as scala/SI#NNNN, which would be a separate project, but numbering is unmodified.
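As a rough illustration of the "simple mapping" between URLs, assuming unmodified numbering in a separate repository: the sketch below rewrites a JIRA link into a GitHub one. Both URL shapes are assumptions (the JIRA "browse" path and the default target repo `scala/bug`, which is only proposed later in this thread), and `github_url_for` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed shape of an SI ticket link on the current tracker.
JIRA_URL = re.compile(r"^https?://issues\.scala-lang\.org/browse/SI-(\d+)/?$")


def github_url_for(jira_url: str, repo: str = "scala/bug") -> Optional[str]:
    """Map an SI ticket URL to the issue with the same number in `repo`.

    Returns None for anything that is not an SI ticket URL
    (other JIRA projects are deliberately not mapped).
    """
    match = JIRA_URL.match(jira_url.strip())
    if match is None:
        return None
    number = int(match.group(1))  # int() also drops any leading zeros
    return f"https://github.com/{repo}/issues/{number}"


print(github_url_for("https://issues.scala-lang.org/browse/SI-10088"))
```

With a leading digit added to the numbers instead, the mapping would need an offset rather than the identity used here.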

Another github limitation is that "Fix #NNN" comments only close issues when the PR is merged into the main branch. Since we have multiple branches, we'd still have to have scabot scan merged commits in other branches...
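A sketch of what such a scan might look for, assuming the commit messages of the other branches are already available as strings. The regular expression covers GitHub's documented closing keywords (close, fix, resolve and their -s/-d forms); the function name is hypothetical and this is not scabot's actual code.

```python
import re
from typing import List

# Closing keyword, optional colon, whitespace, then "#NNN".
CLOSING = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)",
    re.IGNORECASE,
)


def issues_closed_by(message: str) -> List[int]:
    """Return the issue numbers a commit message asks to close, sorted."""
    return sorted({int(number) for number in CLOSING.findall(message)})


print(issues_closed_by("Fix #123: avoid the crash, see also #99"))
print(issues_closed_by("Add a prefix #5 to the output"))
```

A bare mention such as "see also #99" is not treated as closing, and the word boundary keeps "prefix #5" from matching.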

lrytz commented 7 years ago

@vossad01 thanks a lot for that overview!

adamvoss commented 7 years ago

I just realized I overlooked the fact that PRs are happening in scala/scala. Unless we are willing to abandon the numbers on those (which may not even be possible), it is not possible to import into scala/scala and preserve the same issue numbers as are in JIRA.

lrytz commented 7 years ago

I recently got to know @izuzak who is working at GitHub, and asked what is the best way for us to contact them:

The best way to reach out would be to send a message from https://github.com/contact. Please provide as many details as possible in that message about what you want to do and what you need from us. Mention my name at the top of the message and I'll try to reply to it and let you know what's possible from the things you need. Being as clear and as specific as possible about your specific needs and goals helps us provide advice.

So before doing that, we have to reach decisions on many of the questions you raised in this thread.

lrytz commented 7 years ago

Just received another bit: there's an API to import issues, described here: https://gist.github.com/jonmagic/5282384165e0f86ef105

We should definitely take a look and decide whether this is sufficient (it doesn't allow setting all the metadata, for example the author of issues and comments, https://gist.github.com/jonmagic/5282384165e0f86ef105#supported-issue-and-comment-fields).
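To get a feel for the limitation, here is a sketch of building one import payload. The payload field names (`issue`, `comments`, `title`, `body`, `created_at`, `closed`, `labels`) are written from my reading of the linked gist and should be checked against it; the shape of the `ticket` dictionary and the helper names are invented for illustration. Since the author cannot be set, the sketch writes the original author into the body text instead. No HTTP request is made.

```python
import json
from typing import Any, Dict


def attribution(author: str, text: str) -> str:
    """Keep the original author visible, since the API cannot set it."""
    return f"**{author}** wrote on JIRA:\n\n{text}"


def import_payload(ticket: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one exported ticket (hypothetical shape) into a payload."""
    issue = {
        "title": ticket["summary"],
        "body": attribution(ticket["reporter"], ticket["description"]),
        "created_at": ticket["created"],
        "closed": ticket["resolved"],
        "labels": ticket["labels"],
    }
    comments = [
        {"created_at": c["created"], "body": attribution(c["author"], c["text"])}
        for c in ticket["comments"]
    ]
    return {"issue": issue, "comments": comments}


# Made-up ticket, only to show the translation.
sample = {
    "summary": "Example ticket title",
    "reporter": "alice",
    "description": "Steps to reproduce...",
    "created": "2016-12-07T16:14:00Z",
    "resolved": False,
    "labels": ["typer"],
    "comments": [
        {"author": "bob", "created": "2016-12-08T09:00:00Z", "text": "Confirmed."}
    ],
}
print(json.dumps(import_payload(sample), indent=2))
```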

szeiger commented 7 years ago

This is the code I wrote for migrating tickets from Assembla to github: https://github.com/szeiger/assembla-to-github. It uses this API. Our Assembla tracker was used mostly internally for project management and user-created tickets were already on github, so the API restrictions weren't as bad.

adamvoss commented 7 years ago

In my opinion there are too many users in JIRA to try to map them all for the sake of being able to transfer subscribers. Perhaps someone would want to query the unique count of watching users to show otherwise.

The list of assignees from the Dashboard is relatively short, though. I had decent success mapping by name to a GitHub user (https://gist.github.com/vossad01/7d35971809b97c714aadd2be2b364f34#file-si-jira-github-user-mapping-csv). I cannot guarantee I got everyone correct, and there were 3 I was not comfortable assigning a mapping to based on search results, though perhaps someone else will know better.

Regarding the Import API: There would appear to be some risk with it, particularly because we have a large number of issues. If the repository is public, it would be possible for someone to create an issue during the import process, upsetting the "same numbering" effort. One option would be to accept that, if that happens, we would need to delete the repository and start over; otherwise, we could see if GH could do the import once we have everything prepared, or if they could temporarily grant a private repository until the import is done.

adriaanm commented 7 years ago

A private repo is not a problem for us. We have a plan and a credit card ;-)

(Sent by email in reply to Adam Voss's comment above, of Wed, Dec 7, 2016 at 16:14.)

adriaanm commented 7 years ago

I've begun the work on migrating JIRA to a new github repo (I propose scala/bug): https://github.com/adriaanm/bbj

I'd like to reduce the number of labels we're using. Here's a curated list based on what's in jira. The first label on the line is intended to be the canonical one, with other labels on that line rewritten to it. The numbers are frequencies.

Category/priority

(improvement,1386) (quickfix,1010) (community,337) (low-hanging-fruit,10) (feature,106) (blocker,259) (critical,647) (regression,204)

Workflow

(minimized,231) (has-pull-request,534) (backport,42)

Component

(reflection,375) (typer,319) (infer,144) (type-inference,45) (macro,17) (macros,267) (backend,269) (optimizer,10) (opt,79) (inliner,11) (specialization,108) (specialized,6) (patmat,251) (pattern-matching,122) (exhaustiveness,19) (unreachability,9) (mixin,10) (parser,17) (erasure,16)

(repl,338) (interactive,68) (scaladoc,329)

(library,65) (collections,36)

(docs,199) (documentation,28) (spec,173) (build,130)

Kind of failure

(compiler-crash,188) (crash,18) (runtime-crash,19) (wrong-bytecode,44) (bytecode,14) verifyerror (should-not-compile,14) (soundness,29) (unsound,7) (does-not-compile,78) (separate-compilation,18)

Topics

(usability,21) (lint,12) (error-messages,124) (structural-types,18) (applyDynamic,17) (java-interop,109) (implicit,85) (valueclass,76) (performance,73) (named-default-args,46) (existential,42) (annotations,37) (enum,37) (quasiquotes,28) (access,26) (depmet,24) (delayedinit,23) (tcpoly,22) (deprecation,21) (case-class,20) (dependent-types,19) (implicits,19) (serialization,15) (implicit-classes,15) (string-interpolation,14) (lub,14) (positions,14) (package-objects,14) (varargs,14) (overloading,13)
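For anyone who wants to play with the rewriting rule, here is a minimal sketch. The groupings in `GROUPS` are guesses for illustration only (the list above does not fix them here), and the helper names are hypothetical rather than taken from bbj. Within each group the first entry is treated as canonical; labels outside every group are passed through unchanged, which is also an assumption.

```python
from typing import Dict, Iterable, List

# Guessed groups: first entry is the canonical label, the rest are aliases.
GROUPS: List[List[str]] = [
    ["docs", "documentation"],
    ["patmat", "pattern-matching"],
]


def build_rewrites(groups: Iterable[List[str]]) -> Dict[str, str]:
    """Map every label in a group (canonical included) to its canonical form."""
    return {label: group[0] for group in groups for label in group}


def canonical_labels(labels: Iterable[str], rewrites: Dict[str, str]) -> List[str]:
    """Rewrite labels to canonical form, dropping duplicates, keeping order."""
    result: List[str] = []
    for label in labels:
        target = rewrites.get(label, label)
        if target not in result:
            result.append(target)
    return result


rewrites = build_rewrites(GROUPS)
print(canonical_labels(["documentation", "docs", "repl"], rewrites))
```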

SethTisue commented 7 years ago

(quickfix,1010) (community,337) (low-hanging-fruit,10)

I would not include "community" in this group. many of the tickets marked "community" are definitely not quick fixes at all. I don't know how "community" was used in olden times, but for the past 2 years or so, I've been using it to mark tickets that seem tractable (a low bar) but that the core team is unlikely to take on anytime soon. (if the ticket was also easy, I also added "low-hanging-fruit")

lrytz commented 7 years ago

in my epfl time, "community" was basically the current "backlog", issues we don't schedule / plan to work on in the core team.

adriaanm commented 7 years ago

I'm in the final phase of the import from issues.scala-lang.org to the issue tracker for scala/bug. I created a contributors team that consists of all JIRA assignees for which I could find a GitHub username, so that we can preserve that metadata. I'll give people a day or two to accept the invitation. If not, the assignee will be persisted as a comment instead of the assignee field.
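The fallback can be sketched as follows. This is an illustration of the rule described above, not the actual import code: `place_assignee`, the name-to-login mapping, and the wording of the fallback comment are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def place_assignee(
    jira_assignee: Optional[str],
    github_logins: Dict[str, str],
    team_members: Set[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (assignee_field, fallback_comment) for one ticket.

    The assignee field is only used when a GitHub login is known for the
    JIRA user and that login has joined the contributors team.
    """
    if jira_assignee is None:
        return None, None
    login = github_logins.get(jira_assignee)
    if login is not None and login in team_members:
        return login, None
    return None, f"Originally assigned to {jira_assignee} on JIRA."


# Made-up people: one accepted the invitation, one has not.
logins = {"Jane Doe": "janedoe", "John Roe": "johnroe"}
team = {"janedoe"}
print(place_assignee("Jane Doe", logins, team))
print(place_assignee("John Roe", logins, team))
```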

Ichoran commented 7 years ago

@adriaanm - Where is the list of who is mapped with whom? Lukas and I were confused before on Adam's list, as were Jason and Martin. (Edmund Noble was actually the one who noticed that.) While I sort of doubt that you'd have left Martin and Jason confused, it does raise the point that the mapping should maybe be up somewhere so it can be proofread.

adriaanm commented 7 years ago

https://github.com/adriaanm/bbj/blob/master/src/main/scala/bbj/Issues.scala#L11

adriaanm commented 7 years ago

Eyes very welcome in all parts of that code base! I can also produce a more recent version of https://github.com/adriaanm/jira-markupdown (the changes made by my textile -> markdown converter) if you'd like to sample it. If you'd like to experiment locally, I have a 30M zip with the full JIRA export as JSON that I can share too.

adriaanm commented 7 years ago

Also, if you're in the contributors team, you can see a sneak preview import over at https://github.com/scala/bug/issues. Let me know if you spot any errors/omissions in the translation from textile to markdown or JIRA metadata to GitHub's!

adriaanm commented 7 years ago

Pinging the top-10 missing bug fixers to accept the scala org invitation at https://github.com/scala, so we can get closer to 100% preservation of assignees. Please accept by tomorrow, or your assignment will be recorded as a comment, not an actual GitHub assignment.

ashawley commented 7 years ago

I can also produce a more recent version of https://github.com/adriaanm/jira-markupdown (the changes made by my textile -> markdown converter)

If these files had a .md extension they could be previewed on the GitHub web site. That might get you more eyeballs.

DRMacIver commented 7 years ago

Given that I have not been actively involved in Scala for nearly 10 years, I think it's probably best if anything assigned to me doesn't transfer over as assigned to me. :-)

SethTisue commented 7 years ago

@DRMacIver thanks for stopping by for old time's sake, though 👋

adriaanm commented 7 years ago

If these files had a .md extension they could be previewed on the GitHub web site.

done: https://github.com/adriaanm/jira-markupdown/commit/8a61ae7818c049d92dc19aeb2cce47510d78c86b#diff-b3a28b90a1a4bc987373d0bf265ec761

now as one file per issue instead of individual snippets

som-snytt commented 7 years ago

I'd like to express my appreciation for the previous generation of contributors.

Since we're close to the Easter season, I have to note that @mcdirmid's autogravatar is a cross. Like a real Golgotha cross.

Just like there's a geological signal for the event that wiped out the dinosaurs, probably we should distinguish contributors before the paulpexit and after.

adriaanm commented 7 years ago

Indeed! To be clear, I'm trying to preserve the geological record, not to spam / create more work.

👋 & 🙏 to those who preceded us :-)

SethTisue commented 7 years ago

https://github.com/scala/bug is ALIVE!!!

thanks @vossad01 and others for helping us think through all the steps and considerations here

updating contributor docs is now https://github.com/scala/scala-lang/issues/631

possible Scabot improvements is now https://github.com/scala/scabot/issues/73