adriaanm commented 7 years ago

Why/what?

simplify bug reporting & discovery
reduce duplication of tickets (and work)
completely standardize workflow on GitHub
engage the community in maintaining our issues treasure trove
have a canonical repo of bug reports
have an easy way to correlate a test with its reported issue (currently tNNNN -> SI-NNNN on issues.scala-lang.org, with no fixed format for scala/scala-dev#NNN)
carry forward existing numbering schemes (tests are named after JIRA ticket numbers)
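A sketch of the test-to-ticket correlation, assuming test files follow the tNNNN convention with an optional letter suffix (for example `t1234.scala` or `t1234b.scala`). The object and method names are illustrative, not existing scala/scala tooling:

```scala
object TestTickets {
  // Assumed convention: "t" + JIRA number, optionally a letter suffix
  // (e.g. "t1234b"), optionally followed by an extension such as ".scala".
  private val TestName = """t(\d+)[a-z]?(?:\..*)?""".r

  /** Maps a test file or directory name to its SI ticket key, if it follows the convention. */
  def ticketFor(name: String): Option[String] = name match {
    case TestName(n) => Some(s"SI-$n")
    case _           => None
  }
}
```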

Who?

Help wanted! If you'd like to hack on this, scala/scabot already has a JIRA API, and we can look into providing a JIRA dump to facilitate hacking on an import.

When?

Ideally early 2017, but it depends on who's available to work on this.

Potential pitfalls

permissions: reporters should be able to assign labels/milestones, or otherwise convey metadata
spam?

How?

new bugs should be reported on GitHub, in scala/scala (to make it easy to close bugs)
issues.scala-lang.org will remain (at least) as an archive, preferably entirely read-only, or at least closed to new tickets
most likely, import JIRA into a separate GitHub repo (perhaps just title/description + a link to JIRA for more info) with the same numbering scheme as JIRA; a scabot command could import more details from JIRA
consider front matter for issue reporting/import (affected milestones, milestone with fix, labels, priority, original author, component) that scabot parses (YAML?) to assign labels?
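To make the front-matter idea concrete, here is a minimal sketch of what scabot might do with an issue body that starts with a `---`-delimited block. It deliberately parses only flat `key: value` lines rather than real YAML, and the `labels` key and the allowlist contents are hypothetical. Filtering requested labels through an allowlist is one way to address the permissions/spam concerns listed above:

```scala
object FrontMatter {
  // Hypothetical allowlist: labels a reporter may request. Anything else is ignored.
  val allowedLabels: Set[String] = Set("repl", "scaladoc", "reflection", "patmat", "typer")

  /** Parses a leading `---`-delimited block of `key: value` lines.
    * This is a tiny subset of YAML, not a YAML parser. */
  def parse(body: String): Map[String, String] =
    body.linesIterator.toList match {
      case "---" :: rest =>
        rest.takeWhile(_ != "---").flatMap { line =>
          line.split(":", 2) match {
            case Array(k, v) => Some(k.trim -> v.trim)
            case _           => None
          }
        }.toMap
      case _ => Map.empty
    }

  /** Labels to apply: the comma-separated `labels` entry, filtered by the allowlist. */
  def labelsToApply(body: String): Set[String] =
    parse(body).get("labels").toList
      .flatMap(_.split(",").map(_.trim))
      .filter(allowedLabels)
      .toSet
}
```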

Non-goals

create more new repos than strictly needed
let's not bikeshed names for now; we'll refer to the (potential) repo for the (hypothetical) JIRA import as scala/SI (so that SI-NNNN easily rewrites to scala/SI#NNNN)
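With that naming, rewriting ticket mentions in imported text is a single regular-expression replacement. A sketch with a hypothetical object name; `scala/SI` is the placeholder repo name from the line above:

```scala
import scala.util.matching.Regex

object TicketRefs {
  // Matches JIRA keys such as "SI-1234"; the word boundaries keep things
  // like "XSI-12" or "SI-12a" from being rewritten.
  private val JiraKey = """\bSI-(\d+)\b""".r

  /** Rewrites SI-NNNN mentions to cross-repo references in the
    * hypothetical scala/SI repository, keeping the number unchanged. */
  def toGitHubRefs(text: String): String =
    JiraKey.replaceAllIn(text, m => Regex.quoteReplacement(s"scala/SI#${m.group(1)}"))
}
```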

dwijnand commented 7 years ago

AFAIK Akka engaged with GitHub when they did their big bug import.

If you're interested in that (and not already aware) you might want to ask @ktoso about what they can do with their internal APIs.

ktoso commented 7 years ago

Hi there, indeed we talked with GitHub support when we moved. What they can do is a "bulk import" if you give them the data as CSV or JSON, I think; better ask them about the format before you prepare the data. You'll need to apply tags yourself, though I think they can mark all imported issues with a tag if you ask them to.

Be aware that they can't "move issues between GitHub projects". We recently did that semi-automatically as well (me calling a script many times over 2 days); if you need that, let me know and I can show you the status quo. There's also a hidden API, which we did not end up using; you could use it to do the bulk import yourself, but that would require coding against that (RESTful-ish) API.

Note that the bulk imports are much better for the community, because they won't trigger emails to people who are watching the repository.

adamvoss commented 7 years ago

I am trying to help out with this. In a few places I have included a "straw man proposal." I do not feel it is really my place to say how these things should be done; however, I felt that coming back with only a lot of information and open-ended questions was perhaps not the most helpful thing I could do. Please take them only as straw man proposals. While my biases, opinions, and guesses are present, I include them only to advance the discussion and hopefully save work, not to push for how it should be done.

Steps for implementation (high-level, unordered)

[ ] Implement any new scabot functionality
[ ] Enable Issue Tracker on scala/scala
[ ] Migrate JIRA Issues to GitHub repository
[ ] Make JIRA read-only (instructions)
[ ] Update the Bug Reporting Guide on scala-lang. Can be done via PR to scala/scala-lang.
[ ] Update CONTRIBUTING.md

Discussions/Decisions

[ ] What guidance should we give new reporters, and what other actions can we take, to reduce unwanted issues?
[ ] Can we import into the existing scala/scala? See "The case for importing into scala/scala" below.
- Likely not [1], because numbering cannot be preserved in the way I had assumed when I posed this question. I am still concerned that new issues could be created in the archive repository.
- Yes, if the numbers are mapped easily and predictably [1]
[ ] Should INF, SUGGEST, or TEST issues be imported in addition to SI issues?
- No [1][2]
[ ] How much of an issue's content should be imported from JIRA?
[ ] Should everything about an issue be imported, or only a subset, with more on request (scabot command)?
- At once [1]
[ ] Development of a desired tagging (label) scheme. Do you want something established at launch, or will you figure it out and adapt later?
[ ] Should reporters be able to specify the attributes of a GH Issue? This seems like a feature with some potential for abuse:
1. Labels? If yes, arbitrary or restricted?
2. Milestones?
3. Assignees?
[ ] Does this repository (scala/scala-dev) live on? Or is it superseded?

I have split my write-up on the above points into multiple comments to make it easier to link specific topics/issues if needed. I can edit this comment as things are decided or new things are identified.

adamvoss commented 7 years ago

Combatting spam

Spam was raised as a potential concern. There is no way to eliminate this risk, but some actions can be taken to mitigate it and to encourage users to use Issues as intended; see the tools below.

Tools

CONTRIBUTING.md
- Displays a banner and link at the top of the Issues and Pull Requests pages
ISSUE_TEMPLATE.md
- Text that is pre-populated in the issue description box. It could be used to try to enforce a structure on new issues. It could also be used to place another warning/guidance about what you expect from an issue (which reporters would then need to delete).
- Aside: PULL_REQUEST_TEMPLATE.md could do the same for Pull Requests
Bug Reporting Guide on scala-lang.
- The relation between this and CONTRIBUTING.md probably deserves some thought.

Community Places

An aspect of limiting "spam" or other unwanted GitHub issues that I think is important is giving direction on where to go for various purposes. The scala-lang community page could probably be updated to reflect this as well (IMO, the page needs an overhaul as is). Depending on the decisions made, the "Scala hacker guide" may also benefit from an update.

Choices I can think of:

GitHub Issues
StackOverflow
Gitter
Reddit
Mailing Lists
Discourse

Straw man proposal

GitHub Issues <= Bugs, Suggestions, Change/Feature Requests, Suspected Bugs, Issues pertaining to trying to hack on scala itself
StackOverflow <= "How to do X?","What is W?", "Why Z?"
Gitter <= Informal discussion and help. Gitter is fast but (in my experience) is not good for archival and discovery purposes so other platforms are better if someone else might benefit from the discussion.
Reddit <= Exists for Redditors and things that don't fall elsewhere; people there will pass things like bugs up where needed.
Mailing Lists <= For people who like mailing lists and things that don't fall in other categories.
Discourse <= I understand the merits of Discourse as a platform; however, I see it as conflicting with the other platforms, so I don't really understand where this one fits.

I have started seeing more and more users/repos treat GitHub issues as a forum for general questions of the "what am I doing wrong?" or "how do I?" variety. It seems to work for a lot of projects; personally, though, unless a question points to a documentation error, a possible bug, or a major usability issue, I still think a place other than GH is more appropriate.

adamvoss commented 7 years ago

The case for importing into scala/scala

Update: because scala/scala has been actively used for pull requests, it is not possible to import the issues into scala/scala and preserve the original SI numbering. If there were still a desire to keep the code and issues in the same repository, it would likely mean abandoning the existing PR numbering, which would be undesirable because PR numbers are prevalent in commit messages.

I think JIRA SI issues should be imported into scala/scala rather than a separate repository. A separate repository does have a short-term advantage: scala/scala will be clean, have a low open issue count, and seem more manageable. However, I would argue that on a project the size of Scala, that state is unlikely to last long term as new issues are created. For reference, dotnet/roslyn (the new C#/VB.NET compilers) currently has 3,303 open issues and Microsoft/TypeScript has 1,587.

Downside

Immediately have 1,920 open issues.
Finding issues may be more difficult due to the high number of issues (8,069 closed issues)
5-digit issue numbers on GitHub ~~rather than starting again with 1~~. scala/scala is already in the mid-4 digits due to PRs, so it won't be long until it reaches 5 digits anyway.
Not all issue numbers can be preserved, because existing PR numbers conflict with SI numbers.

Upside

Better achieve stated goals, particularly "have canonical repo of bug reports"
Users do not need to search two repositories to check for duplicate bugs/issues
Users will not be able to erroneously create new issues in the JIRA-issues repository. If a separate repository is created, nothing would prevent new issues from being created in the repository of imported issues.
~~Current numbering is preserved. No issue of SI-10088 and GH-13, for example~~. Due to PRs this is not possible.

adamvoss commented 7 years ago

Issue Migration

JIRA has 4 different projects, each numbered independently: INF, SI, SUGGEST, TEST. Are all to be migrated? If so, for numbering consistency each would need its own GH repo. If numbering consistency is only needed for SI, then SI could be imported first and the others after. JIRA reports 9,989 issues in SI; however, the latest issue is SI-10088. I have not confirmed it programmatically yet, but that strongly suggests there are 99 "missing" SI issues. Unless GitHub can support importing with gaps (we will need to check with them, I imagine), we may need to create "dummy" issues to fill the gaps and preserve the ordering. What follows only looks at the logistics of moving the data, not the technical implementation.
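The gap arithmetic is 10,088 - 9,989 = 99 numbers with no issue. Given the set of SI numbers actually present in an export, the placeholders and the creation order could be computed as below. This is a sketch; the object name and the `Left`/`Right` encoding of "placeholder" versus "real issue" are just one possible representation:

```scala
object NumberingGaps {
  /** Numbers in 1..max with no JIRA issue; each would need a placeholder
    * ("dummy") issue so that GitHub numbering stays aligned with SI numbering. */
  def missing(existing: Set[Int]): Seq[Int] =
    if (existing.isEmpty) Seq.empty
    else (1 to existing.max).filterNot(existing)

  /** Issues in creation order: Left(n) is a placeholder for a missing number,
    * Right(n) means "import SI-n". */
  def importPlan(existing: Set[Int]): Seq[Either[Int, Int]] =
    if (existing.isEmpty) Seq.empty
    else (1 to existing.max).map(n => if (existing(n)) Right(n) else Left(n))
}
```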

JIRA issue members

Title
Description
Reporter
Type
Priority
Affects Versions
Components
Labels
Status
Resolution
Fix Versions
Assignee
Votes
Watchers
Comments
History
Created Time
Resolved Time
Updated Time
Attachments

GitHub issue members

Title
Description
Reporter
Status
Comments
Projects
Labels
Milestone
Assignees
Subscribers
Reactions
Created Time
Closed Time
Updated Time
Attachments

Straw man mapping (JIRA => GitHub)

Title => Title
Description => Description
Reporter => Description
Type => Labels
Priority => Labels
Affects Versions => Description
Components => Labels
Labels => Labels (for approved labels)
Status => Status
Resolution => Labels (drop Unresolved)
Fix Versions => Milestone
Assignee => ()
Votes => Description
Watchers => ()
Comments => Comments
History => ()
Created Time => Created Time
Resolved Time => Closed Time
Updated Time => Updated Time
Attachments => Attachments
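The straw man table could be written down as a conversion function. The case classes below are hypothetical, heavily simplified shapes (not the JIRA export format or a GitHub client type), and timestamps and attachments are left out. Taking only the first fix version as the milestone is an added assumption, since a GitHub issue has at most one milestone:

```scala
object StrawManMapping {
  // Hypothetical, simplified shapes; a real JIRA export has many more fields.
  final case class JiraIssue(
    key: String, title: String, description: String, reporter: String,
    issueType: String, priority: String, affectsVersions: List[String],
    components: List[String], labels: List[String], resolved: Boolean,
    resolution: Option[String], fixVersions: List[String], votes: Int)

  final case class GitHubIssue(
    title: String, body: String, labels: Set[String],
    milestone: Option[String], closed: Boolean)

  /** Follows the straw man table: reporter, affected versions and votes go into
    * the description; type, priority, components, approved labels and the
    * resolution (except "Unresolved") become labels. */
  def toGitHub(j: JiraIssue, approvedLabels: Set[String]): GitHubIssue = {
    val header =
      s"""|Reporter: ${j.reporter}
          |Affects versions: ${j.affectsVersions.mkString(", ")}
          |Votes: ${j.votes}
          |""".stripMargin
    val labels =
      Set(j.issueType, j.priority) ++ j.components ++
        j.labels.filter(approvedLabels) ++
        j.resolution.filterNot(_ == "Unresolved")
    GitHubIssue(
      title = j.title,
      body = header + "\n" + j.description,
      labels = labels,
      milestone = j.fixVersions.headOption, // assumption: first fix version wins
      closed = j.resolved)
  }
}
```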

It was proposed that only some of a ticket's information be imported initially, with scabot importing more on request. This would allow delaying the implementation of the mechanism that imports the other details. Other than that, I am not sure what this is meant to achieve; it does not seem to save any effort in shuttling data from JIRA to GitHub.

adamvoss commented 7 years ago

Labels, Milestone, Assignees

A GitHub user who is not affiliated (Member, Collaborator, etc.) with a repository cannot set labels, milestones, or assignees on an issue. If this is desired functionality, it will need to be implemented through scabot.

scabot

Scabot may need changes to support assigning Labels, Milestones, and Assignees.
Scabot may need changes to help support workflow, such as closing an issue after a commit is merged into a particular branch.
Scabot may need changes to help police the repository of imported issues (assuming it is decided that it is not also the repository for new issues), since there does not appear to be a way to prevent users from creating new issues there.
If only a partial import is done from JIRA, with more available on demand, scabot would need changes to handle this.

Not Considered

GitHub Projects. I think it is far enough removed from core issue-tracking functionality to be safely considered out of scope for the migration.

adriaanm commented 7 years ago

Thanks for the great overview, @vossad01! You've identified a lot of good issues/features. I'll find some more time soon (next week) to add feedback, but in the meantime, here's some earlier work I did on exporting from JIRA to GitHub: https://github.com/scala/scabot/compare/master...adriaanm:jira

Regarding which projects to import from JIRA, I think it's safe to only do the main one (SI). I'm also strongly in favor of a simple mapping from the current JIRA URL to the future GitHub one. At the same time, it would be nice to have bugs and PRs in the same scala/scala repo, so we can mention them simply as #NNNNN (we'll need to add a leading digit). I could also be for a mention such as scala/SI#NNNN, which would be a separate project, but with unmodified numbering.

Another GitHub limitation is that "Fix #NNN" comments only close issues when the PR is merged into the main branch. Since we have multiple branches, we'd still need scabot to scan merged commits in other branches...
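A sketch of the scanning part, assuming scabot would mirror GitHub's closing keywords (close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved) followed by `#N` or `owner/repo#N`. The object name is hypothetical, and the sketch only extracts issue numbers from a commit message; fetching merged commits per branch and actually closing issues are left out:

```scala
object ClosingRefs {
  // Closing keywords, case-insensitive, followed by "#123" or "owner/repo#123".
  private val Ref =
    """(?i)\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+([\w.-]+/[\w.-]+)?#(\d+)""".r

  /** Issue numbers in `repo` that a commit message says it closes.
    * A bare "#N" is taken to refer to `repo` itself. */
  def closedIssues(message: String, repo: String): List[Int] =
    Ref.findAllMatchIn(message).collect {
      case m if m.group(1) == null || m.group(1).equalsIgnoreCase(repo) => m.group(2).toInt
    }.toList.distinct
}
```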

lrytz commented 7 years ago

@vossad01 thanks a lot for that overview!

We should definitely reach out to github support and ask what options we have. @szeiger talked to them a while ago (for a different repo), they had an import file format at the time. I also heard about some internal tools, but that might not be in use anymore.
I also think it would be great to have them in scala/scala. We could add 10'000 to every issue number, so the ones in JIRA that are currently in the 10k would start at 20k.
After importing, we should probably move the "counter" on github (should be doable through their support), so that new tickets / PRs don't fall into the remaining slots below 10k.
Agree we should skip INF/SUGGEST/TEST.
Instead of having JIRA read-only, we could make the URLs redirect to the corresponding issue on github - all relevant data should be there.
about the data
- I think we should not discard "Assignee" and "Watchers", but map them to the corresponding gh users.
- "votes" can maybe be converted to 👍s on the main issue description.
- we could also try to keep issue creation date
- what to do with file attachments? upload somewhere?
"99 missing SI issues": yes, some were deleted (I think mostly spam, maybe for other reasons)
"import more upon request": I think we should import everything at once, so we don't need JIRA anymore, and also to enable searching existing issues on GitHub.
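The redirect idea combined with the +10,000 offset could look like the sketch below. It assumes the old tickets are addressed with JIRA's usual `/browse/SI-NNNN` URLs; the object name is hypothetical, and `offset` would be 0 if numbering is preserved in a separate repo:

```scala
object JiraRedirect {
  // Assumed form of the old ticket URLs: .../browse/SI-1234
  private val BrowseUrl = """https?://issues\.scala-lang\.org/browse/SI-(\d+)""".r

  /** Maps an old JIRA URL to the corresponding GitHub issue URL, if it is one. */
  def redirect(url: String, repo: String, offset: Int): Option[String] = url match {
    case BrowseUrl(n) => Some(s"https://github.com/$repo/issues/${n.toInt + offset}")
    case _            => None
  }
}
```

With `offset = 10000`, SI-10088 would land at issue 20088, matching the "10k would start at 20k" remark above.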

adamvoss commented 7 years ago

I just realized I overlooked the fact that PRs are happening in scala/scala. Unless we are willing to abandon the numbers on those (which may not even be possible), it is not possible to import into scala/scala and preserve the same issue numbers as are in JIRA.

lrytz commented 7 years ago

I recently got to know @izuzak who is working at GitHub, and asked what is the best way for us to contact them:

The best way to reach out would be to send a message from https://github.com/contact. Please provide as many details as possible in that message about what you want to do and what you need from us. Mention my name at the top of the message and I'll try to reply to it and let you know what's possible from the things you need. Being as clear and as specific as possible about your specific needs and goals helps us provide advice.

So before doing that, we have to decide many of the questions you raised in this thread.

lrytz commented 7 years ago

Just received another bit: there's an API to import issues, described here: https://gist.github.com/jonmagic/5282384165e0f86ef105

We should definitely take a look and decide whether this is sufficient (it doesn't allow setting all the metadata, for example the author of issues and comments, https://gist.github.com/jonmagic/5282384165e0f86ef105#supported-issue-and-comment-fields).
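For orientation, a sketch that builds one request body in the shape the gist describes: an `issue` object (here using title, body, created_at, closed, closed_at and labels) plus a `comments` array (created_at, body). The case classes are hypothetical, the JSON is assembled by hand only to stay dependency-free, and the endpoint, preview media type and authentication are left to the gist. Since the API has no author fields, the original JIRA author would have to be written into the body text:

```scala
object ImportPayload {
  final case class Comment(createdAt: String, body: String)
  final case class Issue(title: String, body: String, createdAt: String,
                         closedAt: Option[String], labels: List[String],
                         comments: List[Comment])

  /** Minimal JSON string literal encoding (quotes, backslashes, control chars). */
  def str(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }

  /** Request body for one issue and its comments. */
  def toJson(i: Issue): String = {
    val fields = List(
      "title" -> str(i.title),
      "body" -> str(i.body),
      "created_at" -> str(i.createdAt),
      "closed" -> i.closedAt.isDefined.toString,
      "labels" -> i.labels.map(str).mkString("[", ",", "]")
    ) ++ i.closedAt.map(t => "closed_at" -> str(t))
    val issue = fields.map { case (k, v) => s"${str(k)}:$v" }.mkString("{", ",", "}")
    val comments = i.comments
      .map(c => s"""{"created_at":${str(c.createdAt)},"body":${str(c.body)}}""")
      .mkString("[", ",", "]")
    s"""{"issue":$issue,"comments":$comments}"""
  }
}
```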

szeiger commented 7 years ago

This is the code I wrote for migrating tickets from Assembla to GitHub: https://github.com/szeiger/assembla-to-github. It uses this API. Our Assembla tracker was used mostly internally for project management, and user-created tickets were already on GitHub, so the API restrictions weren't as bad.

adamvoss commented 7 years ago

In my opinion, there are too many users in JIRA to try to map them all for the sake of being able to transfer subscribers. Perhaps someone would want to query the unique count of watching users to show otherwise.

The list of assignees from the Dashboard is relatively short, though. I had decent success mapping them by name to GitHub users: https://gist.github.com/vossad01/7d35971809b97c714aadd2be2b364f34#file-si-jira-github-user-mapping-csv. I cannot guarantee I got everyone right, and there were 3 I was not comfortable mapping based on search results, though perhaps someone else will know better.

Regarding the Import API: there would appear to be some risk with it, particularly because we have a large number of issues. If the repository is public, someone could create an issue during the import process, upsetting the "same numbering" effort. One option would be to accept that, if this happens, we would need to delete the repository and start over; otherwise, we could ask whether GH could do the import once we have everything prepared, or whether they could temporarily grant a private repository until the import is done.
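Whichever option is chosen, an import script can detect a hijacked number early by comparing each SI number with the GitHub number the issue actually received, and stopping at the first mismatch. A trivial sketch with a hypothetical name:

```scala
object NumberingCheck {
  /** `results`: (SI number we imported, GitHub number it received), in import order.
    * Returns the first pair whose numbers differ, i.e. the point at which something
    * else (such as an issue opened by a third party) consumed a number. */
  def firstMismatch(results: Seq[(Int, Int)]): Option[(Int, Int)] =
    results.find { case (si, gh) => si != gh }
}
```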

adriaanm commented 7 years ago

A private repo is not a problem for us. We have a plan and a credit card ;-)

adriaanm commented 7 years ago

I've begun the work on migrating JIRA to a new github repo (I propose scala/bug): https://github.com/adriaanm/bbj

I'd like to reduce the number of labels we're using. Here's a curated list based on what's in jira. The first label on the line is intended to be the canonical one, with other labels on that line rewritten to it. The numbers are frequencies.

Category/priority

(improvement,1386) (quickfix,1010) (community,337) (low-hanging-fruit,10) (feature,106) (blocker,259) (critical,647) (regression,204)

Workflow

(minimized,231) (has-pull-request,534) (backport,42)

Component

(reflection,375) (typer,319) (infer,144) (type-inference,45) (macro,17) (macros,267) (backend,269) (optimizer,10) (opt,79) (inliner,11) (specialization,108) (specialized,6) (patmat,251) (pattern-matching,122) (exhaustiveness,19) (unreachability,9) (mixin,10) (parser,17) (erasure,16)

(repl,338) (interactive,68) (scaladoc,329)

(library,65) (collections,36)

(docs,199) (documentation,28) (spec,173) (build,130)

Kind of failure

(compiler-crash,188) (crash,18) (runtime-crash,19) (wrong-bytecode,44) (bytecode,14) verifyerror (should-not-compile,14) (soundness,29) (unsound,7) (does-not-compile,78) (separate-compilation,18)

Topics

(usability,21) (lint,12) (error-messages,124) (structural-types,18) (applyDynamic,17) (java-interop,109) (implicit,85) (valueclass,76) (performance,73) (named-default-args,46) (existential,42) (annotations,37) (enum,37) (quasiquotes,28) (access,26) (depmet,24) (delayedinit,23) (tcpoly,22) (deprecation,21) (case-class,20) (dependent-types,19) (implicits,19) (serialization,15) (implicit-classes,15) (string-interpolation,14) (lub,14) (positions,14) (package-objects,14) (varargs,14) (overloading,13)
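The rewrite rule described above (the first label on a line is canonical, the others on that line are rewritten to it) can be applied mechanically. A sketch with hypothetical names that parses one line of `(label,count)` entries:

```scala
object LabelCuration {
  private val Entry = """\(([^,()]+),(\d+)\)""".r

  /** For one line of "(label,count)" entries: maps every alias to the first
    * (canonical) label on that line. */
  def rewrites(line: String): Map[String, String] =
    Entry.findAllMatchIn(line).map(_.group(1)).toList match {
      case canonical :: aliases => aliases.map(_ -> canonical).toMap
      case Nil                  => Map.empty
    }

  /** Applies the rewrites to an issue's JIRA labels, dropping duplicates. */
  def curate(labels: Seq[String], rewrite: Map[String, String]): Seq[String] =
    labels.map(l => rewrite.getOrElse(l, l)).distinct
}
```

For example, for the group `(quickfix,1010) (community,337) (low-hanging-fruit,10)` (a grouping questioned below), `rewrites` maps both `community` and `low-hanging-fruit` to `quickfix`.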

SethTisue commented 7 years ago

(quickfix,1010) (community,337) (low-hanging-fruit,10)

I would not include "community" in this group. Many of the tickets marked "community" are definitely not quick fixes at all. I don't know how "community" was used in olden times, but for the past 2 years or so, I've been using it to mark tickets that seem tractable (a low bar) but that the core team is unlikely to take on anytime soon. (If the ticket was also easy, I also added "low-hanging-fruit".)

lrytz commented 7 years ago

In my EPFL time, "community" was basically what "backlog" is now: issues we don't schedule / plan to work on in the core team.

adriaanm commented 7 years ago

I'm in the final phase of the import from issues.scala-lang.org to the issue tracker for scala/bug. I created a contributors team that consists of all JIRA assignees for which I could find a GitHub username, so that we can preserve that metadata. I'll give people a day or two to accept the invitation. If not, the assignee will be persisted as a comment instead of the assignee field.
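The fallback rule can be sketched as follows. The names are hypothetical; `userMap` stands for a JIRA-to-GitHub username mapping like the one discussed earlier, and `members` for the users who accepted the invitation. The fallback text deliberately avoids an `@` mention, to avoid pinging people:

```scala
object AssigneeFallback {
  sealed trait Assignment
  final case class Assign(githubUser: String) extends Assignment
  final case class NoteInComment(text: String) extends Assignment

  /** Assign only users who accepted the invitation; record everyone else as text. */
  def assignmentFor(jiraAssignee: String, userMap: Map[String, String],
                    members: Set[String]): Assignment =
    userMap.get(jiraAssignee) match {
      case Some(gh) if members(gh) => Assign(gh)
      case Some(gh)                => NoteInComment(s"Assigned in JIRA to $jiraAssignee (GitHub: $gh)")
      case None                    => NoteInComment(s"Assigned in JIRA to $jiraAssignee")
    }
}
```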

Ichoran commented 7 years ago

@adriaanm - Where is the list of who is mapped to whom? Lukas and I were mixed up on Adam's list, as were Jason and Martin. (Edmund Noble was actually the one who noticed that.) While I sort of doubt that you'd have left Martin and Jason mixed up, it does raise the point that the mapping should maybe be posted somewhere so it can be proofread.

adriaanm commented 7 years ago

https://github.com/adriaanm/bbj/blob/master/src/main/scala/bbj/Issues.scala#L11

adriaanm commented 7 years ago

Eyes very welcome in all parts of that code base! I can also produce a more recent version of https://github.com/adriaanm/jira-markupdown (the changes made by my textile -> markdown converter) if you'd like to sample it. If you'd like to experiment locally, I have a 30M zip with the full JIRA export as JSON that I can share too.
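To illustrate the kind of rules such a converter applies, here is a minimal sketch of two JIRA wiki-markup conversions: `{code}`/`{code:lang}`/`{noformat}` blocks become fenced blocks, and `{{monospace}}` becomes inline code. It is not the converter linked above; it ignores every other rule, including block parameters such as `{code:title=...}`, and it leaves text inside code blocks untouched:

```scala
import scala.util.matching.Regex

object JiraMarkup {
  // {code}, {code:scala} or {noformat} ... closed by the same tag.
  private val Block = """(?s)\{(code|noformat)(?::([\w-]+))?\}(.*?)\{\1\}""".r
  // {{text}} is JIRA's inline monospace.
  private val Mono = """\{\{(.+?)\}\}""".r
  private val Fence = "`" * 3

  private def inlineMarkup(s: String): String =
    Mono.replaceAllIn(s, m => Regex.quoteReplacement(s"`${m.group(1)}`"))

  def toMarkdown(jira: String): String = {
    val out = new StringBuilder
    var last = 0
    for (m <- Block.findAllMatchIn(jira)) {
      out.append(inlineMarkup(jira.substring(last, m.start))) // text before the block
      val lang = Option(m.group(2)).getOrElse("")
      val body = m.group(3).stripPrefix("\n").stripSuffix("\n")
      out.append(s"$Fence$lang\n$body\n$Fence")               // block body kept verbatim
      last = m.end
    }
    out.append(inlineMarkup(jira.substring(last))).toString
  }
}
```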

adriaanm commented 7 years ago

Also, if you're in the contributors team, you can see a sneak preview import over at https://github.com/scala/bug/issues. Let me know if you spot any errors/omissions in the translation from textile to markdown or JIRA metadata to GitHub's!

adriaanm commented 7 years ago

Pinging the top-10 missing bug fixers to accept the scala org invitation at https://github.com/scala, so we can get closer to 100% preservation of assignees. Please accept by tomorrow, or your assignment will be recorded as a comment, not an actual GitHub assignment.

@VladimirNik 3
@dchenbecker 6
@rklaehn 8
@stepancheg 16
@DRMacIver 28
@burakemir 31
@magarciaEPFL 38
@lindydonna 45
@JamesIry 51
@mcdirmid 124

ashawley commented 7 years ago

I can also produce a more recent version of https://github.com/adriaanm/jira-markupdown (the changes made by my textile -> markdown converter)

If these files had a .md extension they could be previewed on the GitHub web site. That might get you more eyeballs.

DRMacIver commented 7 years ago

Given that I have not been actively involved in Scala for nearly 10 years, I think it's probably best if anything assigned to me doesn't transfer over as assigned to me. :-)

SethTisue commented 7 years ago

@DRMacIver thanks for stopping by for old time's sake, though 👋

adriaanm commented 7 years ago

If these files had a .md extension they could be previewed on the GitHub web site.

done: https://github.com/adriaanm/jira-markupdown/commit/8a61ae7818c049d92dc19aeb2cce47510d78c86b#diff-b3a28b90a1a4bc987373d0bf265ec761

now as one file per issue instead of individual snippets

som-snytt commented 7 years ago

I'd like to express my appreciation for the previous generation of contributors.

Since we're close to the Easter season, I have to note that @mcdirmid 's autogravatar is a cross. Like a real Golgotha cross.

Just like there's a geological signal for the event that wiped out the dinosaurs, probably we should distinguish contributors before the paulpexit and after.

adriaanm commented 7 years ago

Indeed! To be clear, I'm trying to preserve the geological record, not to spam / create more work.

👋 & 🙏 to those who preceded us :-)

SethTisue commented 7 years ago

https://github.com/scala/bug is ALIVE!!!

thanks @vossad01 and others for helping us think through all the steps and considerations here

updating contributor docs is now https://github.com/scala/scala-lang/issues/631

possible Scabot improvements is now https://github.com/scala/scabot/issues/73

scala / scala-dev

Plan move from JIRA to GitHub issues. #267
