psf / gh-migration

This repo is used to manage the migration from bugs.python.org to GitHub.

Map bpo issue metadata to GitHub fields/labels #5

Closed: ezio-melotti closed this issue 2 years ago

ezio-melotti commented 3 years ago

This issue is about issue metadata (priority, versions, status, etc.), how/where to import them in GitHub, and what metadata to keep/add/remove/update. User/comment/file metadata will be discussed in a separate issue.

bpo tracks different metadata for each issue (see e.g. https://bugs.python.org/issue2771 ) including: title, comments, files (attachments), creator, creation, actor, activity, type, stage, components, versions, status, resolution, dependencies, superseder, assigned to, nosy list, priority, keywords, remote HG repos, linked PRs

The meaning of each field is explained in the devguide. The fields are defined in the schema.py of the bpo instance. The creator, creation (datetime), (last) actor, and (last) activity (datetime) fields are common to all classes.

The other fields will need to be replaced with something else (mostly labels) or removed.

Labels in GitHub can be grouped with colors and/or with a prefix, like priority-high, priority-medium, priority-low. GitHub is working on adding custom fields, but they will not be available for another ~6 months.
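As a rough illustration of the prefix convention, the sketch below turns a bpo field/value pair into a label name like priority-high and groups existing label names back by prefix. The "<field>-<value>" scheme and the lowercase, hyphen-joined normalization are assumptions made for this example, not a decided naming policy.

```python
from typing import Dict, List


def prefixed_label(field: str, value: str) -> str:
    """Build a hypothetical '<field>-<value>' label name, e.g. 'priority-high'."""
    # Lowercase and join words with hyphens: "release blocker" -> "release-blocker".
    slug = "-".join(value.lower().split())
    return f"{field.lower()}-{slug}"


def group_by_prefix(labels: List[str]) -> Dict[str, List[str]]:
    """Group label names by the text before their first hyphen."""
    groups: Dict[str, List[str]] = {}
    for label in labels:
        prefix, _, rest = label.partition("-")
        groups.setdefault(prefix, []).append(rest)
    return groups
```

For example, prefixed_label("priority", "release blocker") gives "priority-release-blocker", and grouping ["priority-high", "priority-low", "type-bug"] gives {"priority": ["high", "low"], "type": ["bug"]}. A shared color per group could be layered on top of the prefix.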

GitHub Actions can be used to automate certain tasks in addition to, or instead of, bots (e.g. adding labels, closing stale issues, etc.).
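For instance, the "closing stale issues" task could be driven by a last-activity timestamp (bpo's activity field, or GitHub's updated_at). The sketch below only decides which issues count as stale; the one-year threshold and the issue-number-to-timestamp input are hypothetical, and actually labelling or closing the issues would be left to an Action or bot.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical threshold: the thread does not define when an issue is "stale".
STALE_AFTER = timedelta(days=365)


def is_stale(last_activity: datetime, now: datetime,
             threshold: timedelta = STALE_AFTER) -> bool:
    """True if nothing has happened on the issue for at least `threshold`."""
    return now - last_activity >= threshold


def stale_issue_numbers(last_activity: Dict[int, datetime],
                        now: datetime) -> List[int]:
    """Return the sorted numbers of the issues that are stale at `now`."""
    return sorted(n for n, ts in last_activity.items() if is_stale(ts, now))


if __name__ == "__main__":
    now = datetime(2022, 1, 1, tzinfo=timezone.utc)
    activity = {
        1: datetime(2020, 6, 1, tzinfo=timezone.utc),
        2: datetime(2021, 12, 1, tzinfo=timezone.utc),
    }
    print(stale_issue_numbers(activity, now))  # [1]
```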

Unused metadata that is not converted to labels (or anything else) can be stored in a comment so that it can be retrieved if needed (e.g. if we move away from GH).
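One possible shape for such a comment is a short "field: value" list that a script can parse back later. The header text, the "- name: value" line format, and the assumption that every value fits on a single line are choices made for this sketch only.

```python
from typing import Dict

# Hypothetical marker line that introduces the preserved bpo metadata.
HEADER = "bpo metadata:"


def metadata_comment(fields: Dict[str, str]) -> str:
    """Serialize leftover bpo fields (single-line values) into a comment body."""
    lines = [HEADER]
    lines += [f"- {name}: {value}" for name, value in sorted(fields.items())]
    return "\n".join(lines)


def parse_metadata_comment(body: str) -> Dict[str, str]:
    """Recover the fields from a body that contains a metadata_comment() block."""
    lines = body.splitlines()
    start = lines.index(HEADER) + 1
    fields: Dict[str, str] = {}
    for line in lines[start:]:
        if not line.startswith("- "):
            break  # the metadata list ends at the first non-item line
        name, _, value = line[2:].partition(": ")
        fields[name] = value
    return fields
```

Because parsing stops at the first line that is not a "- " item, the block can sit inside a longer comment without picking up the surrounding text.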


The python/cpython repository currently has 32 labels:


This is the full list of all the fields we have in Roundup, and how we could convert them to GitHub Issues:

so:

The stage could use the existing stage labels. An "awaiting triage" label might also be added.

merwok commented 3 years ago

An additional consideration: bpo metadata supports searching with powerful filtering and ordering, as well as saved searches. GitHub's search is weaker here, so browsing and searching issues after the migration will be less pleasant 😦 Maybe all types, components and keywords will need to be converted to labels, which will make a huge flat list. Maybe a custom page (a JavaScript app) will be needed to offer a better search experience.
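Once metadata lives in labels, a saved search can be approximated by a bookmarked GitHub search URL built from qualifiers such as repo:, is:issue, is:open, label:, -label: and sort:. The label names used in the example (type-bug, expert-asyncio, stale) are hypothetical, and multiple label: qualifiers are combined with AND, so this does not recover bpo's full filtering.

```python
from typing import Iterable
from urllib.parse import quote_plus


def search_query(repo: str, labels: Iterable[str] = (),
                 exclude: Iterable[str] = (), state: str = "open",
                 sort: str = "updated-desc") -> str:
    """Build a GitHub issue search query from label filters."""
    parts = [f"repo:{repo}", "is:issue", f"is:{state}"]
    # Quote label names so that names containing spaces still work.
    parts += [f'label:"{name}"' for name in labels]
    parts += [f'-label:"{name}"' for name in exclude]
    parts.append(f"sort:{sort}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """Turn a query into a URL that can be bookmarked like a saved search."""
    return "https://github.com/search?type=issues&q=" + quote_plus(query)
```

For example, search_query("python/cpython", labels=["type-bug", "expert-asyncio"], exclude=["stale"]) yields a query for open asyncio bugs in python/cpython, most recently updated first, excluding anything labelled stale.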

ezio-melotti commented 2 years ago

Here are some stats on bpo field usage that might help decide which fields to keep. The total for a table can exceed 100% when issues have more than one value for that field (e.g. multiple components or versions).

Issues (open/all): 7262/57119
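To make the percentages concrete: each one is the number of issues carrying a value divided by the number of open (7262) or all (57119) issues, and a multi-valued field such as components counts an issue once per value, which is how a total can exceed 100%. The sketch below reproduces that arithmetic; it is an illustration, not the script that generated these stats.

```python
from collections import Counter
from typing import Iterable, List


def field_usage(issues: Iterable[List[str]]) -> Counter:
    """Count issues per field value; each issue contributes once per distinct value."""
    counts: Counter = Counter()
    for values in issues:
        counts.update(set(values))
    return counts


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the tables."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    print(percent(2807, 7262))   # 38.7: "behavior" among open issues
    print(percent(7778, 7262))   # 107.1: components total among open issues
    usage = field_usage([["Library (Lib)", "Documentation"], ["Library (Lib)"]])
    print(percent(sum(usage.values()), 2))  # 150.0: two issues, three values
```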

type

bpo field         open           all
behavior          2807 (38.7%)   17747 (31.1%)
enhancement       2472 (34.0%)   11468 (20.1%)
crash              184 ( 2.5%)    2210 ( 3.9%)
compile error      161 ( 2.2%)    1381 ( 2.4%)
performance        156 ( 2.1%)    1182 ( 2.1%)
resource usage      78 ( 1.1%)     890 ( 1.6%)
security            65 ( 0.9%)     464 ( 0.8%)
Total             5923 (81.6%)   35342 (61.9%)

stage

bpo field         open           all
patch review      2099 (28.9%)    2884 ( 5.0%)
needs patch        886 (12.2%)    1623 ( 2.8%)
test needed        297 ( 4.1%)     874 ( 1.5%)
resolved            73 ( 1.0%)   27057 (47.4%)
commit review       21 ( 0.3%)     288 ( 0.5%)
backport needed      1 ( 0.0%)       2 ( 0.0%)
Total             3377 (46.5%)   32728 (57.3%)

components

bpo field                          open           all
Library (Lib)                      2738 (37.7%)   16043 (28.1%)
Documentation                      1054 (14.5%)    8726 (15.3%)
Interpreter Core                    630 ( 8.7%)    7853 (13.7%)
Windows                             479 ( 6.6%)    3162 ( 5.5%)
Extension Modules                   360 ( 5.0%)    3176 ( 5.6%)
Tests                               350 ( 4.8%)    3483 ( 6.1%)
asyncio                             279 ( 3.8%)     970 ( 1.7%)
IDLE                                274 ( 3.8%)    1479 ( 2.6%)
Build                               271 ( 3.7%)    2641 ( 4.6%)
email                               160 ( 2.2%)     447 ( 0.8%)
IO                                  140 ( 1.9%)     644 ( 1.1%)
macOS                               119 ( 1.6%)    1253 ( 2.2%)
ctypes                              117 ( 1.6%)     477 ( 0.8%)
C API                               105 ( 1.4%)     274 ( 0.5%)
Unicode                             102 ( 1.4%)     950 ( 1.7%)
Installation                         96 ( 1.3%)     789 ( 1.4%)
Tkinter                              94 ( 1.3%)     821 ( 1.4%)
SSL                                  63 ( 0.9%)     316 ( 0.6%)
XML                                  58 ( 0.8%)     457 ( 0.8%)
2to3 (2.x to 3.x conversion tool)    57 ( 0.8%)     342 ( 0.6%)
Cross-Build                          54 ( 0.7%)     161 ( 0.3%)
Demos and Tools                      44 ( 0.6%)     512 ( 0.9%)
Subinterpreters                      38 ( 0.5%)      72 ( 0.1%)
Regular Expressions                  37 ( 0.5%)     519 ( 0.9%)
Argument Clinic                      36 ( 0.5%)     123 ( 0.2%)
FreeBSD                               9 ( 0.1%)      33 ( 0.1%)
Parser                                9 ( 0.1%)      35 ( 0.1%)
Distutils                             5 ( 0.1%)    1141 ( 2.0%)
Total                              7778 (107.1%)  56899 (99.6%)

versions

bpo field    open           all
Python 3.8   2046 (28.2%)    6851 (12.0%)
Python 3.9   1845 (25.4%)    5067 ( 8.9%)
Python 3.7   1706 (23.5%)    7442 (13.0%)
Python 3.10  1452 (20.0%)    3508 ( 6.1%)
Python 3.6   1390 (19.1%)    7054 (12.3%)
Python 3.11   541 ( 7.4%)    1203 ( 2.1%)
Total        8980 (123.7%)  31125 (54.5%)

resolution

bpo field     open         all
fixed         21 ( 0.3%)   24291 (42.5%)
not a bug     11 ( 0.2%)    6178 (10.8%)
duplicate      7 ( 0.1%)    3720 ( 6.5%)
wont fix       7 ( 0.1%)    2295 ( 4.0%)
third party    7 ( 0.1%)     701 ( 1.2%)
remind         5 ( 0.1%)      18 ( 0.0%)
out of date    4 ( 0.1%)    3145 ( 5.5%)
postponed      4 ( 0.1%)     114 ( 0.2%)
works for me   4 ( 0.1%)     952 ( 1.7%)
later          3 ( 0.0%)     154 ( 0.3%)
rejected       3 ( 0.0%)    2801 ( 4.9%)
Total         76 ( 1.0%)   44369 (77.7%)

priority

bpo field         open           all
normal            6951 (95.7%)   51387 (90.0%)
low                229 ( 3.2%)    2483 ( 4.3%)
high                55 ( 0.8%)    1583 ( 2.8%)
critical            10 ( 0.1%)     449 ( 0.8%)
release blocker      2 ( 0.0%)     933 ( 1.6%)
deferred blocker     1 ( 0.0%)     107 ( 0.2%)
Total             7248 (99.8%)   56942 (99.7%)

keywords

bpo field          open           all
patch              2878 (39.6%)   25886 (45.3%)
easy                202 ( 2.8%)    2139 ( 3.7%)
needs review         84 ( 1.2%)     928 ( 1.6%)
newcomer friendly    17 ( 0.2%)      98 ( 0.2%)
easy (C)             11 ( 0.2%)      75 ( 0.1%)
3.5regression        10 ( 0.1%)      60 ( 0.1%)
pep3121               8 ( 0.1%)      57 ( 0.1%)
buildbot              7 ( 0.1%)     328 ( 0.6%)
3.6regression         7 ( 0.1%)      47 ( 0.1%)
3.8regression         6 ( 0.1%)      55 ( 0.1%)
3.3regression         3 ( 0.0%)      79 ( 0.1%)
3.7regression         3 ( 0.0%)      60 ( 0.1%)
3.9regression         3 ( 0.0%)      36 ( 0.1%)
3.10regression        3 ( 0.0%)      26 ( 0.0%)
gsoc                  2 ( 0.0%)      19 ( 0.0%)
3.2regression         2 ( 0.0%)      31 ( 0.1%)
security_issue        2 ( 0.0%)      35 ( 0.1%)
3.4regression         2 ( 0.0%)      43 ( 0.1%)
Total              3250 (44.8%)   30002 (52.5%)

gvanrossum commented 2 years ago

Can we drop ‘resolution’? GitHub doesn’t have this and I’ve never missed it. I don’t know of any project that has introduced a set of labels for this purpose. We just explain the reason for closing in the message when we close it. I don’t recall ever searching for issues with a specific resolution.

I’ve also often wondered why we have ‘stage’.

ezio-melotti commented 2 years ago

Both stage and resolution mostly have an informative purpose. The stage tells what's the next thing needed to make the issue move forward (are we waiting for a fix? for a review?), whereas the resolution tells the reason why the issue was closed (e.g. was it fixed? rejected?).

I agree that the resolution can be dropped.

For the stage the situation is a bit more complicated, because on bpo we only had issues, whereas here we also have PRs. In addition, we already have a set of labels for PR stages that are added by @bedevere-bot automatically.

The current sequence on bpo is roughly: no selection -> test needed -> needs patch -> patch review -> commit review -> backport needed -> resolved.
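The sequence can be modelled as an ordered set of stages. The sketch below encodes only this rough linear order, so real issues may skip stages (for example going straight from no selection to needs patch); the enum member names are illustrative.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """bpo stages in their usual order; NO_SELECTION stands for an empty field."""
    NO_SELECTION = "no selection"
    TEST_NEEDED = "test needed"
    NEEDS_PATCH = "needs patch"
    PATCH_REVIEW = "patch review"
    COMMIT_REVIEW = "commit review"
    BACKPORT_NEEDED = "backport needed"
    RESOLVED = "resolved"


# Enum iteration follows definition order, so this list is the sequence.
ORDER = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that normally follows `stage`, or None after 'resolved'."""
    i = ORDER.index(stage)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None
```

Looking a stage up by its bpo value also works, e.g. Stage("patch review") is Stage.PATCH_REVIEW.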

If an issue has no stage selected, it's usually because it hasn't been triaged yet or because people are still figuring out whether it needs a fix. If instead a test to reproduce the issue is needed, we are in the test needed stage. I'm considering automatically adding an untriaged/new label to new issues; it can be removed as soon as someone comes around to triage them and the issue is being discussed (this also leaves issues less cluttered with labels).
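The proposed untriaged/new label boils down to two transitions: add it when an issue is opened, and drop it once a triager acts. In the sketch below, "acting" is approximated as adding any other label; the label name untriaged and that trigger are assumptions, and wiring this to real issue events (for example through a GitHub Action) is left out.

```python
from typing import Set

UNTRIAGED = "untriaged"  # hypothetical label name


def on_issue_opened(labels: Set[str]) -> Set[str]:
    """New issues start out marked as untriaged."""
    return labels | {UNTRIAGED}


def on_labels_added(labels: Set[str], added: Set[str]) -> Set[str]:
    """Drop the untriaged marker once any other label has been added."""
    new = labels | added  # build a new set; the input is left untouched
    if added - {UNTRIAGED}:
        new.discard(UNTRIAGED)
    return new
```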

Then, if the issue has no PR linked to it, we can assume we are either still discussing or in the needs patch stage. If the PR is not a WIP, or if a review has been requested, then we are in the patch review or commit review stage. The backport needed stage is handled by a bot, and there is already a set of labels for each version. Once a PR/issue is merged/closed, the PR/issue is implicitly resolved. All this is already visible through the GitHub UI, without the need for labels.
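The same inference can be written as a small function over state GitHub already shows: whether the issue is closed, and whether its linked PRs are drafts (WIP), have a review requested, or are merged. The Issue and PullRequest classes are hypothetical stand-ins for data fetched from GitHub. As simplifications, patch review and commit review are not told apart, and any merged PR is treated as resolving the issue even if backports are pending.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PullRequest:
    draft: bool = False             # a WIP PR
    review_requested: bool = False
    merged: bool = False


@dataclass
class Issue:
    closed: bool = False
    prs: List[PullRequest] = field(default_factory=list)


def inferred_stage(issue: Issue) -> str:
    """Approximate the bpo stage from GitHub state."""
    if issue.closed or any(pr.merged for pr in issue.prs):
        return "resolved"
    # A PR that is ready (not WIP) or awaiting review means review comes next.
    if any(not pr.draft or pr.review_requested for pr in issue.prs):
        return "patch review"
    # No PR, or only WIP PRs: still under discussion or waiting for a patch.
    return "needs patch"
```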

To summarize:

ezio-melotti commented 2 years ago

This is a proposed mapping.

type

In addition:

stage

components

Labels in this group are related to the location of the affected files:

They could have their own namespace prefix (not sure what to use though, and the names are already long enough), or just a specific color.

expertise (was included in components before)

OS (was included in components before)

versions

resolution

priority

keywords

brettcannon commented 2 years ago

The intended use of the stage labels was to always have a rough idea as to why an issue is still open without having to read the entire issue to figure it out.

ambv commented 2 years ago

For priorities, I assume we really only need release blocker and deferred blocker.
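If only the two blocker priorities are kept, the priority mapping collapses to a lookup in which every other value (normal, low, high, critical) is dropped. The label names below are hypothetical. Plugging in the open-issue counts from the priority table shows how few open issues would carry a priority label.

```python
from typing import Dict, Optional

# Hypothetical label names for the two priorities that would be kept.
PRIORITY_LABELS = {
    "release blocker": "release-blocker",
    "deferred blocker": "deferred-blocker",
}


def priority_label(priority: str) -> Optional[str]:
    """Return the label for a blocker priority, or None if the priority is dropped."""
    return PRIORITY_LABELS.get(priority)


def labelled_count(counts: Dict[str, int]) -> int:
    """Number of issues that would still get a priority label."""
    return sum(n for p, n in counts.items() if priority_label(p) is not None)


if __name__ == "__main__":
    # Open-issue counts from the priority table.
    open_counts = {"normal": 6951, "low": 229, "high": 55, "critical": 10,
                   "release blocker": 2, "deferred blocker": 1}
    print(labelled_count(open_counts))  # 3
```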