Open mkoeppe opened 1 year ago
So the migration is a good time to update the components list? Would there be a corresponding (fixed) list in Github?
I don't know if we have anything to do with keywords though. They are just whatever the authors chose.
Yes, our script maps components to labels of the form "component: ...."
The set of labels is fixed for the repository (can only be changed by privileged users), and it needs to be a small list because the UI is a dropdown list for all labels. (Our prefix "component: " makes sure that they are kept together in the list.)
Removing component labels later is tricky because it amounts to removing historical information. (I think, but have not tested, that removing a label also removes the events that indicate adding this label.)
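As a rough illustration of the prefix idea described above, here is a minimal sketch. It is not the actual migration script; the helper names `component_label` and `is_component_label` are made up for this example, and only the `"component: "` prefix comes from the comment.

```python
# Illustrative sketch only; not the actual trac-to-github migration code.
COMPONENT_PREFIX = "component: "  # prefix quoted in the comment above


def component_label(trac_component: str) -> str:
    """Return the GitHub label name for a Trac component (hypothetical helper)."""
    return COMPONENT_PREFIX + trac_component.strip().lower()


def is_component_label(label: str) -> bool:
    """True if the label belongs to the 'component: ...' family."""
    return label.startswith(COMPONENT_PREFIX)


labels = sorted(
    ["needs review", component_label("graph theory"), "bug", component_label("algebra")]
)
# Because of the shared prefix, the component labels sort next to each other.
print(labels)
```

In an alphabetically sorted dropdown, the shared prefix is what keeps the component labels adjacent.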
Outdated!
github labels for components:
Code | label | color | frequency |
---|---|---|---|
MF | foundations & categories | | 3682 |
MS | symbolics & calculus | | 3472 |
MM | numerical | | 880 |
MA | algebra | | 6115 |
ML | linear algebra | | 2338 |
MG | geometry & topology | | 3720 |
MC | combinatorics | | 5477 |
MP | graph theory | | 3454 |
MN | number theory | | 4152 |
MX | mathematics | | 1576 |
IL | programming | | 3720 |
IP | packages | | 7624 |
ID | documentation | | 4796 |
IT | doctest | | 2178 |
IB | build | | 4008 |
II | interfaces | | 2234 |
IG | graphics | | 1820 |
IU | user interface | | 3108 |
IR | refactoring | | 1210 |
ID | distribution & porting | | 2270 |
IX | miscellaneous | | 3595 |
Outdated!
Mapping trac components to github labels:
Code | trac component | frequency |
---|---|---|
MC | combinatorics | 5477 |
IP | packages: standard | 5313 |
ID | documentation | 4740 |
MP | graph theory | 3454 |
IB | build | 3362 |
MA | algebra | 2990 |
IL | python3 | 2540 |
MS | symbolics | 2390 |
IX | misc | 2350 |
ML | linear algebra | 2338 |
MF | basic arithmetic | 2232 |
IU | notebook | 2100 |
II | interfaces | 1938 |
IG | graphics | 1820 |
IP | packages: optional | 1806 |
MG | geometry | 1742 |
IT | doctest coverage | 1572 |
MA | commutative algebra | 1406 |
MN | number theory | 1392 |
MG | algebraic geometry | 1182 |
IR | refactoring | 1138 |
MS | calculus | 1082 |
MN | elliptic curves | 974 |
MN | number fields | 936 |
IU | user interface | 898 |
MM | numerical | 880 |
MN | modular forms | 850 |
MA | group theory | 828 |
ID | porting: cygwin | 782 |
MF | categories | 754 |
ID | porting | 708 |
MF | coercion | 696 |
ID | distribution | 664 |
IB | build: configure | 610 |
IT | doctest framework | 602 |
IX | scripts | 600 |
MA | padics | 568 |
IL | cython | 468 |
MX | coding theory | 442 |
ID | porting: solaris | 428 |
IL | performance | 370 |
IL | memleak | 342 |
MG | manifolds | 330 |
II | interfaces: optional | 296 |
IP | packages: experimental | 284 |
MC | combinatorial designs | 270 |
MG | algebraic topology | 266 |
MX | linear programming | 264 |
MX | asymptotic expansions | 258 |
MN | quadratic forms | 252 |
MX | cryptography | 228 |
MA | finite rings | 217 |
MG | dynamics | 200 |
IP | packages | 199 |
MC | matroid theory | 196 |
IX | website/wiki | 190 |
MX | finite state machines | 156 |
IX | c_lib | 136 |
ID | debian-package | 130 |
IX | pickling | 120 |
IU | interact | 110 |
IX | dsage | 108 |
MA | factorization | 106 |
MX | statistics | 104 |
ID | porting: bsd | 88 |
IR | relocation | 72 |
ID | porting: aix or hp-ux | 70 |
ID | docker | 64 |
ID | translations | 56 |
MX | finance | 54 |
MX | game theory | 54 |
IX | spkg-check | 46 |
IB | pbuild | 36 |
XX | linbox | 30 |
IP | packages: huge | 22 |
MX | fractals | 8 |
IX | sage-mode | 8 |
MX | databases | 8 |
IX | fast callable | 6 |
IT | doctests | 4 |
IX | givaro | 1 |
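The per-label frequencies can be obtained by summing the Trac component frequencies that share a two-letter code. The following is a small sketch of that aggregation, using only a few rows copied from the mapping table; the function name `totals_by_code` is an assumption for illustration.

```python
from collections import defaultdict

# A few rows copied from the mapping table above: (code, trac component, frequency).
ROWS = [
    ("MF", "basic arithmetic", 2232),
    ("MF", "categories", 754),
    ("MF", "coercion", 696),
    ("MS", "symbolics", 2390),
    ("MS", "calculus", 1082),
]


def totals_by_code(rows):
    """Sum the ticket frequencies of all Trac components sharing a code."""
    totals = defaultdict(int)
    for code, _component, frequency in rows:
        totals[code] += frequency
    return dict(totals)


totals = totals_by_code(ROWS)
print(totals)
```

For these two codes the sums agree with the MF and MS rows of the label table.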
Maybe 21 component labels are still too many?
Apart from mapping trac components to github labels, we may want to leave the trac component in the issue's description.
After we decide the maximum number of github component labels, I could ask for help on sage-devel.
What is the purpose/meaning of the two-letter code?
> After we decide the maximum number of github component labels, I could ask for help on sage-devel.
This is definitely a good topic for sage-devel. A brief previous discussion on this happened in https://groups.google.com/g/sage-devel/c/2dKvRwdwVQM/m/dDmGmheFAgAJ
> What is the purpose/meaning of the two-letter code?
That is to organize the mapping from trac components to github labels, as a temporary stepping stone.
Also it groups the github labels by:
initial letter | group |
---|---|
M | mathematics |
I | infrastructure |
D | distribution |
and X for miscellaneous or anything else. The second letters were chosen arbitrarily.
Is the number of labels, presently 21, adequate? Could we increase it?
In addition to the label name, we can also play with the label's `color` and `text_color`.
> Is the number of labels, presently 21, adequate? Could we increase it?
I'll have a version of the imported repository ready later today, then we can check how things feel on the UI side of things
Is there a limit on the length of each label? If I had an issue related to categories, I am not sure that I would realize it should belong under "Foundation" (or "Foundations"), but a longer label might capture that. Similarly, "Geometry & Topology" would be better than "Geometry", especially since topology is not geometry. "Symbolics & Calculus".
Does "Refactoring" need to be a label?
Re colors: I like the idea of using one color for the M labels, another for I, etc.
> Is there a limit on the length of each label?
I don't know if there's a hard limit, but probably we shouldn't make it much longer than "component: interfaces: optional" so that the display in the issue list does not become unwieldy (see the very preliminary issue list at https://34.105.185.241/sagemath/sage-20230110084128/issues)
> "Symbolics & Calculus"
+1 on merging these two into one
> Does "Refactoring" need to be a label?
Probably not
> Apart from mapping trac components to github labels, we may want to leave the trac component in the issue's description.
That would be easy to implement.
This is a bit ugly, but "c: ..." may suffice. Or if we use the colors effectively, then we may not need the "component: " prefix at all.
> Re colors: I like the idea of using one color for the M labels, another for I, etc.
Good idea.
> > Does "Refactoring" need to be a label?
> Probably not
Then trac components "refactoring" and "relocation" are mapped to what? XX (miscellaneous)?
We should allow and encourage multiple components: a mathematical issue might fall between algebraic geometry and number theory, and maybe that should be labeled with "algebra", "number theory", and "geometry & topology".
I agree. Is that allowed by default?
> > > Does "Refactoring" need to be a label?
> > Probably not
> Then trac components "refactoring" and "relocation" are mapped to what? XX (miscellaneous)?
Though "refactoring" is not a component of the Sage software, it seems an apt label for issues that restructure the codebase. For example, many issues for Sage modularization may get this label. Note that the trac component "refactoring" ranks high in the list (the trac components are ordered by frequency in trac).
> Is there a limit on the length of each label? If I had an issue related to categories, I am not sure that I would realize it should belong under "Foundation" (or "Foundations"), but a longer label might capture that. Similarly, "Geometry & Topology" would be better than "Geometry", especially since topology is not geometry. "Symbolics & Calculus".
Similarly with graph theory - is that combinatorics or topology? Probably want that label to be explicit as to where graphs go, since by Matthias' list that had the fourth most tickets (and combinatorics the first!).
How about "discrete math" instead of "combinatorics"? The area of discrete math encompasses both combinatorics and graph theory, and "discrete math" is much shorter than "combinatorics & graph theory".
Personally, I think graph theory is not a branch of topology, despite their intimate connection.
> Probably want that label to be explicit as to where graphs go, since by Matthias' list that had the fourth most tickets (and combinatorics the first!).
This makes a lot of sense to me and to me even justifies a graph theory category on its own. It seems like a lot to lump two of the four biggest categories together without having the labels lose meaning.
OK. Done.
By the frequency argument, I split "linear algebra" from "algebra".
By the same argument, would we split "standard packages" from "packages"?
Combined "distributions" and "porting". Moved "refactoring" to the "miscellaneous" group.
> By the same argument, would we split "standard packages" from "packages"?
"packages" was only used historically; it is no longer selectable as a component in Trac. Apparently at some point it was split into the "packages: standard" etc. categories.
Similarly for some of the programming-related components: "relocation" fell out of use when the old relocation mechanism was dropped; "dsage" was a short-lived project in the early days of Sage
Here are some observations. Here is one of our converted issues. Note the color of the "needs review" label.
I then edited the labels in the repository: I changed "needs review" to red color, and then added the "needs review" label to the issue. Note that the new label is red; but the color of the label is unchanged for the issue events that involve this label. (This makes sense -- in the migration archive, the label text and colors are included with each label-changing issue event.)
Finally, I deleted the label "positive review" from the repository (this gives a warning popup -- "this label will be removed from all issues"). The label is gone, but it survives in all the issue events!
Conclusion:
`unlabeled` issue events.) The old label is preserved in any issue events. #108 will make sure that initial labels of a ticket are not lost.
Regarding frequency of use of components, it would probably be best to know how that frequency has changed over time. A flurry of graph theory activity five years ago (for example and hypothetically; I'm making that up) should not necessarily inform our decisions about labels now.
Regarding label colors, presumably we should keep the colors of the predefined GitHub labels https://github.com/sagemath/trac-to-github/labels (except perhaps for the `duplicate`, `invalid`, `wontfix` labels, which I think could all have the same color.)
> In addition to the label name, we can also play with the label's `color` and `text_color`.
It turns out that I can set `color` and `text_color` separately only for the labels that appear on `labeled`, `unlabeled` events.
> Regarding frequency of use of components, it would probably be best to know how that frequency has changed over time.
That is not easy, or requires too much work. Not worth it now.
> A flurry of graph theory activity five years ago (for example and hypothetically; I'm making that up) should not necessarily inform our decisions about labels now.
The past flurry is likely to lead to more activity in the future, as it means we have a lot of code to maintain.
> > Regarding frequency of use of components, it would probably be best to know how that frequency has changed over time.
> That is not easy, or requires too much work. Not worth it now.
I agree.
> > A flurry of graph theory activity five years ago (for example and hypothetically; I'm making that up) should not necessarily inform our decisions about labels now.
> The past flurry is likely to lead to more activity in the future, as it means we have a lot of code to maintain.
If it hasn't led to more activity recently, there is no evidence that it will lead to more activity down the road. That's my whole point.
> > By the same argument, would we split "standard packages" from "packages"?
> "packages" was only used historically; it is no longer selectable as a component in Trac. Apparently at some point it was split into the "packages: standard" etc. categories.
Then on GitHub, would one sweeping label "packages" suffice, or do we need to split it back into three labels: "standard packages", "optional packages", "experimental packages"?
Adjusted the colors so that white text looks better on the label background color.
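One way to check how readable white text is on a given background is the WCAG 2.x contrast ratio. The sketch below is only an assumption about how such a check could be done, not the procedure used for the adjustment above; the sample colors are hex values quoted elsewhere in this thread, and the function names are made up.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as 'rrggbb' or '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_with_white(hex_color: str) -> float:
    """Contrast ratio of white text on the given background (1 to 21)."""
    # White has relative luminance 1.0, so the numerator is 1.0 + 0.05.
    return 1.05 / (relative_luminance(hex_color) + 0.05)


for background in ("#5319e7", "#9875f0", "#8770c2"):
    print(background, round(contrast_with_white(background), 2))
```

A higher ratio means white text stands out more; darker backgrounds score higher.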
> ... After the migration, we can just use the UI to mass-relabel issues that have an old label with a new label, and then delete the old label from the repository.
Then move this issue to the "after migration" milestone?
> > ... After the migration, we can just use the UI to mass-relabel issues that have an old label with a new label, and then delete the old label from the repository.
> Then move this issue to the "after migration" milestone?
We do need to do a part of what's discussed here for the migration already, namely select label colors for:
- the existing component labels
We may use the colors determined above for github component labels and the mapping from trac components. Or perhaps just one color of the "mathematics" label for all trac components.
- the labels created for statuses "needs review", "needs work" and special milestones "pending" etc. in a way that makes sense in relation to the predefined label colors (or make a decision to deviate from it).
There is no discussion about that here. You may start by making a table of the needed labels and colors here. For the tables above, I used the code
![](https://dummyimage.com/100x20/5319e7&text=+) \#5319e7
![](https://dummyimage.com/100x20/9875f0&text=+) \#9875f0
![](https://dummyimage.com/100x20/8770c2&text=+) \#8770c2
and this source https://github.com/ManageIQ/manageiq/labels may be useful.
Or if you prefer, you could make a list of the needed labels, and I could then draft a preliminary table.
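The swatch lines quoted above follow a simple pattern, so they can be generated for any list of hex colors. A minimal sketch follows; the function name `swatch_markdown` and its defaults are assumptions chosen to reproduce the three quoted lines.

```python
def swatch_markdown(hex_color: str, width: int = 100, height: int = 20) -> str:
    """Return a Markdown image swatch followed by the escaped hex code."""
    code = hex_color.lstrip("#").lower()
    # "\\#" keeps Markdown from treating "#5319e7" as an issue reference or heading.
    return f"![](https://dummyimage.com/{width}x{height}/{code}&text=+) \\#{code}"


for color in ("5319e7", "9875f0", "8770c2"):
    print(swatch_markdown(color))
```

Each printed line can be pasted into a table cell of a comment.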
> There is no discussion about that here. You may start by making a table of the needed labels and colors here.
OK. Let's take this part of the discussion to #112.
Now there are only two groups of component labels: mathematics labels and infrastructure (non-math) labels, as suggested in https://github.com/sagemath/trac-to-github/issues/112#issue-1531072274.
The label names and colors can also be changed later very easily from the ui: https://github.com/sagemath/trac-to-github/labels. They can also be deleted there, but then there is no way to restore them (at least not from the ui).
> The label names and colors can also be changed later very easily from the ui: https://github.com/sagemath/trac-to-github/labels. They can also be deleted there, but then there is no way to restore them (at least not from the ui).
See https://github.com/sagemath/trac-to-github/issues/99#issuecomment-1379697420
Yes, I have seen that comment. But you can simply rename a label without mass-relabeling issues, and you can change the colors in the ui and these changes are applied to the already labeled issues.
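Renaming or recoloring a label can also be scripted through the GitHub REST API's "update a label" endpoint instead of the UI. The sketch below only builds the request and does not send it. The label names, the `<TOKEN>` placeholder and the helper `build_label_update` are made-up examples, and whether this is preferable to the UI is left open.

```python
import json
import urllib.parse
import urllib.request


def build_label_update(owner, repo, name, new_name=None, color=None, token="<TOKEN>"):
    """Build (but do not send) a PATCH request that updates a repository label."""
    payload = {}
    if new_name is not None:
        payload["new_name"] = new_name
    if color is not None:
        payload["color"] = color.lstrip("#")  # the API expects hex without '#'
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/labels/"
        + urllib.parse.quote(name, safe="")  # label names may contain ':' and spaces
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # placeholder token, hypothetical
        },
    )


# Hypothetical example: shorten the prefix of one component label.
request = build_label_update(
    "sagemath", "trac-to-github", "component: graph theory",
    new_name="c: graph theory", color="#5319e7",
)
print(request.get_method(), request.full_url)
```

Sending the request would be done with `urllib.request.urlopen(request)` and a real token with permission to edit labels.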
component_frequency.txt
keyword_frequency.txt