sagemath / trac-to-github

Script to migrate Trac tickets to GitHub issues and the Trac wiki to markdown. Input: https://trac.sagemath.org/ ➠ Intermediate: https://github.com/sagemath/trac_to_gh ➠ Output: https://github.com/sagemath/sage/issues
https://trac.sagemath.org/ticket/30363

Review components and keywords, define labels for future use #99

Open mkoeppe opened 1 year ago

mkoeppe commented 1 year ago

component_frequency.txt

keyword_frequency.txt

kwankyu commented 1 year ago

So the migration is a good time to update the components list? Would there be a corresponding (fixed) list in Github?

I don't know if we have anything to do with keywords though. They are just whatever the authors chose.

mkoeppe commented 1 year ago

Yes, our script maps components to labels of the form "component: ...."

The set of labels is fixed for the repository (can only be changed by privileged users), and it needs to be a small list because the UI is a dropdown list for all labels. (Our prefix "component: " makes sure that they are kept together in the list.)

Removing component labels later is tricky because it amounts to removing historical information. (I think but have not tested that removing a label also removes the events that indicate adding this label.)
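As a rough illustration of the prefix scheme described above, here is a minimal Python sketch. The helper name `component_label` is hypothetical and is not taken from the migration script; it only shows how a shared "component: " prefix keeps these labels adjacent in an alphabetically sorted label list.

```python
COMPONENT_PREFIX = "component: "


def component_label(trac_component: str) -> str:
    """Return a GitHub label name for a Trac component (hypothetical helper)."""
    return COMPONENT_PREFIX + trac_component.strip()


if __name__ == "__main__":
    labels = [
        "needs review",
        component_label("graph theory"),
        "bug",
        component_label("algebra"),
    ]
    # Sorting alphabetically groups all "component: ..." labels together.
    for name in sorted(labels):
        print(name)
```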

kwankyu commented 1 year ago

Outdated!

github labels for components:

| Code | Label | Frequency |
| --- | --- | ---: |
| MF | foundations & categories | 3682 |
| MS | symbolics & calculus | 3472 |
| MM | numerical | 880 |
| MA | algebra | 6115 |
| ML | linear algebra | 2338 |
| MG | geometry & topology | 3720 |
| MC | combinatorics | 5477 |
| MP | graph theory | 3454 |
| MN | number theory | 4152 |
| MX | mathematics | 1576 |
| IL | programming | 3720 |
| IP | packages | 7624 |
| ID | documentation | 4796 |
| IT | doctest | 2178 |
| IB | build | 4008 |
| II | interfaces | 2234 |
| IG | graphics | 1820 |
| IU | user interface | 3108 |
| IR | refactoring | 1210 |
| ID | distribution & porting | 2270 |
| IX | miscellaneous | 3595 |

kwankyu commented 1 year ago

Outdated!

Mapping trac components to github labels:

| Code | Trac component | Frequency |
| --- | --- | ---: |
| MC | combinatorics | 5477 |
| IP | packages: standard | 5313 |
| ID | documentation | 4740 |
| MP | graph theory | 3454 |
| IB | build | 3362 |
| MA | algebra | 2990 |
| IL | python3 | 2540 |
| MS | symbolics | 2390 |
| IX | misc | 2350 |
| ML | linear algebra | 2338 |
| MF | basic arithmetic | 2232 |
| IU | notebook | 2100 |
| II | interfaces | 1938 |
| IG | graphics | 1820 |
| IP | packages: optional | 1806 |
| MG | geometry | 1742 |
| IT | doctest coverage | 1572 |
| MA | commutative algebra | 1406 |
| MN | number theory | 1392 |
| MG | algebraic geometry | 1182 |
| IR | refactoring | 1138 |
| MS | calculus | 1082 |
| MN | elliptic curves | 974 |
| MN | number fields | 936 |
| IU | user interface | 898 |
| MM | numerical | 880 |
| MN | modular forms | 850 |
| MA | group theory | 828 |
| ID | porting: cygwin | 782 |
| MF | categories | 754 |
| ID | porting | 708 |
| MF | coercion | 696 |
| ID | distribution | 664 |
| IB | build: configure | 610 |
| IT | doctest framework | 602 |
| IX | scripts | 600 |
| MA | padics | 568 |
| IL | cython | 468 |
| MX | coding theory | 442 |
| ID | porting: solaris | 428 |
| IL | performance | 370 |
| IL | memleak | 342 |
| MG | manifolds | 330 |
| II | interfaces: optional | 296 |
| IP | packages: experimental | 284 |
| MC | combinatorial designs | 270 |
| MG | algebraic topology | 266 |
| MX | linear programming | 264 |
| MX | asymptotic expansions | 258 |
| MN | quadratic forms | 252 |
| MX | cryptography | 228 |
| MA | finite rings | 217 |
| MG | dynamics | 200 |
| IP | packages | 199 |
| MC | matroid theory | 196 |
| IX | website/wiki | 190 |
| MX | finite state machines | 156 |
| IX | c_lib | 136 |
| ID | debian-package | 130 |
| IX | pickling | 120 |
| IU | interact | 110 |
| IX | dsage | 108 |
| MA | factorization | 106 |
| MX | statistics | 104 |
| ID | porting: bsd | 88 |
| IR | relocation | 72 |
| ID | porting: aix or hp-ux | 70 |
| ID | docker | 64 |
| ID | translations | 56 |
| MX | finance | 54 |
| MX | game theory | 54 |
| IX | spkg-check | 46 |
| IB | pbuild | 36 |
| XX | linbox | 30 |
| IP | packages: huge | 22 |
| MX | fractals | 8 |
| IX | sage-mode | 8 |
| MX | databases | 8 |
| IX | fast callable | 6 |
| IT | doctests | 4 |
| IX | givaro | 1 |

kwankyu commented 1 year ago

Maybe 21 component labels are still too many?

kwankyu commented 1 year ago

Apart from mapping trac components to github labels, we may want to leave the trac component in the issue's description.

kwankyu commented 1 year ago

After we decide the maximum number of github component labels, I could ask for help on sage-devel.

mkoeppe commented 1 year ago

What is the purpose/meaning of the two-letter code?

mkoeppe commented 1 year ago

> After we decide the maximum number of github component labels, I could ask for help on sage-devel.

This is definitely a good topic for sage-devel. A brief previous discussion on this happened in https://groups.google.com/g/sage-devel/c/2dKvRwdwVQM/m/dDmGmheFAgAJ

kwankyu commented 1 year ago

> What is the purpose/meaning of the two-letter code?

That is to organize the mapping from trac components to github labels, as a temporary stepping stone.

Also it groups the github labels by:

| Initial letter | Group |
| --- | --- |
| M | mathematics |
| I | infrastructure |
| D | distribution |

and X for miscellaneous or anything else. The second letters were chosen arbitrarily.
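The role of the two-letter code as a stepping stone can be sketched as follows: Trac components that share a code are pooled into one GitHub label, and their frequencies add up. This is only an illustration, not the code actually used; the rows are an excerpt copied from the mapping table above, and the function names are hypothetical.

```python
from collections import defaultdict

# Excerpt of the mapping table: (two-letter code, Trac component, frequency).
MAPPING_EXCERPT = [
    ("MS", "symbolics", 2390),
    ("MS", "calculus", 1082),
    ("MF", "basic arithmetic", 2232),
    ("MF", "categories", 754),
    ("MF", "coercion", 696),
    ("ML", "linear algebra", 2338),
]

# Group names keyed by the initial letter of the code, as listed above.
GROUP_NAMES = {
    "M": "mathematics",
    "I": "infrastructure",
    "D": "distribution",
    "X": "miscellaneous",
}


def totals_by_code(rows):
    """Sum the Trac component frequencies for each two-letter code."""
    totals = defaultdict(int)
    for code, _component, frequency in rows:
        totals[code] += frequency
    return dict(totals)


def group_of(code):
    """Return the group name given by the first letter of the code."""
    return GROUP_NAMES[code[0]]


if __name__ == "__main__":
    for code, total in sorted(totals_by_code(MAPPING_EXCERPT).items()):
        print(code, group_of(code), total)
```

For this excerpt the sums for MS and MF come out to 3472 and 3682, which agree with the frequencies listed for "symbolics & calculus" and "foundations & categories" in the label table.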

kwankyu commented 1 year ago

Is the number of labels, presently 21, adequate? Could we increase it?

mkoeppe commented 1 year ago

In addition to the label name, we can also play with the label's color and text_color.

mkoeppe commented 1 year ago

> Is the number of labels, presently 21, adequate? Could we increase it?

I'll have a version of the imported repository ready later today, then we can check how things feel on the UI side of things

jhpalmieri commented 1 year ago

Is there a limit on the length of each label? If I had an issue related to categories, I am not sure that I would realize it should belong under "Foundation" (or "Foundations"), but a longer label might capture that. Similarly, "Geometry & Topology" would be better than "Geometry", especially since topology is not geometry. "Symbolics & Calculus".

Does "Refactoring" need to be a label?

Re colors: I like the idea of using one color for the M labels, another for I, etc.

mkoeppe commented 1 year ago

> Is there a limit on the length of each label?

I don't know if there's a hard limit, but probably we shouldn't make it much longer than "component: interfaces: optional" so that the display in the issue list does not become unwieldy (see the very preliminary issue list at https://34.105.185.241/sagemath/sage-20230110084128/issues)

mkoeppe commented 1 year ago

> "Symbolics & Calculus"

+1 on merging these two into one

mkoeppe commented 1 year ago

> Does "Refactoring" need to be a label?

Probably not

mkoeppe commented 1 year ago

> Apart from mapping trac components to github labels, we may want to leave the trac component in the issue's description.

That would be easy to implement.
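A minimal sketch of what keeping the Trac component in the description could look like is shown below. The function name and the "Component: ..." footer format are assumptions made for illustration; the thread does not specify how the migration script would record it.

```python
def with_trac_component(description: str, trac_component: str) -> str:
    """Append the original Trac component to an issue description.

    The footer format is a hypothetical choice, not the script's actual output.
    """
    footer = f"Component: {trac_component}"
    body = description.rstrip()
    # Separate the footer from existing text with a blank line.
    return f"{body}\n\n{footer}" if body else footer


if __name__ == "__main__":
    print(with_trac_component("Fix a doctest failure.\n", "graph theory"))
```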

kwankyu commented 1 year ago

This is a bit ugly, but "c: ..." may suffice. Or if we use the colors effectively, then we may not need the "component: " prefix at all.

kwankyu commented 1 year ago

> Re colors: I like the idea of using one color for the M labels, another for I, etc.

Good idea.

kwankyu commented 1 year ago

> > Does "Refactoring" need to be a label?
>
> Probably not

Then trac components "refactoring" and "relocation" are mapped to what? XX (miscellaneous)?

jhpalmieri commented 1 year ago

We should allow and encourage multiple components: a mathematical issue might fall between algebraic geometry and number theory, and maybe that should be labeled with "algebra", "number theory", and "geometry & topology".

kwankyu commented 1 year ago

I agree. That is allowed by default?

kwankyu commented 1 year ago

> > > Does "Refactoring" need to be a label?
> >
> > Probably not
>
> Then trac components "refactoring" and "relocation" are mapped to what? XX (miscellaneous)?

Though "refactoring" is not a component of the Sage software, it seems an apt label for issues that restructure the codebase. For example, many issues for Sage modularization may get this label. Note that the trac component "refactoring" is ranked high in the list (the trac components are ordered by frequency in trac).

kcrisman commented 1 year ago

> Is there a limit on the length of each label? If I had an issue related to categories, I am not sure that I would realize it should belong under "Foundation" (or "Foundations"), but a longer label might capture that. Similarly, "Geometry & Topology" would be better than "Geometry", especially since topology is not geometry. "Symbolics & Calculus".

Similarly with graph theory - is that combinatorics or topology? Probably want that label to be explicit as to where graphs go, since by Matthias' list that had the fourth most tickets (and combinatorics the first!).

kwankyu commented 1 year ago

How about "discrete math" instead of "combinatorics"? The area of discrete math encompasses both combinatorics and graph theory, and "discrete math" is much shorter than "combinatorics & graph theory".

Personally, I think graph theory is not a branch of topology, despite their intimate connection.

trevorkarn commented 1 year ago

> Probably want that label to be explicit as to where graphs go, since by Matthias' list that had the fourth most tickets (and combinatorics the first!).

This makes a lot of sense to me and to me even justifies a graph theory category on its own. It seems like a lot to lump two of the four biggest categories together without having the labels lose meaning.

kwankyu commented 1 year ago

OK. Done.

kwankyu commented 1 year ago

By the frequency argument, I split "linear algebra" from "algebra".

kwankyu commented 1 year ago

By the same argument, would we split "standard packages" from "packages"?

kwankyu commented 1 year ago

Combined "distributions" and "porting". Moved "refactoring" to the "miscellaneous" group.

mkoeppe commented 1 year ago

> By the same argument, would we split "standard packages" from "packages"?

"packages" was only used historically; it is no longer selectable as a component in Trac. Apparently at some point it was split into the "packages: standard" etc. categories.

Similarly for some of the programming-related components: "relocation" fell out of use when the old relocation mechanism was dropped; "dsage" was a short-lived project in the early days of Sage

mkoeppe commented 1 year ago

Here are some observations. Here is one of our converted issues. Note the color of the "needs review" label.

Screenshot 2023-01-11 at 5 14 45 PM

I then edited the labels in the repository: I changed "needs review" to red color, and then added the "needs review" label to the issue. Note that the new label is red; but the color of the label is unchanged for the issue events that involve this label. (This makes sense -- in the migration archive, the label text and colors are included with each label-changing issue event.)

Screenshot 2023-01-11 at 5 18 31 PM

Finally, I deleted the label "positive review" from the repository (this gives a warning popup -- "this label will be removed from all issues"). The label is gone, but it survives in all the issue events!

Screenshot 2023-01-11 at 5 19 47 PM

mkoeppe commented 1 year ago

Conclusion:

jhpalmieri commented 1 year ago

Regarding frequency of use of components, it would probably be best to know how that frequency has changed over time. A flurry of graph theory activity five years ago (for example and hypothetically — I'm making that up) should not necessarily inform our decisions about labels now.

mkoeppe commented 1 year ago

Regarding label colors, presumably we should keep the colors of the predefined GitHub labels https://github.com/sagemath/trac-to-github/labels (except perhaps for the duplicate, invalid, wontfix labels, which I think could all have the same color.)

mkoeppe commented 1 year ago

> In addition to the label name, we can also play with the label's color and text_color.

It turns out that I can set color and text_color separately only for the labels that appear on "labeled" and "unlabeled" events.

kwankyu commented 1 year ago

> Regarding frequency of use of components, it would probably be best to know how that frequency has changed over time.

That is not easy, or requires too much work. Not worth it now.

> A flurry of graph theory activity five years ago (for example and hypothetically - I'm making that up) should not necessarily inform our decisions about labels now.

The past flurry is likely to lead to more activity in the future, as it means we have much code to maintain.

jhpalmieri commented 1 year ago

> > Regarding frequency of use of components, it would probably be best to know how that frequency has changed over time.
>
> That is not easy, or requires too much work. Not worth it now.

I agree.

> > A flurry of graph theory activity five years ago (for example and hypothetically - I'm making that up) should not necessarily inform our decisions about labels now.
>
> The past flurry is likely to lead to more activity in the future, as it means we have much code to maintain.

If it hasn't led to more activity recently, there is no evidence that it will lead to more activity down the road. That's my whole point.

kwankyu commented 1 year ago

> > By the same argument, would we split "standard packages" from "packages"?
>
> "packages" was only used historically; it is no longer selectable as a component in Trac. Apparently at some point it was split into the "packages: standard" etc. categories.

Then on GitHub, would one sweeping label "packages" suffice, or do we need to split it into three labels: "standard packages", "optional packages", "experimental packages"?

kwankyu commented 1 year ago

Adjusted the colors so that white text looks better on the label background color.
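One way to check how readable white text is on a given label background is the WCAG contrast ratio. The sketch below is not the method used in this thread (the colors here were adjusted by eye, as far as the discussion shows); it simply applies the standard WCAG 2.x relative-luminance formula. The function names are hypothetical, and the two sample colors are taken from the swatch examples elsewhere in this discussion.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as 'rrggbb' or '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio between two colors; ranges from 1 to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    for background in ("#5319e7", "#9875f0"):
        ratio = contrast_ratio("#ffffff", background)
        print(background, round(ratio, 2))
```

A higher ratio means white text stands out more; darker backgrounds give higher ratios with white text.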

kwankyu commented 1 year ago

> ... After the migration, we can just use the UI to mass-relabel issues that have an old label with a new label, and then delete the old label from the repository.

Then move this issue to "after migration" milestone?

mkoeppe commented 1 year ago

> > ... After the migration, we can just use the UI to mass-relabel issues that have an old label with a new label, and then delete the old label from the repository.
>
> Then move this issue to "after migration" milestone?

We do need to do a part of what's discussed here for the migration already, namely select label colors for:

  1. the existing component labels,
  2. the labels created for statuses "needs review", "needs work" and special milestones "pending" etc. in a way that makes sense in relation to the predefined label colors (or make a decision to deviate from it).

kwankyu commented 1 year ago

> 1. the existing component labels

We may use the colors determined above for github component labels and the mapping from trac components. Or perhaps just one color of the "mathematics" label for all trac components.

> 2. the labels created for statuses "needs review", "needs work" and special milestones "pending" etc. in a way that makes sense in relation to the predefined label colors (or make a decision to deviate from it).

There is no discussion about that here. You may start by making a table of the needed labels and colors here. For the tables above, I used the code

![](https://dummyimage.com/100x20/5319e7&text=+) \#5319e7 
![](https://dummyimage.com/100x20/9875f0&text=+) \#9875f0 
![](https://dummyimage.com/100x20/8770c2&text=+) \#8770c2

and this source https://github.com/ManageIQ/manageiq/labels may be useful.
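The swatch lines above follow a regular pattern, so they can be generated from a list of hex colors. The helper below is a hypothetical sketch that reproduces that pattern using the same dummyimage.com URL scheme shown in the comment; it is not part of the migration script.

```python
def swatch_markdown(hex_color: str, width: int = 100, height: int = 20) -> str:
    """Return a Markdown image swatch followed by the escaped hex code."""
    code = hex_color.lstrip("#").lower()
    # The backslash keeps "#5319e7" from being read as an issue reference.
    return f"![](https://dummyimage.com/{width}x{height}/{code}&text=+) \\#{code}"


if __name__ == "__main__":
    for color in ("5319e7", "9875f0", "8770c2"):
        print(swatch_markdown(color))
```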

kwankyu commented 1 year ago

Or if you prefer, you could make a list of the needed labels and then I can make up a preliminary table.

mkoeppe commented 1 year ago

> There is no discussion about that here. You may start by making a table of the needed labels and colors here.

OK. Let's take this part of the discussion to #112.

kwankyu commented 1 year ago

Now only two groups of component labels: Mathematics labels and Infrastructure (non-math) labels, as suggested in https://github.com/sagemath/trac-to-github/issues/112#issue-1531072274.

tobiasdiez commented 1 year ago

The label names and colors can also be changed later very easily from the ui: https://github.com/sagemath/trac-to-github/labels. They can also be deleted there, but then there is no way to restore them (at least not from the ui).

mkoeppe commented 1 year ago

> The label names and colors can also be changed later very easily from the ui: https://github.com/sagemath/trac-to-github/labels. They can also be deleted there, but then there is no way to restore them (at least not from the ui).

See https://github.com/sagemath/trac-to-github/issues/99#issuecomment-1379697420

tobiasdiez commented 1 year ago

Yes, I have seen that comment. But you can simply rename a label without mass-relabeling issues, and you can change the colors in the ui and these changes are applied to the already labeled issues.
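The comment above refers to the web UI. The same rename or recolor can, as far as I know, also be done through GitHub's REST API ("Update a label", `PATCH /repos/{owner}/{repo}/labels/{name}` with `new_name` and `color` fields). The sketch below only builds such a request and does not send it; the function name, the repository and label names in the usage example, and the placeholder token are all assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_label_update(owner, repo, name, *, new_name=None, color=None, token="<token>"):
    """Build (but do not send) a PATCH request that renames or recolors a label."""
    payload = {}
    if new_name is not None:
        payload["new_name"] = new_name
    if color is not None:
        # The API expects a six-digit hex color without the leading "#".
        payload["color"] = color.lstrip("#")
    # Label names may contain spaces and colons, so percent-encode them.
    quoted = urllib.parse.quote(name, safe="")
    url = f"{API_ROOT}/repos/{owner}/{repo}/labels/{quoted}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    request = build_label_update(
        "sagemath", "sage", "component: algebra",
        new_name="c: algebra", color="#5319e7",
    )
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request), with a real token.
```

Because the label is renamed in place, issues that already carry it keep it, which matches the point made above about not needing to mass-relabel.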