iterative / dvc.org

📖 DVC website and documentation
https://dvc.org

Improving issue labels and labeling process #2924

Closed iesahin closed 2 years ago

iesahin commented 3 years ago

It looks like our labeling system needs an overhaul after the recent improvement. There seems to be no system behind the labels, and we duplicate information or add irrelevant details.

image

Do we really need these? content tag has 108 open issues (out of 200 total.) Others are duplicating the info in the title.

research, question and discussion are almost identical semantically.

image

image

image

We now have dvc and dvclive.

image

image

We also have enhancement as if other issues don't enhance anything. We have 108 enhancement issues, and 78 of these also have the content label.

image

And those that don't have content look like they are related to content as well.

image

We also have website, eng-blog and eng-doc labels:

image

image

image

I think we can do better than this.

Some principles I have in mind

So, let me propose this:

I can provide precise definitions for these, e.g., a size: short issue is not expected to have more than 20-30 lines of change, size: grande can contain 2-3 tall or short issues as steps, etc.

When a new issue arrives, the maintainer decides who will fix this (team: *), the status of it (deferred, research, wontfix), the approximate size (size: *) and the priority if it should be done in the short term.

All issues will be required to have a team:, a status and a size label. We can also require a priority if we add a non-priority label such as p: sometime. So when we look at the list, we'll be able to see who's supposed to work on what, in which order, and approximately how long it's supposed to take.
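The "one label per required aspect" rule can be checked mechanically. The following is a minimal sketch, assuming labels are plain strings written as `aspect: value` (as in this comment); the function name `missing_aspects` and the constant `REQUIRED_ASPECTS` are hypothetical and not part of any existing tooling.

```python
# Aspects every issue must carry, per the proposal above (priority is optional).
REQUIRED_ASPECTS = ("team", "status", "size")


def missing_aspects(labels, required=REQUIRED_ASPECTS):
    """Return the required aspects that do not appear exactly once in labels."""
    counts = {aspect: 0 for aspect in required}
    for label in labels:
        # "team: core" -> prefix "team"; labels without ":" (e.g. "p1") are ignored.
        prefix, separator, _ = label.partition(":")
        prefix = prefix.strip()
        if separator and prefix in counts:
            counts[prefix] += 1
    return [aspect for aspect in required if counts[aspect] != 1]


print(missing_aspects(["team: docs", "p1"]))
```

An issue that returns an empty list satisfies the rule; anything else lists the aspects that still need a decision (or that were labeled twice).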

WDYT? @jorgeorpinel @shcheklein @casperdcl @dberenbaum @daavoo

iesahin commented 3 years ago

For example, when an issue about the ref comes in, we can assign it to team: core and set its status to research. When a PR is submitted, we can move it to team: docs and set the status to reviewing. If no reply/improvement comes for another 2 months, we can set the status to abandoned automatically.
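The automatic move to abandoned could look roughly like this. It is only a sketch under stated assumptions: "2 months" is approximated as 60 days, only issues in the reviewing status are affected, and `next_status` and `STALE_AFTER` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "2 months" without activity is approximated as 60 days.
STALE_AFTER = timedelta(days=60)


def next_status(current_status, last_activity, now):
    """Return "abandoned" for a reviewing issue with no recent activity,
    otherwise keep the current status unchanged."""
    if current_status == "reviewing" and now - last_activity > STALE_AFTER:
        return "abandoned"
    return current_status


now = datetime(2022, 1, 1, tzinfo=timezone.utc)
print(next_status("reviewing", now - timedelta(days=90), now))
```

A scheduled job would call this for each open issue and update the status label only when the returned value differs from the current one.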

It will be possible to say "group your size: short issues into cards in the project board", or "split this size: epic into 3 size: grande and write 2-3 size: short issues for each".

Also, for example, we'll need input from team: web about the size estimates for website issues. These estimates are important as well.

Labels should push for a structure, not put some extra, nice-to-have information.

shcheklein commented 3 years ago

Labels need more attention, agreed. Let me chime in regarding some specific items to clarify them (and it means we need to document those labels if it's not clear).

questions, research, discussion - they are different. A question is usually when a user asks a question (it happens rarely for us here). research - when we do research (aka spike). discussion - when the internal team raises a question (this ticket is a good example - it's not research and it's not a question).

size - I would not use Italian terms (grande, etc) - something less fancy, and maybe we don't need those much at all - unless we spend time grooming all the issues together as a team, size won't matter much. Except story/epic - which serves as a very specific mechanism to solve very specific problems, and it's not even about size.

content - Remove content and its siblings. The idea here is to designate a certain part of the docs, e.g. dvclive, get-started. How would you differentiate those? Maybe we can find a better name? section: dvclive? Remove the parent content?

status - sounds good. status: research and research are different though potentially. Or we can introduce spike, or research is required.

team - there are ways to assign teams if we want to do that. We have only two major systems - content and website. We can keep two sets of labels - content/get-started, website/engine. (generic website also helps - just to do a quick slice)

priority - we are in a good shape I think.

Color shades - we can actually use the same color for groups, no shades for consistency and simplicity.

iesahin commented 3 years ago

questions, research, discussion - they are different. A question is usually when a user asks a question (it happens rarely for us here). research - when we do research (aka spike). discussion - when the internal team raises a question (this ticket is a good example - it's not research and it's not a question).

So question can go, research can be status: researching. discussion is what we do always on these tickets.

size - I would not use Italian terms (grande, etc) - something less fancy, and maybe we don't need those much at all - unless we spend time grooming all the issues together as a team, size won't matter much. Except story/epic - which serves as a very specific mechanism to solve very specific problems, and it's not even about size.

I believe deciding on the size is more important than any other label. Putting that label will force an estimate, is it a 20-line change on a file or a complete rewrite of a section? Deciding on this label is progress by itself and will probably lead to shorter discussions.

I used Starbucks coffee sizes but I can replace the terms with size: s (10-20 lines changes) to size: xxl and size: epic. Epic itself looks to be used to mean something that can affect 3 pages, 30 pages or 300 pages. It looks like "we know it's big but don't know how big it is" kind of label.

I like size instead of effort, because the former has a clear meaning, "what is the approximate scope of changes (lines of files)?" Effort is relative, I could see something as high effort, you can see as low, etc.

content - Remove content and its siblings. The idea here is to designate a certain part of the docs, e.g. dvclive, get-started. How would you differentiate those? Maybe we can find a better name? section: dvclive? Remove the parent content?

We were and still are designating these in the title with the keywords: start: ... or dvclive:.... Labels are UI-dependent, they may show up sometimes, in some places and not in others. Section information is an inherent part of the task.

We don't separate our work by sections, I don't only work on UG or GS, Jorge doesn't only work on REF or UC, I could write blog posts or fix dvclive docs. I never used that kind of filtering and we never needed such a facility in the meetings.

When we have some label like content which can be applied to about 90% of the issues, it doesn't make much difference anyway. It might mean to search -label:content but we had other labels for it.

team - there are ways to assign teams if we want to do that. We have only two major systems - content and website. We can keep two sets of labels - content/get-started, website/engine. (generic website also helps - just to do a quick slice)

Yes, we can create Github teams and assign the issues to them etc. but these bring extra noise to the participants and I'd like a softer grouping. This is actually what we want to achieve by the current content and website labels. Roger and Julie should be able to see their issues easily, we should be pinging the core team from time to time about team: core issues, etc. Instead of grouping the issues by the section in the docs, we should group them by the responsible party.

status - sounds good. status: research and research are different though potentially. Or we can introduce spike, or research is required.

What could be the difference between research and status: researching? In one the assigned person is doing some research about the issue, and in the other, the assigned person is researching the issue. In both cases the research results are in the issue. We can use gerunds researching instead of research if it looks like we're making progress. :)

status: spike can be introduced, along with any other required. We need to work on clear workflows between these state changes.

Our issue decisions can become label decisions, we can dedicate meetings to list these issues and specifically talk about their labels. It will probably be more effective in planning than the current approach we have.

jorgeorpinel commented 3 years ago

Do we really need these? content tag has 108 open issues

That's because all the other content-x labels are new and whoever created them didn't migrate the existing content issues. I'm slowly reviewing all the issues from the oldest to the newest though and reapplying the more specific content labels.

questions, research, discussion - they are different

We now have dvc and dvclive.

Also studio, because those are the products we have docs for here. We could remove dvc and assume everything is DVC unless otherwise indicated.

We also have enhancement as if other issues don't enhance anything.

enhancement is contrasted to bug or feature-request (or content* for new/missing content). We could also group these with a prefix (& color) to make it clear they're related.

We also have website, eng-blog and eng-doc labels

eng* labels are clear IMO. website, design, and ui/ux not so much. Should we merge them?

we can introduce spike

We don't spike a lot but I'm OK with that. And group it with research

we can actually use the same color for groups

We're already doing that at least partially.


General thoughts:

I agree we should try to be consistent in labeling but that's not a problem with the system itself. It has been evolving and older issues didn't get the newer labels. Again, I'm slowly reviewing them.

Labels for aspects: status, priority, size, team

Not sure labels are good for status or teams. The status is clear: no priority, planned, has linked PR, closed. Teams are also clear so far: it's either engineering/website, or content. We do have priority ones but tend to assign them only to tickets we plan to do soon (otherwise an issue is considered low-priority by default). Size would be nice but I don't think we have that capacity for grooming ATM; we can just rescope tickets when planning them for a sprint for now (not a labeling question).

Each issue should have 1 label from each of the aspects.

Not sure we want strict rules.

jorgeorpinel commented 3 years ago

status - sounds good

@iesahin @shcheklein I see status and team labels have already been created. I do not agree we need these and I think we should make sure to discuss as a team before changing or creating labels. Now https://github.com/iterative/dvc.org/labels?page=2&sort=name-asc is even longer and I have no way to know what else has changed, has someone deleted labels for example? It's unclear.

shcheklein commented 3 years ago

Naming is hard :) Let's please not argue (initially at least) about specific names (status: awaits dvc vs awaiting-dvc, or team: websites vs websites). It's very secondary. Let's first agree logically on the high-level groups that we need.

E.g. we want sizes, statuses, priorities, part of content, parts of the website?

Do we agree on the list of groups?

casperdcl commented 3 years ago

meta-comment: the new GH project boards seem to solve all the problems discussed above.

the only downside is the project boards won't be publicly visible (unlike labels)... but then again, do we really need project management stuff to be publicly visible?

shcheklein commented 3 years ago

but then again, do we really need project management stuff to be publicly visible?

yes, in case of open-source we want to keep them visible -- I hope boards will be made public soon, though?

I feel a bit uneasy to have a list of issues completely stripped from any attributes. Also, with a board it means I now have to go to the board to slice and dice. Not saying that board doesn't have its value, feeling is it's not a replacement for basic labels, it's an expansion on top to do product management/project management on different level.

good question though, curious about other opinions on this, or what even GH folks have in mind with these two different systems.

casperdcl commented 3 years ago

yes we could do public boards, and also we can use some sort of an API action to keep labels <> board in sync. They actually recommend this sort of automation and also mention having "a single source of truth" in their best practices guide.

iesahin commented 3 years ago

I see status and team labels have already been created.

I'm testing some of the ideas presented here on a smaller scale. I rename some, delete those without any members and add some but these all are reversible. @jorgeorpinel e.g. I didn't delete enhancement because it has members but research became status:researching. (BTW, after adding these I think : is not a good delimiter, / might be better)

Each issue will be required to have size, team and status labels. priority issues will be decided by the team.

These 4 aspects/traits of issues are what we should discuss in the planning meetings. For all issues with team:web label, we'll ask @julieg18 which two of them could be p1, which five of them are p2, etc. We currently can't track which issues need attention from the core team, @efiop joins some of the meetings and unless we ask something to him specifically, he just listens. We don't know the status of blog posts by the DevRel team, and they don't have a way to say "we are stuck on this" before the meeting.

Theoretically, these all could be done with the board but setting labels seems an easier way to signal the meta-information about the issues.

Another thing about the boards is the issues that we don't put there are away from our eyesight completely. We don't know what we don't know until we decide to look into the backlog. You know that I'm working on more than one issue, and ask me to focus because I'm putting all those to the board. If I'd not put them, you'd never know if I have focus issues :)

We need to educate practically everyone in Iterative to use the board, and the meaning of their columns if they want our attention. This will never happen.

I managed software and non-software projects using Trello. I wrote all sorts of automation with its Python API. I'm not alien to columns and Kanban, but in our case, it seems like a bit of manual busywork. I'd want one and only one definitive place to track the work, and GH boards can't be that. We can sort the issues, filter them, change labels to specify all meta-info, and I don't know why we don't do this instead of the board.

iesahin commented 3 years ago
  • can switch between kanban <-> table

  • adds separate priority column

    • making pN labels redundant
    • can use boards API to e.g. convert all pN labels to board priorities

I think this is not good. They may have to do this because p1 etc have no special order, but if we can't see this as easily as the issue labels, IMHO it's not worth it.

  • adds target date column

I wouldn't use this.

  • adds customisable columns (which team, notes, milestone, reviewers, etc)

We could use these.

  • adds customisable grouping (kanban only groups by status)

and this

  • can save different views (e.g. group by priority, group by assignee, etc.)

and this.

As a matter of principle, I prefer older methods and tools to newly conceived ones. Also, I think the most important question we have is the definitive place everyone tracks the work. I'd want a single bookmark that I can track all my issues, and other work.

Another point is education. The simpler the tool, the less time it takes to onboard new people.

iesahin commented 3 years ago

yes, in case of open-source we want to keep them visible -- I hope boards will be made public soon, though?

I'd like everything to be made public as much as possible. A 16 y.o. asking a question might show a flaw in my reasoning and having more in the public invites more people to discussion.

I feel a bit uneasy to have a list of issues completely stripped from any attributes. Also, with a board it means I now have to go to the board to slice and dice. Not saying that board doesn't have its value, feeling is it's not a replacement for basic labels, it's an expansion on top to do product management/project management on different level.

I feel the same.

good question though, curious about other opinions on this, or what even GH folks have in mind with these two different systems.

I think labels/tags are almost universal now. Boards etc. are too software/technology dependent, and I feel we should stick to universal tools as much as possible.

shcheklein commented 3 years ago

Emre, thanks.

It's a great summary. A few comments/suggestions.

We currently don't do even crude estimates

we do have some proxy - good first issue, epic/story. I'm fine to try to replace both with size/epic, size/xxs. I would though keep good first issue - external contributors and some automation tools are looking for those

team is for tracking who is responsible in a soft manner

instead of this, for me it would be better to have area/website, area/docs and then clarifications (if possible): website/engine, content/cmrf. Teams could decide which one take what ticket - assignments work for this better, not labels.

status

whole new thing (besides a few statuses that work already - awaiting DVC core merge, awaiting response, triage) - hard to tell how far/good it will become. Happy to try.


Do we want to introduce type/research, type/discussion, type/question, type/bug, type/story (or epic)? It feels like it's a different dimension as well compared to status?

iesahin commented 3 years ago

instead of this, for me it would be better to have area/website, area/docs and then clarifications (if possible): website/engine, content/cmrf. Teams could decide which one take what ticket - assignments work for this better, not labels.

team/docs, team/web labels are supposed to be updated when an input is required from the other team, e.g. when devrel team finishes work on an issue, they can set the label to team/docs for review. area/... seems rather static. I'm looking for a way to signal whose attention is needed on an issue.

Assigning the persons feels a bit hierarchical in GH. In theory, that's the same thing, but there is no "team assignment" AFAIK in GH and assigning persons to do something is different than labeling an issue, "this issue requires attention from team/x".

We may rename it to something like re/docs or re/devrel for regards, requires or responsible, but it looks too general and not immediately understandable.

iesahin commented 3 years ago

Do we want to introduce type/research, type/discussion, type/question, type/bug, type/story (or epic)? It feels like it's a different dimension as well compared to status?

Will these change anything in processing the issue?

iesahin commented 3 years ago

The basic question in labels is asking "will these labels modify (or help) processing the issue?" Adding content-start label to an issue with the title start: Add another section doesn't change anything processing it, while team/web label says "this issue needs some work regarding the website/gatsby/etc, so I may skip (or deal) for the time being."

iesahin commented 3 years ago

I've added an initial page for labels: https://github.com/iterative/dvc.org/wiki/Labels-(v2)

shcheklein commented 3 years ago

I'm looking for a way to signal whose attention is needed on an issue.

The purpose was not clear, to be honest. If it's about waiting for someone - let's make it "waiting/dvc". Also, assignments work - you could assign iterative/dvc for example.

shcheklein commented 3 years ago

area/... seems rather static.

they are static. We need them to slice issues quickly. It helps a lot (e.g. find everything p1 on doc engine).

shcheklein commented 3 years ago

Will these change anything in processing the issue?

again, primarily a static thing to distinguish bugs and new things, for example. It helps to prioritize and navigate. E.g. I want to skip questions, discussions, etc - I want to see p1s - bugs or new features on docs engine only.
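The kind of slice described above ("p1s - bugs or new features on docs engine only") amounts to a small set filter over labels. The following is a sketch only: the issue data is made up, the function `slice_issues` is hypothetical, and the label names (`type/feature`, `website/engine`) are assumptions based on the names floated in this thread.

```python
def slice_issues(issues, all_of=(), any_of=(), none_of=()):
    """Return issues that carry every label in all_of, at least one label in
    any_of (if given), and no label in none_of."""
    all_of, any_of, none_of = set(all_of), set(any_of), set(none_of)
    selected = []
    for issue in issues:
        labels = set(issue["labels"])
        if not all_of <= labels:
            continue
        if any_of and not (labels & any_of):
            continue
        if labels & none_of:
            continue
        selected.append(issue)
    return selected


# Hypothetical sample data for illustration only.
issues = [
    {"number": 1, "labels": ["p1", "type/bug", "website/engine"]},
    {"number": 2, "labels": ["p1", "type/question", "website/engine"]},
    {"number": 3, "labels": ["p2", "type/feature", "website/engine"]},
    {"number": 4, "labels": ["p1", "type/feature", "content/start"]},
]

p1_engine_work = slice_issues(
    issues,
    all_of=["p1", "website/engine"],
    any_of=["type/bug", "type/feature"],
    none_of=["type/question", "type/discussion"],
)
print([issue["number"] for issue in p1_engine_work])
```

Such a slice only works when the static labels are applied consistently, which is the navigation use case argued for here.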

shcheklein commented 3 years ago

while team/web label says "this issue needs some work regarding the website/gatsby/etc

to me this label says that it's assigned (?) to a specific team (not that it's web or something)

shcheklein commented 3 years ago

The basic question in labels is asking "will these labels modify (or help) processing the issue?

Okay, I see that we have two different goals - navigation and process.

Labels are important for me for navigation, structuring things and being able to quickly slice and dice and find the appropriate backlog of things.

Process is also important- that's what status is for. And so far we have had only a couple of labels for this. There are other mechanisms - boards, milestones, assignments that should be helping us navigate the process primarily. + some status label should be enough to my mind.

iesahin commented 3 years ago

Also, assignments work - you could assign iterative/dvc for example.

Am I missing something, or are we not talking about the same kind of assignments here?

image

iesahin commented 3 years ago

Labels are important for me for navigation, structuring things and being able to quickly slice and dice and find the appropriate backlog of things.

If we need labels for navigation, that's fine, but we should automate those.

I've added content/start to all issues with titles having start:

for iid in $(gh issue list --state all --limit 1000 | rg 'start:' | cut -f 1) ; do gh issue edit ${iid} --add-label 'content/start' ; done

Manually setting these labels is error-prone and a duplication of work. We can define automated rules for issues like this, and I can write a few scripts and cron jobs for setting labels instead of trying to fix them for each new issue.
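The "automated rules" idea can be expressed as a table that maps title prefixes to navigation labels. This is a minimal sketch, not the script actually used: `PREFIX_RULES` and `labels_for_title` are hypothetical names, and the `dvclive:` to `content/dvclive` pairing is an assumption (only `start:` to `content/start` is mentioned above).

```python
# Rule table: title prefix -> navigation label.
# "start:" -> "content/start" comes from the thread; the second entry is an assumption.
PREFIX_RULES = {
    "start:": "content/start",
    "dvclive:": "content/dvclive",
}


def labels_for_title(title, rules=PREFIX_RULES):
    """Return the labels whose prefix the issue title starts with."""
    normalized = title.strip().lower()
    return sorted(
        label for prefix, label in rules.items() if normalized.startswith(prefix)
    )


print(labels_for_title("start: Add another section"))
print(labels_for_title("Restart: fix typo"))
```

Note that this matches only at the beginning of the title, whereas a plain text search for `start:` anywhere in the line would also match a title such as "Restart: fix typo".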

iesahin commented 3 years ago

Added some navigation labels to the proposal. We can increase the number of these as soon as there are rules behind them.

https://github.com/iterative/dvc.org/wiki/Labels-(v2)#automated-navigation-labels

iesahin commented 3 years ago

I thought to write a cron job but it looks like there is already a GH Action for labeling automation:

https://github.com/github/issue-labeler

iesahin commented 3 years ago

I reviewed some and it looks like this one is better for automated issue labeling:

https://github.com/marketplace/actions/jbang-issue-labeler

shcheklein commented 3 years ago

Am I missing something, or are we not talking about the same kind of assignments here?

Yes, my bad. PRs could be assigned to teams. And it makes sense, to be honest. I haven't seen the workflow when you assign a ticket to a team, then to a person. Still not sure if that is all valuable. I would still prefer a label to specify the area of the product. Teams could decide who takes it (teams should know what primary area they want to follow); when people from a specific team take it, they assign it to themselves. Or maybe I'm still missing the purpose of team assignments :)

If we need labels for navigation, that's fine, but we should automate those.

probably not possible to automate this (unless we ask to put prefixes always - which you would want to automate also). Prefixes are optional at the moment. And it's hard to expect everyone to put them.

Also we would need to introduce complex prefixes like content/start: improve get started experience.

Manually setting these labels is error-prone and a duplication of work.

it's duplication if we expect prefixes, see above ^^.

I thought to write a cron job but it looks like there is already a GH Action for labeling automation:

I'm fine to try that list. Feels like too many things to manage though. It will require effort and a lot of explanation to manage statuses for example (except some very obvious that signal that we are blocked for merge or something).

Teams - semantics and the purpose is still not clear to me.

You may want to add additional labels like bug or feature-request. These are not required and don't always have clear semantics, (what's a bug in a documentation context?), but can be used to signal some specific attributes of issues.

To me bug is very clear semantics. Something factually wrong in the docs, engine is broken (doesn't load a page after the release), etc.

feature-request, etc

It's a "type of issue" group of labels. Again, they help navigate, triage, and prioritize things.

Could we introduce - type/bug, type/question, type/discussion, type/feature, type/epic?

iesahin commented 3 years ago

I haven't seen the workflow when you assign a ticket to a team, then to a person.

Assigning to teams feels easier to me. I don't know the priorities and availability of a person, teams are better even if they have a single member. It's "dynamic dispatch" instead of "static dispatch", which looks softer in organizational terms. I don't feel it's right to assign a ticket to Dmitry, but I can assign by labeling it team/exec and you, Dave, Dmitry (or who feels themself executive) can look into it when you have time. This is true for most of the other people, e.g., I don't know who is looking into docs related issues in the Studio team, I can ask Tapa about it, but she'll probably want to assign it to another person. This usually takes much longer than adding a label team/studio to an issue.

It doesn't have to be team/*, it can be something (1) dynamic (unlike area/*), (2) mutually exclusive (so that only one label of this kind will be found on an issue). who/exec is possible, we can have who/*, what/* (instead of type/*), where/* (instead of status/*), howbig/* (instead of size/*) for uniformity, but the current prefixes seem easier to understand to me.
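The "dynamic" and "mutually exclusive" properties can be illustrated with a hand-over helper that replaces whatever team label an issue has with the new one. This is a sketch under assumptions: `hand_over` is a hypothetical name, and labels are treated as plain strings with a `team/` prefix.

```python
def hand_over(labels, team, prefix="team/"):
    """Return a new label list with any existing team label removed and the
    given team's label added, so at most one team label is ever present."""
    kept = [label for label in labels if not label.startswith(prefix)]
    return kept + [prefix + team]


# Example: devrel finishes its part and passes the issue to docs for review.
print(hand_over(["team/devrel", "p1"], "docs"))
```

Because the old team label is dropped in the same step, filtering the issue list by a team label always shows whose attention is needed right now rather than everyone who ever touched the issue.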

iesahin commented 3 years ago

probably not possible to automate this (unless we ask to put prefixes always - which you would want to automate also). Prefixes are optional at the moment. And it's hard to expect everyone to put them.

Also we would need to introduce complex prefixes like content/start: improve get started experience.

The current prefixes are easier to type and used more frequently than labels. We don't need to duplicate content/start either, start: is Ok, the label and the prefix don't have to be identical. if issue.title.startswith("start:"): issue.add_label("content/start")

To me bug is very clear semantics. Something factually wrong in the docs, engine is broken (doesn't load a page after the release), etc.

These are different things. A bug is related to software, or a broken engine, or some piece of code that's not doing its intended purpose. If Gatsby or Heroku hiccups due to some failure, or we can't show 3rd level children of a document due to some CSS rule, that's a bug.

For the content, it's difficult to identify what a bug is. It may be a typo, an incorrect example, a feature that's not available, a feature that was available but not anymore, a requirement that was once required but not anymore, a claim unfulfilled... Had we written "DVC washes your dishes on Thursdays", that's a factual error but is it really a bug? I really can't decide on this.

iesahin commented 3 years ago

I thought to write a cron job but it looks like there is already a GH Action for labeling automation:

I'm fine to try that list. Feels like too many things to manage though. It will require effort and a lot of explanation to manage statuses for example (except some very obvious that signal that we are blocked for merge or something).

It's easier to automate as much as possible than to manage labels manually. Going with manual content/* labels will result in a broken system that we discuss every 3rd retro meeting. I would like to solve this issue if you rely on labels for navigation. Anyone can continue to add these labels manually as well. It's not forbidden. :)

It will require effort to manage size and status, yes, but this effort is well spent for planning. I'm trying to come up with a system that replaces the indefinite idea of "planning" with definite actions of adding status/, size/ and team/ labels to the issues. It will be much easier to plan ahead, set the next steps etc. with these.

iesahin commented 3 years ago

It's a "type of issue" group of labels. Again, they help navigate, triage, and prioritize things.

Could we introduce - type/bug, type/question, type/discussion, type/feature, type/epic?

We can add as many as you see fit. Frankly, I'm more interested in resolving and closing these issues, rather than identifying their type, but if these look more appropriate to you, that's ok.

iesahin commented 3 years ago

Added more content/ labels according to the rules in https://github.com/iterative/dvc.org/wiki/Labels-(v2)#automated-navigation-labels

We have around 100 non-classified open issues.

shcheklein commented 3 years ago

I'm trying to come up with a system that replaces the indefinite idea of "planning" with definite actions of adding status/, size/ and team/ labels to the issues

That's where you should have started explaining the system of labels you have in mind :) Could you please describe how exactly you see the problem of the indefinite idea of "planning"?

I think we won't have time for this unless we allocate an extra hour for this. I mean no time as a group to go through tickets, discuss complexities, assign responsible people, etc.

I would prefer to trust you and the team that you understand the roadmap (high level priorities) and you pick next issues to work on, you split things accordingly, you manage labels accordingly. And everyone does their best to keep new issues groomed.

The current prefixes are easier to type and used more frequently than labels. We don't need to duplicate content/start either, start: is Ok, the label and the prefix don't have to be identical. if issue.title.startswith("start:"): issue.add_label("content/start")

It might work in simple cases, but it will break in some more realistic scenarios.

Also, we agree that it's not an automation. It's the same manual work + some automation - that we'll need to document, explain to everyone, maintain as we change things. This is a non trivial effort.

I would suggest before we even discuss automation, let's come up with groups, agree on them, try them manually.

The current prefixes are easier to type and used more frequently than labels.

this is subjective and requires some unconventional knowledge that is specific to us. This is suboptimal. We are moving complexity from one place to another. The best system would be a system where external people who are familiar with GH would understand what's going on here.

For the content, it's difficult to identify what a bug is.

Yes, but we have a large group of issues related to the infra - Heroku, engine, etc. For documents - a lot of tickets are very easy to identify as bugs. A typo is an extreme example, a wrong name, etc. No need to label all of them as bugs. If in doubt, don't put it :)

Assigning to teams feels easier to me.

It's fine - the problem is that it's not an assignment (it's something that we assume people will understand as an assignment - it will be hard to explain). Also, it's not exactly clear why someone should even be assigning anything to a team. A team should know its priorities and what areas are its main focus, and pick tickets from the backlog.

iesahin commented 2 years ago

After discussing with @jorgeorpinel, I believe we can postpone this discussion to the next year.

jorgeorpinel commented 2 years ago

Sorry, I didn't have the bandwidth to keep up with this and I see we've adopted something similar to what the dvc repo has (I'm OK with that + may improve collaboration with the core team). Here are some comments for what it's worth.


Labels are important for me for navigation, structuring things and being able to quickly slice and dice and find the appropriate backlog of things.

This is what I use them for the most. And even if we could automate that somehow (@iesahin), triage is an important part of the process too.

Process is also important- that's what status is for.

Well, navigation is also part of the (planning) process. Statuses are for the execution part of the process I guess. TBH I don't use status labels much for issues, since execution is on our boards.

it's duplication if we expect prefixes, see above ^^.

@iesahin and I discussed this and we think it's fine to have this duplication because it still makes issue/PR titles more readable + you can't see labels on advanced views (e.g. regex issue title search) or in the VSCode GH extension. So we can keep prefixes and other naming conventions (on top of good labels) for internal convenience.

Let's just make sure they're consistent for now... (I'll review the wiki pages ASAP ⌛)