Control label and tag creation

nathanielrindlaub commented 2 years ago

"Controlled labels" - limit the labels that can be applied so users/volunteers can't apply random labels (for the public-facing UI of the app). Label creation UX similar to GitHub's. Consider also implementing label hierarchy/priority levels ("skunk" > "animal"). Use Radix-color scales as color options.

nathanielrindlaub commented 1 year ago

I think we want to disable the ability for even Project Managers to create new labels on the fly w/ the Category Selector (as is currently the case for everyone), because (a) even managers can mistype things & add bad labels and (b) we want to provide some level of color-control.

I also think that we should always give all labelers the option to label something as "unknown".

The main outstanding question here is whether we want to include a hierarchy.

Pros: hierarchical labels might be useful for data analysis and rolling up child labels into parent labels for classifier training purposes.

Cons: not sure how helpful it is beyond that? Adds another level of cognitive load / decision making on the labeler? Might be a bit annoying to implement?

nathanielrindlaub commented 1 year ago

Also worth thinking through implications for ML-generated labels: it would be a bit of an ask of Project Managers to anticipate all the classes/labels their future models might generate and create them ahead of time (and structure them hierarchically if we're doing that). So do labels produced by ML models automatically get shoehorned into projects' "available labels" ?

nathanielrindlaub commented 1 year ago

I'm leaning towards keeping labels flat (non-hierarchical) for now. It would add too much complexity with not enough obvious benefit.

postfalk commented 1 year ago

I still like the idea of applying filters to certain groups, but I guess that feature needs a place in the roadmap and should not be implemented ad hoc or hastily.

nathanielrindlaub commented 1 year ago

@postfalk what do you mean by "applying filters to certain groups"?

postfalk commented 1 year ago

Sorry for being unclear, I mean limiting the labels a volunteer reviewer can apply while reviewing.

nathanielrindlaub commented 1 year ago

Oh yes, 100%, that's what this issue is about. I agree; I think it's important to implement before opening the door to wider audiences, and I've added it to the roadmap.

nathanielrindlaub commented 1 year ago

YouTube videos on how TrapTagger does this: https://www.youtube.com/watch?v=gw7mCAH7ThU

nathanielrindlaub commented 1 year ago

@oliverroick had a good question - should project managers be able to delete labels, and if so, what should happen (do we go through all images in the DB and remove those labels?). Similarly, do we want to allow project managers to rename labels and kick off a label renaming process across all images in the DB?

nathanielrindlaub commented 1 year ago

Updated requirements

I've refined my thoughts on the requirements for this feature after thinking through image-level tags a bit more. The requirements are as follows:

### 1. Create a UI so that Project Managers can create object-level "Labels"

Object-level labels, as described here, are the labels that users (and ML models) can apply to objects in images. Their schema will remain the same, but we'll need to create a new collection or some representation of them in the DB (perhaps something like project.labels) to store the available labels and their properties. Labels will need to have the following properties and have a UI for setting/editing them:

- **name**
- **color**
- **editable** (?)
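
As a rough illustration of the properties listed above, an entry in `project.labels` might be built and validated along these lines. This is only a sketch: the `_id` field, the `createLabel` helper, the hex-only color rule, and the example color are assumptions, not the actual Animl schema.

```javascript
// Hypothetical shape for an entry in a project's `labels` array.
// `_id` and `createLabel` are illustrative assumptions, not the Animl schema.
const HEX_COLOR = /^#[0-9a-f]{6}$/i;

function createLabel({ _id, name, color, editable = true }) {
  const trimmedName = String(name ?? '').trim();
  if (trimmedName === '') {
    throw new Error('Label name is required');
  }
  if (typeof color !== 'string' || !HEX_COLOR.test(color)) {
    throw new Error(`Expected a 6-digit hex color, got: ${color}`);
  }
  // Normalize so "#AB4ABA" and "#ab4aba" are stored identically
  return { _id, name: trimmedName, color: color.toLowerCase(), editable };
}

// A built-in label that Project Managers cannot edit
const unknownLabel = createLabel({
  _id: 'unknown',
  name: 'unknown',
  color: '#8b8d98',
  editable: false,
});
```

Keeping an immutable `_id` separate from `name` would let a label be renamed or recolored in one place without touching the images that reference it.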

I like GitHub's UX for creating, editing, and deleting labels - that's essentially what I think we should be shooting for. Some things about it that would be nice to steal:

I dig the random color generator button. I did some googling and poking around GitHub's source code, and you can create a random hex code with this one-liner:

const randomColor = `#${Math.floor(Math.random() * 0x1000000).toString(16).padStart(6, '0')}`;

Or, if for whatever reason we wanted to generate RGB, this is exactly what GitHub does:


// Interestingly, this function and all of GitHub's label-related JS can be found if you go
// to the "Sources" panel of DevTools while on github.com and drill down to
// github.githubassets.com/assets/app/assets/modules/github/issues/labels.ts
function randomRGBColor(): RGB {
  return [
    Math.floor(Math.random() * (255 - 0) + 0),
    Math.floor(Math.random() * (255 - 0) + 0),
    Math.floor(Math.random() * (255 - 0) + 0),
  ]
}


- I did some Googling on how to determine whether the text should be white or black to maximize contrast against a random background color, and found a [functional approach](https://github.com/ZeitOnline/frontend-developer-handbook/blob/main/docs/blog/switching-text-colors-by-lightness.md) (that example is in Python but it's super simple) or a [pure CSS approach](https://firsching.ch/github_labels.html) (which is what GitHub uses).
- I also like having the ability to enter or edit a hex value manually in that input field
- this is definitely more of a nice-to-have, but when you click on the input field you get a [popover](https://www.radix-ui.com/primitives/docs/components/popover) with 16 default colors to pick from. I think we could draw from the [radix-color scales](https://www.radix-ui.com/colors/docs/palette-composition/scales) for some nice looking defaults that work well together. 
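
For the black-or-white text question, a functional version in JavaScript could look something like the sketch below. Note that it uses the WCAG relative-luminance and contrast-ratio formulas rather than the exact lightness threshold from the linked post, and the function names (`hexToRgb`, `relativeLuminance`, `getTextColor`) are made up for illustration.

```javascript
// Parse "#rrggbb" (leading "#" optional) into [r, g, b] integers in 0..255.
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Expected a 6-digit hex color, got: ${hex}`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 255, (value >> 8) & 255, value & 255];
}

// WCAG 2.x relative luminance of an sRGB color, in the range 0..1.
function relativeLuminance([r, g, b]) {
  const [rl, gl, bl] = [r, g, b].map((channel) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * rl + 0.7152 * gl + 0.0722 * bl;
}

// Pick whichever of black or white text has the higher contrast ratio
// against the given background color.
function getTextColor(backgroundHex) {
  const lum = relativeLuminance(hexToRgb(backgroundHex));
  const contrastWithWhite = 1.05 / (lum + 0.05);
  const contrastWithBlack = (lum + 0.05) / 0.05;
  return contrastWithBlack >= contrastWithWhite ? '#000000' : '#ffffff';
}
```

Doing this in JS means the text color can be computed once per label and passed down as a prop; the pure CSS approach avoids that extra computation but is harder to read.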

I imagine we'll also need a few un-editable, default Labels, including "unknown" and "empty", so we may need some kind of `isEditable` property or something along those lines.

### 2. Create a UI so that Project Managers can create image-level "Tags"
This can live in the same modal as the label creation UI, but Tags would require the following:
- **name**
- **color** - might need a color if we decide to display tags as pills in images table
- **permissions level** - allow Project Managers to restrict some tags (e.g. "retire") to certain roles

### 3. Allow Project Managers to edit and delete both Tags and Labels
Editing seems straightforward if we have a single place in the DB where we store the properties and we only reference the tags and labels by immutable IDs, but deleting seems like a bit of a chore to implement on the backend. If it seems too complex we could punt on deleting labels for now.

### 4. Update how we currently [`getLabels`](https://github.com/tnc-ca-geo/animl-api/blob/6af51379ffc14235c3cbd32e46cbbaf32b3d09e4/src/api/db/models/Image.js#L73) on the backend.
We'll no longer need to aggregate every label on every object on every image (yay).

### 5. Restrict label select dropdown to permitted labels only
Straightforward.

### 6. Figure out how to add all potential classes that the available ML models for a given project might return to the labels array (see [comment](https://github.com/tnc-ca-geo/animl-frontend/issues/124#issuecomment-1483894559) above). 
I don't particularly love any of our options for this. Here are the potential pathways I can think of:

a. When projects are created and new models are added to a project's `availableModels`, add all categories for that model to the project's `labels` array. That would mean also creating a more formal way for super users to add new models to projects (i.e. a form in the UI) because we'd need to be able to trigger the addition of all those categories as a side effect. To be fair, I haven't really thought about the super user UX for adding new models to a bunch of projects... thus far I've just been going into each project record in the DB and manually adding them when new models come online, which isn't sophisticated or scalable either. Perhaps we need to make all models available to all projects, with exceptions for special cases when users have models they want to keep private. Sorry that's kind of a different issue.

b. Add new labels on the fly only when an ML model predicts them. Allow ML models to add any labels they return, and if that label doesn't yet exist for the project, create one. That might be the better choice but I'm open to suggestions. I'm still not sure how we would set the colors of automatically generated ML labels, though.

nathanielrindlaub commented 11 months ago

After discussion with Nick, we decided to run with option 6.b above for dealing with labels generated by ML models, but also make color a required property on all ML Model categories so we have some default color to start with when we add ML-model-generated labels to projects' "Label Lists".
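
A minimal sketch of what option 6.b could look like, assuming each ML model category carries a `name` and a required `color`. The helper name `ensureLabelForPrediction`, the `makeId` callback, and the case-insensitive name match are assumptions made for illustration only.

```javascript
// Hypothetical helper for option 6.b: make sure a label predicted by an ML
// model exists in the project's label list, creating it if it does not.
// `makeId` is an assumed callback that returns a new unique ID.
function ensureLabelForPrediction(project, category, makeId) {
  const wanted = category.name.toLowerCase();
  const existing = project.labels.find(
    (label) => label.name.toLowerCase() === wanted,
  );
  if (existing) return existing;

  const created = {
    _id: makeId(),
    name: category.name,
    color: category.color, // default color supplied by the ML model category
    editable: true, // assumption: PMs can recolor or rename it afterwards
  };
  project.labels.push(created);
  return created;
}
```

Because the category's color is only used when the label is first created, a Project Manager's later color changes would not be overwritten by subsequent predictions.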

nathanielrindlaub commented 9 months ago

Documenting some additional decision making around whether or not we should support "deleting", "disabling", and/or "merging" labels

Use cases

Use cases I could think of for “deleting” labels (including all occurrences of their use on images) include:

- PM (Project Manager) created a label by accident that they want to delete
- PM changed their mind and decided the label isn't necessary
- PM realized after the fact that the label was redundant (another label that wasn't identical per se actually kind of already represented what they were going for)
- PM may not have realized that they had some ML class/category/label activated that they actually don't need (e.g. 'vehicle' in a place where there just aren't vehicles, so it just adds a lot of work to clean them up). In this case, un-checking the vehicle class in their automation rules to suppress it going forward, and then deleting the vehicle label, should save a ton of work

Use cases for “disabling” labels (preventing their use going forward, but not removing them from images and still allowing users to filter on them):

- PM no longer wants people to be able to label objects with something (“we no longer want users to label 'insects', so let's just remove it altogether so they can't going forward”)
- PM changed ML models and the new model has updated classes that they’d like to use going forward, so the PM disables the old classes to make sure reviewers don’t keep using them
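
If "disabling" were modeled as a flag on each label, the distinction described above (no longer selectable, but still filterable) might come down to two small helpers like these. The `disabled` property and both function names are hypothetical.

```javascript
// Assumes each label carries a hypothetical boolean `disabled` flag.

// Labels a reviewer can apply going forward: enabled labels only.
function getSelectableLabels(labels) {
  return labels.filter((label) => !label.disabled);
}

// Labels available as filter options: all of them, so images that already
// carry a disabled label can still be found.
function getFilterableLabels(labels) {
  return [...labels];
}
```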

Use cases for “merging”:

For example, say you have “scrub jay”, “burrowing owl”, and “raven” labels – could you convert them all to “bird”? (“Unless I’m not thinking it through completely, I feel like that might obviate the need to full-on delete labels, especially if you had a label like ‘other’ that you could just toss unwanted labels into” - Lara)

Implementation concerns

- For "deleting" and "merging" labels, we would need to iterate over all the images in that project and remove (or, for merging, swap out) any image.object.label that matches. If there are a lot of images that have that label, that would require either a batch task or an SQS queue + separate lambda worker to handle the async job.
- For deleting labels, because objects can have multiple labels, deleting one label out of an object’s label array might produce a bunch of unexpected behaviors:
  - for example, if an object had an "animal" label and a "rat" label that had been validated by a user, deleting the rat label wouldn't delete the object entirely, it would just display its "animal" label on the frontend
  - we'd also have to be mindful of updating an object's locked state if we were to delete a label that happened to be the first validated label in the object's label array. In that case it should go from locked: true to locked: false. This feels like it’s getting into messy territory
  - Merging wouldn’t have this issue because we’re swapping one label ID out for another and we presumably would keep all the other locked/validation data the same.

Decisions

- We're moving forward with implementing "disable" label functionality.
- We're supporting a compromise/MVP version of the "delete" label functionality: we implement the option to delete a label on the front end and set a threshold on the total number of image updates (I think we settled on 500 image records). When a user requests a delete label operation, we count how many images it would impact. If the count is below the threshold, so the operation can run synchronously in the animl-api GraphQL lambda, we run it; if not, we return an error explaining that the label has been used too much and to contact me to remove it entirely from the project.
  - In order to avoid the unexpected behavior that might result from the weird object locked state described above, we are also (a) unlocking all objects that are locked and for which the first validated label in the label array is the one we're removing, and (b) deleting the object if it only has a single label and it's the one we're removing.
- We are going to hold off on implementing label "merging" for now, because (a) it can be accomplished downstream in Excel, (b) it likely would involve manipulating a larger number of images, and thus necessitate an async workflow, and (c) the UX is kind of hard to implement on the front end.
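
The "delete" behavior decided above (threshold check, unlocking, and object removal) could be sketched roughly as follows. This operates on plain in-memory arrays rather than the database, and the field names `labelId` and `validation.validated`, along with both function names, are assumptions rather than the actual animl-api implementation.

```javascript
// Threshold from the discussion above (500 image records).
const MAX_SYNC_IMAGE_UPDATES = 500;

// Returns the object without `labelId`, or null if the object should be
// deleted because no labels would remain.
function removeLabelFromObject(object, labelId) {
  const remaining = object.labels.filter((l) => l.labelId !== labelId);
  if (remaining.length === object.labels.length) return object; // label not used
  // (b) the removed label was the only one on the object (this also covers
  // the edge case where every label on the object is the removed one)
  if (remaining.length === 0) return null;

  // (a) unlock if the first validated label is the one being removed
  const firstValidated = object.labels.find((l) => l.validation?.validated);
  const shouldUnlock = object.locked && firstValidated?.labelId === labelId;
  return {
    ...object,
    labels: remaining,
    locked: shouldUnlock ? false : object.locked,
  };
}

function deleteLabelFromImages(images, labelId) {
  const usesLabel = (image) =>
    image.objects.some((o) => o.labels.some((l) => l.labelId === labelId));
  const affectedCount = images.filter(usesLabel).length;

  // Only run synchronously when the count is below the threshold
  if (affectedCount >= MAX_SYNC_IMAGE_UPDATES) {
    throw new Error(
      `Label is used on ${affectedCount} images, too many to delete synchronously`,
    );
  }

  return images.map((image) => ({
    ...image,
    objects: image.objects
      .map((o) => removeLabelFromObject(o, labelId))
      .filter((o) => o !== null),
  }));
}
```

Counting the affected images before mutating anything keeps the operation all-or-nothing from the user's point of view: either every occurrence is removed or the request is rejected with an explanation.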

nathanielrindlaub commented 8 months ago

Done!

tnc-ca-geo / animl-frontend