tnc-ca-geo / animl-api

Backend for https://animl.camera

Figure out how to handle users overriding the `Project.label.name` of ML-generated labels #181

Open nathanielrindlaub opened 2 months ago

nathanielrindlaub commented 2 months ago

Currently, if an ML model predicts a label that is not present in a user's `Project.labels`, we create one. However, users can subsequently change the `Project.label.name` of those labels, which I think will cause problems when the ML model generates new predictions with the old name but the same `label._id`.

nathanielrindlaub commented 2 months ago

For additional detail, this is how to reproduce the issue and why it occurs:

  1. A user uploads images, ML models generate predictions, and `createInternalLabels()` checks whether a project label with that name exists. It doesn't, so it creates a new `Project.label` using the ML model config's category `_id` and `name` (e.g. `{ _id: 'bird', name: 'bird' }`).
  2. The user then changes the name to something else, so now the `Project.label` is `{ _id: 'bird', name: 'shearwater' }`.
  3. The user uploads more images and ML makes more predictions, but now it doesn't find a project label with a matching name, so it attempts to make a new project label... but it fails, because the `$set` condition only adds the new project label if its `_id` isn't present in any other project labels, which it is! (See the sketch below.)
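
Here's a rough sketch of that flow. This is an approximation, not the actual animl-api code: the `Project` model below is simplified, and I'm standing in for the real `$set`-based guard with a `$push`/`$ne` pattern.

```js
import mongoose from 'mongoose';

// Assumed, simplified Project model; the real schema lives in animl-api.
const Project = mongoose.model('Project', new mongoose.Schema({
  labels: [{ _id: String, name: String }],
}));

// Approximation of the flow in steps 1-3 above.
async function createInternalLabels(projectId, prediction) {
  const project = await Project.findOne({ _id: projectId });

  // Step 1: look the project label up by *name*.
  const existing = project.labels.find((l) => l.name === prediction.name);
  if (existing) return existing;

  // Step 3: the name lookup misses after the user renames the label, so we
  // try to create a new project label, but only if no existing label already
  // uses this _id. After the rename to 'shearwater', the _id 'bird' is still
  // taken, so the filter matches nothing and no label is ever added.
  const res = await Project.updateOne(
    { _id: projectId, 'labels._id': { $ne: prediction._id } },
    { $push: { labels: { _id: prediction._id, name: prediction.name } } },
  );
  return res.modifiedCount > 0
    ? { _id: prediction._id, name: prediction.name }
    : null;
}
```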
nathanielrindlaub commented 2 months ago

Solution idea 1:

During `createInternalLabels()`, look the existing project label up by its `_id` rather than by its `name`, and if one is found, use it as-is (i.e., keep whatever `name` the user has given it).
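
A minimal sketch of what that could look like (same simplified `Project` model as the sketch above; the function name is hypothetical):

```js
// Sketch of solution idea 1: match on the ML category _id first, so a renamed
// label (e.g. { _id: 'bird', name: 'shearwater' }) is still reused for new
// 'bird' predictions.
async function findOrCreateProjectLabel(projectId, prediction) {
  const project = await Project.findOne({ _id: projectId });

  // Match by _id first. This is what effectively gives users a "category
  // mapping": whatever name they've assigned to the label is preserved.
  const byId = project.labels.find((l) => l._id === prediction._id);
  if (byId) return byId;

  // Fall back to the existing name match (covers user-created labels that
  // happen to share a name with a model category but have a different _id).
  const byName = project.labels.find((l) => l.name === prediction.name);
  if (byName) return byName;

  // Genuinely new category: create the project label as before.
  await Project.updateOne(
    { _id: projectId },
    { $push: { labels: { _id: prediction._id, name: prediction.name } } },
  );
  return { _id: prediction._id, name: prediction.name };
}
```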

I think this would work, and it would allow users to create "category mappings" which ultimately I think would be really useful (for example, if we have a model that predicts the scientific name of a species and the user wants to map it to the common name).

However, it's a little odd then to have a mismatch between what the models are predicting natively and what they're getting mapped to. For example, when adjusting category configs in the automation rules, those categories would still be displayed as the original ML model categories, so you might lose track of what they've been mapped to. Might get a little messy/disorganized. I think "category mapping" will be useful and worth implementing, but maybe it should happen at the automation rule configuration level?
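
If the mapping did live at the automation-rule level, the rule config could carry an explicit map from the model's native categories to project labels. This is purely a hypothetical shape, not the current schema:

```js
// Hypothetical automation-rule shape, not the current animl-api schema.
// The rule keeps the model's native categories visible while recording the
// user's chosen project label for each one.
const rule = {
  name: 'Run species classifier on new images',
  event: { type: 'image-added' },
  action: {
    type: 'run-inference',
    mlModel: 'mira-species', // assumed model identifier
    categoryMap: {
      bird: 'shearwater',    // native model category -> project label name (or _id)
    },
  },
};
```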

Solution idea 2:

Prevent users from changing the names of ML-generated labels entirely. This would be a little tricky b/c labels can start as non-ML-generated (if, for example, I had set up a "bird" category before getting MIRA predictions), but we could hook into the moment when the model looks up that existing `Project.label`: if it finds one and uses it, set some property on the `Project.label` like `ml: true`. Once that happens, though, it would never be possible to change it back to `ml: false`, even if the user stopped using the model that predicted it.
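
A rough sketch of what that could look like (hypothetical helper names, same simplified `Project` model as above, and it assumes the label subdocument schema gains an `ml: Boolean` field):

```js
// Sketch of solution idea 2 (hypothetical helpers, not the real API).

// The first time an ML prediction reuses an existing project label,
// permanently mark it as ML-backed.
async function markLabelAsMl(projectId, labelId) {
  await Project.updateOne(
    { _id: projectId, 'labels._id': labelId },
    { $set: { 'labels.$.ml': true } },
  );
}

// In the label-update path, refuse to rename ML-backed labels.
async function renameProjectLabel(projectId, labelId, newName) {
  const project = await Project.findOne({ _id: projectId });
  const label = project.labels.find((l) => l._id === labelId);
  if (label && label.ml) {
    throw new Error('Cannot rename a label that is used by an ML model');
  }
  await Project.updateOne(
    { _id: projectId, 'labels._id': labelId },
    { $set: { 'labels.$.name': newName } },
  );
}
```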