tnc-ca-geo / animl-api

Backend for https://animl.camera

Handle ML models creating new labels after users have validated or edited previous ones #270

Open nathanielrindlaub opened 1 month ago

nathanielrindlaub commented 1 month ago

Context

When Projects use both an object detector and a classifier to process images, there's a lag between an image receiving its object-detector prediction and receiving its classifier prediction (especially with bulk uploads, where a long queue of images may be waiting for inference). During that window a user can interact with the first prediction – for example, validating an "animal" prediction as correct or adding their own manual label to the new object – and then an ML model adds another label to that object. The result is that the object appears locked (because it has a validated label), but the image is "not-reviewed," because we set an image's reviewed state to false whenever it receives a new ML prediction.

This is a bit of an edge case, but it recently happened to a user who didn't know a classifier step had been added to their Automation Rules, so they weren't aware that a second prediction was on its way.
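To make the race concrete, here's a minimal sketch of the state involved; the field names (`locked`, `reviewed`, `validated`) are illustrative assumptions, not the actual animl-api schema:

```typescript
// Illustrative only -- field names are assumptions, not the real animl-api schema.
interface Label {
  category: string;     // e.g. "animal" from the detector, or a species from the classifier
  mlModel?: string;     // set when the label came from a model
  validated?: boolean;  // set when a user validates (true) or invalidates (false) the label
}

interface ImageObject {
  labels: Label[];
  locked: boolean;      // locked once the object has a validated label
}

interface Image {
  objects: ImageObject[];
  reviewed: boolean;    // currently reset to false on every new ML prediction
}

// The race:
// 1. Object detector adds an "animal" label -> image.reviewed = false
// 2. User validates it -> object.locked = true, image.reviewed = true
// 3. Classifier's label arrives later -> image.reviewed = false again,
//    while the object stays locked with its validated label.
```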

Possible solutions

After some discussion with Falk and Henry, we ID'ed the following potential fixes:

  1. Prevent users from interacting with the app while a bulk upload is still processing, for example by not letting them close the upload modal while an upload is ongoing.

    • cons: bad user experience; wouldn't eliminate the potential for this to occur with wireless camera data.
  2. When new ML-generated labels are added to a previously validated and locked object, unlock that object.

    • pros: simplest option to implement
    • cons: this could introduce confusion (why is my object unlocked again, and why does it now have potentially 3 labels?) and would force users to re-review images they may have already applied correct labels to.
  3. When new ML-generated labels are created, check whether the object already has a human-validated label; if so, add the ML label but do not reset the image's reviewed state to false, and keep the object locked (see the first sketch after this list).

    • pros: simple to implement, would allow users to start interacting with objects ASAP, doesn't create extra work for users by forcing them to re-review objects.
    • cons: not super transparent - i.e., users wouldn't have any way of knowing that there had been a new prediction/label added to that object. They could also potentially validate the initial "animal" label, not knowing that they're about to receive a more specific prediction from the classifier.
  4. Implement solution 3 above, but also add some indication in the frontend that the object/image has new, unseen predictions (perhaps a column in the image table and/or a visual indicator in the loupe panel).

    • pros: all of the benefits of 3, plus greater transparency
    • cons: not sure what the UX looks like after we indicate to the user that there are new predictions. Do we tell them to "unlock the object to see new predictions"? Do we automatically unlock it for them, like in solution 2? Do we create some new way to show the full label stack of a locked object? All of that sounds like a big UX and design challenge and would be very difficult to make intuitive. Our label/object data structure and design is quite confusing as it is.
  5. Add an awaitingPrediction field to images that gets set to true each time an image is submitted to an inference queue and reset to false once it has been successfully processed. When awaitingPrediction: true, prevent users from interacting with objects on that image in the frontend, and because the image's state in the frontend may be out of sync with the DB (very likely in this scenario), also check awaitingPrediction on the API side and return an error if users attempt to add/validate/invalidate labels, move bounding boxes, etc. (see the second sketch after this list).

    • pros: would solve the issue for both wireless and bulk uploaded data; would be very transparent: users would understand what is happening and also ultimately see the full stack of ML predictions that were made.
    • cons: added DB writes every time an image is added to and removed from an inference queue.
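For reference, solution 3 would amount to something like the following wherever new ML labels get applied (a sketch only, reusing the illustrative types from the earlier sketch, not the actual animl-api code):

```typescript
// Sketch of solution 3 (hypothetical handler; reuses the illustrative
// Image / ImageObject / Label types from the earlier sketch).
function applyMlLabel(image: Image, object: ImageObject, newLabel: Label): void {
  const hasValidatedLabel = object.labels.some((l) => l.validated === true);

  // Always record the prediction so the full label stack is preserved.
  object.labels.push(newLabel);

  if (hasValidatedLabel) {
    // A human has already reviewed this object: keep it locked and leave
    // the image's reviewed state untouched.
    return;
  }

  // Otherwise, behave as today: the new prediction needs review.
  object.locked = false;
  image.reviewed = false;
}
```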
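And solution 5's API-side guard could look roughly like this (all names are hypothetical; the real animl-api models, mutations, and error handling will differ):

```typescript
// Sketch of solution 5's API-side check (hypothetical names throughout).
import { GraphQLError } from 'graphql';

interface ImageRecord {
  _id: string;
  awaitingPrediction: boolean; // true while queued for inference, false once processed
}

// Stand-in for however the API actually loads an image from the DB.
declare function getImage(imageId: string): Promise<ImageRecord | null>;

// Call this at the top of any label-editing mutation (create/validate/
// invalidate labels, move bounding boxes, etc.).
async function assertNotAwaitingPrediction(imageId: string): Promise<void> {
  // Check the DB rather than trusting the frontend's copy of the image,
  // which is likely to be stale in exactly this scenario.
  const image = await getImage(imageId);
  if (image?.awaitingPrediction) {
    throw new GraphQLError(
      'Image is still awaiting ML predictions; editing is temporarily locked.',
      { extensions: { code: 'IMAGE_AWAITING_PREDICTION' } }
    );
  }
}
```

The flag itself would be flipped wherever images are pushed to and removed from the inference queue, which is where the extra DB writes in the con above come from.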

I'm leaning towards solution 5, because it seems like the best option for maximizing transparency and simplicity while minimizing UX friction. I still want to test what happens if users invalidate an "animal" prediction before getting a classifier label (I forgot to test that) and think through the implications if we ever implemented retroactive inference.

nathanielrindlaub commented 1 month ago

If users invalidate an "animal" prediction and a fresh ML label then gets added on top of it, the object stays locked (just like when you validate the "animal" label), but because the first human-reviewed label was invalidated, no object will appear in the frontend. I think you'd really only invalidate an animal prediction if it was a false positive in the first place, so I'm not sure how common an issue this would be for users, but when it does happen the resulting behavior is definitely not transparent.

nathanielrindlaub commented 1 month ago

And in the context of retroactive inference, implementing solution 5 would still be a relevant and helpful feature. It would not, however, help us address how to handle the situation in which a fresh prediction is made on a previously validated and locked object, so I guess we'd need to re-cross this bridge if we ever implement retroactive inference.

Perhaps we expose this decision to users when they initiate an inference run and give them the option to either (a) unlock existing objects if new predictions are made on them, or (b) keep their objects locked and simply add the new predictions to the stack. Retroactive inference will raise a lot of questions and require a lot of user input, especially for images that already have some predictions or have already been reviewed. For example, does the user want to run the full Automation Rules pipeline on the images, or just request predictions from a specific model? How will the user specify the conditions that trigger an inference request?

It's going to be a headache no matter what (which is why we haven't implemented it), so I think implementing 5 is a no-regrets decision, and we can kick these complex questions down the road if we ever want to offer retroactive inference.