Image-level "tags" - Githubissues

Background

Right now, Animl's data model and UI are designed to support object-level (sub-image) annotations. That is a fairly unique design decision in the camera trap data management space as far as I know, and we did it primarily because (a) we get bounding boxes back from our object detectors, and the labels that describe what's in the bounding box exist at the object-level, so why not preserve that level of granularity, and (b) should users ever want to export their images and annotations for training a classifier, having object-level annotations will allow them to crop out the backgrounds of their training data and thus improve model accuracy and generalizability.

The object-level annotation schema looks like this:

{
  ...
  _id: { type: String, required: true },
  objects: [
    {
      bbox: { type: [Number], required: true },
      locked: { type: Boolean, default: false, required: true },
      labels: [
        {
          type: { type: String, enum: ['manual', 'ml'], required: true },
          category: { type: String, default: 'none', required: true },
          conf: { type: Number },
          bbox: { type: [Number] },
          labeledDate: { type: Date, default: Date.now, required: true },
          validation: { type: ValidationSchema },
          mlModel: { type: 'String', ref: 'Model' }, // if type === 'ml'
          mlModelVersion: { type: 'String' },
          userId: { type: String } // if type === 'manual'
        }
      ]
    }
  ]
  ...
}

However, the downside of object-level annotations is that they're tedious to validate/invalidate/edit and tedious to add if missed by the object detection step. On top of that, some annotations naturally belong at the image-level (e.g., "empty", "seen", "presence/absence", "retired", "favorite", "day", "night" etc.).

I want to address these issues with two improvements:

Provide hotkeys and buttons for acting on ALL object-level annotations in an image at once (see #41)
Allow users to create and apply "image"-level annotations. The following describes how I envision accomplishing this.

Implementation [UPDATE 10/30/23: THIS IMPLEMENTATION APPROACH IS OUTDATED; SEE UPDATED IMPLEMENTATION DETAILS IN COMMENT BELOW]

Borrowing from Timelapse's template editor, I think we should provide Project Managers the option to create image-level labels through the label creation UI (#124). We might provide a couple un-editable, default image-level label options already - like "Empty" - but we'd want to allow users to create their own to meet their specific label review needs. We might also want to allow users to customize what roles can apply these labels, as Project Managers might want to restrict certain image-level annotations like "Retired" to users with certain permissions levels.

Under the hood, these would still be Objects and added to the Image.Objects array just like our current object-level annotations (although we might want to re-think that naming and instead call them all Annotations). I think this makes sense because they require a lot of the same user interactions as object-level annotations, it would allow users to use the existing label filtering interface and logic to query/filter on them, and they'd show up in the labels column of the Images Table just like object-level labels. "Empty" labels in particular are typically ML predictions, so they need to have an "unlocked"/"pre-validated" state and be capable of being validated/invalidated. It may be the case that other ML models we integrate down the road provide image-level labels too.

The big difference, from the users' perspective, would be that instead of rendering image-level labels as essentially full-frame Objects with bounding boxes that have the same dimensions as the image itself (as we currently do with "Empties"), we'd display ALL image-level label options as checkbox-like/toggle-able buttons below the image in the Loupe. Toggling/checking a button would add an Object to the image, unchecking it would remove it. The only thing we'd need to figure out from a design perspective is how to visualize that "unlocked"/"pre-validated" state, but I think I can figure out something visually intuitive.

New approach - Image "Tags"

The new approach and concepts are described at a high level in this comment in a narrative format. I'll follow up with implementation steps and more granular requirements in more comments below.

Store image-level annotations in their own array, and limit them to booleans. Let's call them image "Tags" for clarity.

After further consideration, I've decided to change course a bit as I think there's a simpler implementation solution that meets most of our goals. The TL;DR is that rather than using the same schema for image-level annotations as we do for object-level annotations (as I had advocated for above), we'd use a separate array on image records (image.tags) to store the image-level annotations, AND limit the data type of image-level annotations to booleans that can be represented in the UI by checkboxes. So unlike Timelapse, which can support all sorts of custom field types (integers, text boxes, selects, etc.), we're going to just stick to booleans for now. More on that below.
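As a rough illustration of the data shape this implies, here is a minimal sketch in plain JavaScript. It assumes the image stores only the IDs of applied tags (as the checklist later in this issue suggests); the `tagId` and `name` fields, the sample values, and the `toCheckboxState` helper are all hypothetical and not taken from the Animl codebase.

```javascript
// Hypothetical project-level tag definitions (field names are assumptions).
const availableTags = [
  { tagId: 'tag-1', name: 'retired' },
  { tagId: 'tag-2', name: 'favorite' },
  { tagId: 'tag-3', name: 'radio-collared' },
];

// Hypothetical image record: `tags` holds only the IDs of applied tags,
// so each tag behaves like a boolean (present = true, absent = false).
const exampleImage = { _id: 'image-1', objects: [], tags: ['tag-2'] };

// Derive the checkbox state for every available tag on a given image.
function toCheckboxState(image, tags) {
  const applied = new Set(image.tags || []);
  return tags.map((tag) => ({
    tagId: tag.tagId,
    name: tag.name,
    checked: applied.has(tag.tagId),
  }));
}

const checkboxes = toCheckboxState(exampleImage, availableTags);
```

Because a tag is either in the array or not, there is no extra per-tag state (confidence, locked, validated) to store or render, which is the simplification this approach is after.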

Don't change treatment of "empties"

Also, even though "empty" seems like it should be an image-level piece of data, I actually don't want to refactor how we represent empties in the DB or UI (i.e., empties will remain "objects" in the schema and UI, albeit ones whose bounding boxes are the full size of the images). Empty is a bit of a special case because it is the only* image-level annotation I can think of that can be predicted by an ML model, and thus also needs states for locked/unlocked and validated/invalidated. So while it admittedly makes more sense from a structural/logical/schema perspective to treat "empties" at the image-level, I didn't want to (a) deal with the costly and confusing refactoring headache for something that from a user's perspective is probably just fine as-is, or (b) deal with designing and implementing representations for all that additional state data that we'd only need for this one unique case. So for those reasons, let's leave "empties" as they are, and instead of thinking of the distinction between these types of annotations as being "image-level" vs "object-level", it might be more helpful to think of it as a distinction between "ML annotations" and "manual annotations" (though even that distinction isn't a perfect one, as humans can also manually create object-level annotations...). Basically, ML models and/or humans will be able to draw bounding boxes on images and create granular annotations that way, and human users can also create customizable sets of tags/checkboxes to further enrich their data and create review-workflows to meet their needs.

Examples of the kinds of image-level tags users might want

I gravitated towards this new approach after brainstorming as many image-level annotations that users might find useful as I could (I also asked around for additional suggestions), and what became apparent was that a lot of them could be supported with simple booleans and checkboxes in the UI. Those examples include:

“Retired”, “reviewed”, “seen”, "double-checked" etc.
“Interesting”, “favorite”, etc.
“tagged”, “radio-collared”
“Night” / “day” (technically enum, but could be done with two boolean checkboxes)
"rat" / "no-rat" (presence or absence of a specific animal)

Many of the others that emerged in our brainstorm can be supported by adding a comment field to all images, so I think we should enable a comment field for all images by default:

Description of behavior, pose, position
Number of animals present

There's one other category of image-level annotation that came up that might be tricky to support, and that is a species annotation at the image-level. That is, rather than assigning a species label to an object within the image, if users don't want to deal with creating species annotations at the object-level, how do we support them just labeling a whole image as having a deer/cat/mountain lion/whathaveyou. I have an idea for solving for this - basically add a drop-down select to the toolbar containing all available object-level labels, and when a user selects one, add a full-size object to the image with that label in exactly the same way we do when users click the "mark as empty" button. However, this is lower priority as I'm not convinced of the demand for it. So let's put a pin in that for now.
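For reference, the full-size-object idea could look roughly like the sketch below, which reuses the field names from the object schema at the top of this issue. The `[0, 0, 1, 1]` bounding box assumes coordinates are normalized to the 0-1 range, and setting `locked: true` for a manually applied label is also an assumption; neither is stated in this issue. The `createFullFrameObject` helper and the sample values are hypothetical.

```javascript
// Assumption: bounding boxes are normalized to 0..1, so this spans the whole frame.
const FULL_FRAME_BBOX = [0, 0, 1, 1];

// Build a full-size object carrying a single manual label (hypothetical helper).
function createFullFrameObject(category, userId) {
  return {
    bbox: FULL_FRAME_BBOX.slice(),
    locked: true, // assumption: a label applied by a human needs no further review
    labels: [
      {
        type: 'manual',
        category,
        bbox: FULL_FRAME_BBOX.slice(),
        labeledDate: new Date(),
        userId,
      },
    ],
  };
}

// Selecting "deer" from the drop-down would append one such object to the image.
const labeledImage = { _id: 'image-1', objects: [] };
labeledImage.objects.push(createFullFrameObject('deer', 'user-1'));
```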

* Technically you could train a whole-image classifier, but I'm not sure why you would.

Some of what I described above I'll break out into separate issues, but here are my quick thoughts on what implementing image-level tags will entail:

[ ] add the ability for Project Managers to add new tags via the label creation UI
[ ] add an array to image schema (called image.tags) in which we'll store the tagIds that users have applied to a given image
[ ] on the frontend, we'll need to fetch available tags from somewhere in the DB (TBD by the implementation of #124), and below each image, render all available tags as checkboxes
[ ] map the image.tags to the available tags (if the tagId is present in the array, render that checkbox as checked)
[ ] create mutation resolvers for adding/removing tags from images
[ ] allow users to filter by both the presence and absence of all tags. I'll create a separate issue for this.
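
The add/remove and filter steps in the checklist above could be sketched roughly as follows. This is plain in-memory JavaScript for illustration only, not the actual resolvers or database queries; the function names and the filter shape (`withTags` / `withoutTags`) are hypothetical.

```javascript
// Return a copy of the image with tagId applied; applying it twice is a no-op.
function addTag(image, tagId) {
  const tags = image.tags || [];
  return tags.includes(tagId) ? image : { ...image, tags: [...tags, tagId] };
}

// Return a copy of the image with tagId removed (no-op if it wasn't applied).
function removeTag(image, tagId) {
  return { ...image, tags: (image.tags || []).filter((id) => id !== tagId) };
}

// True if the image has every tag in withTags and none of the tags in withoutTags.
function matchesTagFilter(image, { withTags = [], withoutTags = [] } = {}) {
  const applied = new Set(image.tags || []);
  return (
    withTags.every((id) => applied.has(id)) &&
    withoutTags.every((id) => !applied.has(id))
  );
}

// Hypothetical sample data: find images that have NOT been retired.
const sampleImages = [
  { _id: 'image-1', tags: ['retired'] },
  { _id: 'image-2', tags: [] },
  { _id: 'image-3', tags: ['retired', 'favorite'] },
];
const notRetired = sampleImages.filter((img) =>
  matchesTagFilter(img, { withoutTags: ['retired'] })
);
```

Filtering on absence falls out naturally from storing tags as a set of IDs: an unchecked tag is simply one whose ID is not in the array, so no explicit `false` values need to be written to every image.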

tnc-ca-geo / animl-frontend