plots: interactive plots with toggling bounding box

BradyJ27 commented 8 months ago

Issue stems from: https://github.com/iterative/vscode-dvc/issues/4917

The idea is to have an interactive plot (likely not a template of the current plot system), for which the user can view different labels for each class, and toggle labels on and off to see bounding boxes for specific classes.

dberenbaum commented 8 months ago

Some thoughts around how we could do this (open to other ideas):

Here is a nice lib that shows the different BB formats and converts between them: https://github.com/devrimcavusoglu/pybboxes.

Basically, the formats are:

x1,y1,x2,y2 (top-left and bottom-right corners)
x,y,w,h (point+width/height)
- x,y may be top left coordinates
- x,y may be center coordinates
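The relationship between these formats can be sketched in a few lines. This is only an illustration, not the pybboxes API: the function names are hypothetical, and it assumes the usual image convention where the origin is the top-left corner and y grows downward.

```python
from typing import Tuple

# (x1, y1, x2, y2) or (x, y, w, h), depending on the function.
Box = Tuple[float, float, float, float]


def corners_from_top_left_wh(x: float, y: float, w: float, h: float) -> Box:
    """x,y,w,h with x,y as the top-left corner -> x1,y1,x2,y2."""
    return (x, y, x + w, y + h)


def corners_from_center_wh(cx: float, cy: float, w: float, h: float) -> Box:
    """x,y,w,h with x,y as the box center -> x1,y1,x2,y2."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def top_left_wh_from_corners(x1: float, y1: float, x2: float, y2: float) -> Box:
    """x1,y1,x2,y2 -> x,y,w,h with x,y as the top-left corner."""
    return (x1, y1, x2 - x1, y2 - y1)
```

The same four numbers describe different boxes depending on the convention, which is why the annotation file has to state (or fix) which one it uses.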

These can be encoded in a JSON or other structured file, but unfortunately there doesn't seem to be any standardization on the file format, so we would either need users to specify the file structure, or we can start by having dvclive write out a standard format that we can parse, which could be something like:

[
  {"path": "image1.png", "label": "cat", "bbox": [100, 110, 5, 20]},
  {"path": "image2.png", ...}
]

If we do that, we should have enough info to render bounding boxes in vs code and studio. To find this annotations file, we could:

- write the path in dvc.yaml under an annotations key
- ask to configure it in vs code/studio
- save it to some standard location (.dvc/annotations?)

BradyJ27 commented 8 months ago

To find this annotations file, we could:

write the path in dvc.yaml under an annotations key

In my mind this option makes the most sense. This allows the user full control over where and how the annotation files are being written, and it allows easy integration with dvclive if some frameworks automatically output this info at a set location.

shcheklein commented 8 months ago

Should it be an annotation file per image though, folks? Let's say I have a pipeline that is producing some new images every time and doesn't delete old ones. In this case, how am I supposed to update that single file?

Also, a single file can become super painful to parse - it can be slow, we can run out of memory, etc.

BradyJ27 commented 8 months ago

Yes I think one annotation file per image is more standard (considering there are very few standardizations here)

In my mind the path would actually be a directory path which contains x structured files and directly correlates with the number of images you would like to display.

For example I run my newly trained model on 5 images and produce bounding boxes for all 5, there would be 5 images and 5 annotation files.

dberenbaum commented 8 months ago

Right, one annotation file per image is better.

I wonder then if it's worth introducing annotations into dvc.yaml right away or starting with a convention like we did for images per step. For example, for any image image1.png, if there is a corresponding image1.json, then we try to draw the bounding box in VS Code and Studio. It doesn't give as much control as including the annotations in dvc.yaml, but it's less configuration, and the lack of standardization means that I doubt any framework will auto-generate exactly the expected format.

BradyJ27 commented 8 months ago

I wonder then if it's worth introducing annotations into dvc.yaml right away or starting with a convention like we did for images per step. For example, for any image image1.png, if there is a corresponding image1.json, then we try to draw the bounding box in VS Code and Studio. It doesn't give as much control as including the annotations in dvc.yaml, but it's less configuration, and the lack of standardization means that I doubt any framework will auto-generate exactly the expected format.

For this do you mean have a set location that DVC looks for files and then display in studio/vscode if those files exist? Like store them all in something like dvclive/annotations/ and it is expected that the user gets those files formatted correctly in the correct location. Or would this start out as an extension of dvclive and basically just allow only dvclive frameworks?

dberenbaum commented 8 months ago

For this do you mean have a set location that DVC looks for files and then display in studio/vscode if those files exist?

I was thinking the simplest is to have them right next to the images themselves so that we don't have to worry about mapping the annotations to the images. For example:

images
.
├── image1.png
├── image1.json
├── image2.png
├── image2.json
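As a rough sketch of how a reader of this layout might pair files, assuming the convention is simply "same file name, `.json` suffix" (the function name and the list of image suffixes are hypothetical, not anything dvc or dvclive defines):

```python
from pathlib import Path
from typing import Dict

# Assumption: which suffixes count as images is not specified in this thread.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_annotations(image_dir: Path) -> Dict[Path, Path]:
    """Map each image in image_dir to its sibling .json file, if one exists."""
    pairs: Dict[Path, Path] = {}
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        candidate = image.with_suffix(".json")  # image1.png -> image1.json
        if candidate.is_file():
            pairs[image] = candidate
    return pairs
```

Images without a sibling JSON file are just left out of the mapping, so they would render as plain images.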

Like store them all in something like dvclive/annotations/ and it is expected that the user gets those files formatted correctly in the correct location. Or would this start out as an extension of dvclive and basically just allow only dvclive frameworks?

Both, but yes I am mostly focused on dvclive here. I'm taking inspiration from https://dvc.org/doc/dvclive/live/log_image#images-per-step, where we have a similar convention.

I am not against ultimately codifying this in dvc.yaml since that is indeed the "dvc way," but I don't see the benefit of starting there for a few reasons:

- Let's start simple. We can always add dvc.yaml support later but it's harder to take it away. It's also more work to define a good spec before we have anything working yet.
- Users will need to format the annotations correctly either way. The lack of standardization unfortunately makes it unlikely that users are going to have existing annotations that fit the expected format.
- If users will need to adjust to our annotation format anyway, I'm not sure it will be any harder or more restrictive to follow this convention than to specify it in dvc.yaml. As long as we document the pattern, it's still usable outside of dvclive.

julieg18 commented 7 months ago

Hello, wanted to check on the status of this issue? Have we decided that we are going to have the users set the json files to be next to the images themselves? If so, what BB format will we be using for the json files?

BradyJ27 commented 7 months ago

Have we decided that we are going to have the users set the json files to be next to the images themselves?

This makes the most sense to me.

If so, what BB format will we be using for the json files?

I think x1,y1,x2,y2 (top-left and bottom-right points) is more common. Obviously I do not get the final say, but just my $.02 as a user.

dberenbaum commented 7 months ago

Okay, added https://github.com/iterative/dvclive/issues/766 to propose how we want to do this on the backend.

julieg18 commented 7 months ago

Okay, added https://github.com/iterative/dvclive/issues/766 to propose how we want to do this on the backend.

Thanks! Just wanted to make sure I'm understanding things correctly, are we planning to have these JSON file contents or paths be shown in any way inside of dvc plots diff or is VSCode/Studio going to need to check/parse for JSON files whenever the dvc plots diff has image files?

Update: Discussed synchronously in VSCode planning, and concluded that changes would need to be made to dvc so that the different versions of the JSON files are saved somewhere and VSCode/Studio have a way to access their contents.

dberenbaum commented 7 months ago

@julieg18 @mattseddon Is there a plan for how to implement it? Do you need more clarification on anything?

julieg18 commented 7 months ago

@julieg18 @mattseddon Is there a plan for how to implement it? Do you need more clarification on anything?

No, we haven't decided on a plan on how to implement this on DVC's end. We were planning to discuss what would need to be done in Slack.

dberenbaum commented 7 months ago

@julieg18 @mattseddon @AlexandreKempf Can you two agree on the API of what dvc should pass to get the bounding box info in vs code and studio?

AlexandreKempf commented 7 months ago

I think @julieg18 has already made excellent progress on her side. @julieg18 Could you send me what you expect, and I'll do the rest on dvclive side ;)

AlexandreKempf commented 7 months ago

I'm planning on having a log_bounding_box function on dvclive side. It means that it is independent of the log_image. I'm saying that because I'm wondering how we should handle the relative coordinates (coordinates that are floats between 0 and 1). On my side, if I want to handle them and provide you with a clean and simple format, I'll need to reload the image (or do some hacks). Otherwise, the code is pretty clean. How hard / bad practice is it to handle both int and float on the front end side? Note: int means pixels, float means relative value.

julieg18 commented 7 months ago

@julieg18 @mattseddon @AlexandreKempf Can you two agree on the API of what dvc should pass to get the bounding box info in vs code and studio?

Could you send me what you expect, and I'll do the rest on dvclive side ;)

Currently, my PRs are expecting a format like this:

{
  "boxes": [
    {
      "label": "cat",
      "box": {
        "left": 100, "right": 110, "top": 5, "bottom": 20
      }
    },
    {
      "label": "cat",
      "box": {
        "left": 30, "right": 55, "top": 75, "bottom": 90
      }
    },
    {
      "label": "dog",
      "box": {
        "left": 80, "right": 100, "top": 25, "bottom": 50
      }
    }
  ]
}

But would it be possible for the boxes to be grouped by label? That would reduce processing on the Studio/VS Code side. Example:

Outdated example

```json
{
  "boxes": [
    {
      "label": "cat",
      "boxes": [
        { "left": 100, "right": 110, "top": 5, "bottom": 20 },
        { "left": 30, "right": 55, "top": 75, "bottom": 90 }
      ]
    },
    {
      "label": "dog",
      "boxes": [{ "left": 80, "right": 100, "top": 25, "bottom": 50 }]
    }
  ]
}
```

Update, I meant:

{
  "boxes": {
    "cat": [
      { "left": 100, "right": 110, "top": 5, "bottom": 20 },
      { "left": 30, "right": 55, "top": 75, "bottom": 90 }
    ],
    "dog": [
      { "left": 80, "right": 100, "top": 25, "bottom": 50 }
    ]
  }
}
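The regrouping from the first (flat) format to the label-keyed one is small. Here is a minimal sketch; the function name is hypothetical and it is not taken from any of the PRs mentioned in this thread:

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_boxes_by_label(flat: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Turn {"boxes": [{"label": ..., "box": {...}}, ...]} into
    {"boxes": {label: [{...}, ...]}}, keeping the original box order per label."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for entry in flat["boxes"]:
        grouped[entry["label"]].append(entry["box"])
    return {"boxes": dict(grouped)}
```

Doing this once at logging time means VS Code and Studio would not each need their own copy of the grouping step.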

AlexandreKempf commented 7 months ago

@julieg18 Technically, it is possible, yes. I personally prefer the first format, but if that helps you a lot, we might take the second one. For most object detection tasks, running this kind of processing (grouping by label) during the logging phase is probably smarter because it is not very time-sensitive. If the plots are lagging, it can be frustrating for the user, so let me know if it really increases performance on your side (enough so that we take the tradeoff with the json format). Any opinion on the topic @dberenbaum?

julieg18 commented 7 months ago

I'm saying that because I'm wondering how we should handle the relative coordinates (coordinates that are floats between 0 and 1). On my side, if I want to handle them and provide you with a clean and simple format, I'll need to reload the image (or do some hacks). Otherwise, the code is pretty clean. How hard / bad practice is it to handle both int and float on the front end side? Note: int means pixels, float means relative value.

If I'm understanding your question correctly, the frontend can handle both whole numbers and numbers with decimals when it comes to the box coordinates.

julieg18 commented 7 months ago

@julieg18 Technically, it is possible, yes. I personally prefer the first format, but if that helps you a lot, we might take the second one. For most object detection tasks, running this kind of processing (grouping by label) during the logging phase is probably smarter because it is not very time-sensitive. If the plots are lagging, it can be frustrating for the user, so let me know if it really increases performance on your side (enough so that we take the tradeoff with the json format). Any opinion on the topic @dberenbaum?

While it wouldn't increase performance tremendously, every bit helps especially when it comes to running huge numbers of images/boxes. It also reduces code repetition in VSCode and Studio, since both products have to sort the boxes by label. We can definitely work with the first format though if it works better on DVC's end.

AlexandreKempf commented 7 months ago

If I'm understanding your question correctly, the frontend can handle both whole numbers and numbers with decimals when it comes to the box coordinates.

Yeah, that was my question. But to make extra sure: Which boxes are handled by the front end?

{"top": 10, "left": 10, "bottom" : 20, "right": 20}
{"top": 10.2, "left": 10.5, "bottom" : 20.3, "right": 20.7}
{"top": 0.1, "left": 0.2, "bottom" : 0.3, "right": 0.4}

AlexandreKempf commented 7 months ago

I needed to be more explicit in my previous comment, I apologize.

In data science, we have two units for the bounding box's coordinates: pixels and %. In the first case (pixels), the values we give for the left/top/right/bottom are directly in pixels. These are usually integers, but not always. So the bounding box {"top": 10, "left": 10, "bottom": 20, "right": 20} means that the top of the object is 10 pixels from the top of the image. In the second case (%), the value we give for left/top/right/bottom is relative to the image size. For instance, for {"top": 0.1, "left": 0.2, "bottom": 0.3, "right": 0.4}, if the image is 100x100 pixels, the bounding box will have these pixels coordinates: {"top": 10, "left": 20, "bottom": 30, "right": 40}. This unit is useful in data science because we can scale the image without touching the bounding box coordinates.

I believe we should support both. It is quite easy to determine which unit the user is using (all values are in the [0, 1] range for %), but then the user needs to give us details about the image size.

We agree with @julieg18 that DVClive should be the one in charge of handling the conversion from % to pixels.
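A minimal sketch of that conversion, assuming the "all values in [0, 1]" heuristic described above and that the image size is known at logging time. The function names are hypothetical and this is not existing dvclive code.

```python
from typing import Dict

Number = float  # ints are accepted wherever floats are
KEYS = ("left", "right", "top", "bottom")


def is_relative(box: Dict[str, Number]) -> bool:
    """Heuristic from the discussion: every coordinate lies in [0, 1].
    Note: a pixel box that fits entirely inside the top-left pixel is ambiguous."""
    return all(0.0 <= box[key] <= 1.0 for key in KEYS)


def to_pixels(box: Dict[str, Number], width: int, height: int) -> Dict[str, Number]:
    """Scale a relative box to pixel coordinates; pass pixel boxes through unchanged."""
    if not is_relative(box):
        return dict(box)
    return {
        "left": round(box["left"] * width),
        "right": round(box["right"] * width),
        "top": round(box["top"] * height),
        "bottom": round(box["bottom"] * height),
    }
```

With a 100x100 image, `to_pixels({"top": 0.1, "left": 0.2, "bottom": 0.3, "right": 0.4}, 100, 100)` gives the pixel box from the example above.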

julieg18 commented 7 months ago

But would it be possible for the boxes to be grouped by label? That would reduce processing on the Studio/VS Code side. Example:

Whoops! Apologies, I made the example incorrectly 🤦‍♀️. I meant something like this:

{
  "boxes": {
    "cat": [
      { "left": 100, "right": 110, "top": 5, "bottom": 20 },
      { "left": 30, "right": 55, "top": 75, "bottom": 90 }
    ],
    "dog": [
      { "left": 80, "right": 100, "top": 25, "bottom": 50 }
    ]
  }
}

AlexandreKempf commented 7 months ago

@dberenbaum Can we agree to follow this schema or do you see any problem with it? :)

dberenbaum commented 7 months ago

I think it's fine as a starting point, although can we agree it may change as we try it?

We still may need to reconsider the tradeoffs between rendering performance and useful schema. For example, if someone wants to resize an image without recomputing bounding boxes, it would be nice to allow for relative coordinates in the schema, but not sure if this really outweighs the performance benefits of precomputing fixed coordinates.

AlexandreKempf commented 7 months ago

We took some time to discuss with @julieg18 the difference between the two schemas (https://github.com/iterative/dvc/issues/10198#issuecomment-1932066785).

There were some things that were discussed:

- The logger (DVClive) will probably not store the entire dataset but only a few examples. I don't think people will bother writing their own JSON files to go along with the images themselves (I might be wrong here), so DVClive will be the only writer of that file. Having an exotic schema is not a big deal. This argument goes in favor of the second schema.
- Going from schema 1 to 2, we aggregate data based on the label. Do we want to aggregate data based on a field (in that case, the box's label), or do we want to leave this decision to the display? If we are sure that the only aggregation we will ever use is the label, then schema 2 is not a problem. However, if there is a doubt, then schema 1 is better.
- Schema 2 would produce faster plots (and if there are >100 bboxes on an image, it can be useful), but we don't have numbers yet on the topic. It would also simplify the code on the front end side and avoid code duplication between VScode and Studio.

AlexandreKempf commented 7 months ago

@dberenbaum Concerning the relative coordinates, if the bounding boxes are stored alongside the image, we know the size of the image. So we could allow the user to log the bounding boxes with [0, 1] values in the Python code but save them in the JSON with pixel values. If the users are really saving the JSON themselves, then it is another problem :) What do you think?

dberenbaum commented 7 months ago

I think it's fine. My point was more that we should expect that things might change after we play with it more, so let's not spend too much time focusing on whether it's the right schema yet.

julieg18 commented 7 months ago

We've decided on starting with:

{
  "boxes": {
    "cat": [
      { "box": {"left": 100, "right": 110, "top": 5, "bottom": 20}, "score": 0.8 },
      { "box": {"left": 80, "right": 130, "top": 13, "bottom": 55}, "score": 0.5 }
    ],
    "dog": [
      { "box": {"left": 81, "right": 160, "top": 16, "bottom": 52}, "score": 0.1 }
    ]
  }
}

I'll update Studio and VSCode's PRs to use this format.
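To illustrate how this shape supports toggling classes on and off, here is a rough sketch of the selection step a viewer might run before drawing. The function name and the `min_score` threshold are assumptions added for illustration; neither is part of the agreed format or of the Studio/VS Code PRs.

```python
from typing import Any, Dict, Iterable, List, Tuple


def visible_boxes(
    annotation: Dict[str, Any],
    enabled_labels: Iterable[str],
    min_score: float = 0.0,  # hypothetical filter, not discussed in the thread
) -> List[Tuple[str, Dict[str, float]]]:
    """Return (label, box) pairs for the labels that are currently toggled on."""
    enabled = set(enabled_labels)
    selected: List[Tuple[str, Dict[str, float]]] = []
    for label, entries in annotation["boxes"].items():
        if label not in enabled:
            continue  # the whole class is toggled off, so skip its list at once
        for entry in entries:
            if entry["score"] >= min_score:
                selected.append((label, entry["box"]))
    return selected
```

Because boxes are keyed by label, turning a class off skips its list without inspecting each box.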

BradyJ27 commented 6 months ago

What is the status of this issue? Does #10312 need to be taken over? I'd be willing to look into it if need be!

shcheklein commented 6 months ago

hey @BradyJ27, absolutely, feel free to take a look and help us get it done! thanks.
