BradyJ27 opened this issue 1 year ago
To clarify:
Are there multiple images created by the pipeline or do you have an original image that you want to compare to the output? Can you give an example of what is produced by the model?
Usually it is multiple images. For example, we would print out the labels for the detections that we have made on the validation set.
The following example is from the default YOLOv8s output. The DVCLive YOLO demo notebook is a good place to reproduce this.
YOLO actually outputs both the validation labels (the ground truths) and the predicted values. This could be useful for comparing and contrasting.
Let me clarify the question. In the example above, are there multiple images available for 000000000042.jpg? Do you have an image available with each of the combinations of labels? I.e. one for each of
I am not an expert in image manipulation but AFAIK removing these labelling boxes from an image is not a trivial task.
Have you seen this done elsewhere?
So the image is just a copy of the training or validation image with bounding boxes added using some library (usually matplotlib). The bounding boxes are stored in a formatted file (XML, CSV, JSON, or some custom format), normally one line per label, like "label,x1,y1,x2,y2" (e.g. "person,..." followed by "dog,...").
So the approach would not be to manipulate the image with the boxes already on it, but rather set the image as the original image from the validation set, then place the interactive bounding boxes over the original image.
In other words, we have 2 files: the original image and the labels file.
And the above images are generated by combining the two files: reading the labels and placing them on top of a copy of the original image, thus creating a third file, which is the image with the bounding boxes displayed. My suggestion is that we take this step and turn it into some interactive format within DVC.
See https://docs.wandb.ai/guides/track/log/media#image-overlays for ideas on how others do this
One option for implementation would be a custom plot template, right?
Or is this something that's a little more in depth and actually a bigger feature?
No, I do not believe that you could shoe-horn the required data/image into the current DVC plots engine.
My opinion is that this is a larger feature given the current state of plots.
Ok, that makes sense. I'm sure some more discussion needs to be had regarding implementing something like this, but I would be happy to help contribute!
@BradyJ27 can you provide a concrete example of one of the XML files that you mentioned here? Is this the only format available?
Looks like we might be able to get away without using a plotting library for this. One potential way would be to use https://github.com/lovell/sharp in the clients + generate SVG bounding boxes based on the definitions (XML or other files). Loading the original image with the previous package gives us the option to call image.overlayWith(svgElementBuffer, {top: 0, left: 0}).toBuffer(), where svgElementBuffer is an SVG full of <rect> elements (source).
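As a rough sketch of that idea, here is what building the svgElementBuffer could look like in plain JavaScript. The `{label, x1, y1, x2, y2}` pixel-space box shape and the red stroke are assumptions, and newer sharp versions expose this via `composite()` rather than `overlayWith()`:

```javascript
// Sketch: build an SVG overlay of <rect> elements from bounding-box
// definitions, for compositing over the original image with sharp.
// The box shape ({label, x1, y1, x2, y2} in pixels) and the red stroke
// are assumptions, not a settled design.
function boxesToSvg(width, height, boxes) {
  const rects = boxes
    .map(
      (b) =>
        `<rect x="${b.x1}" y="${b.y1}" width="${b.x2 - b.x1}" ` +
        `height="${b.y2 - b.y1}" fill="none" stroke="red" stroke-width="2"/>`
    )
    .join('');
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${rects}</svg>`;
}

// Hypothetical usage with sharp (not run here); newer sharp versions use
// composite() instead of overlayWith():
// const svgBuffer = Buffer.from(boxesToSvg(640, 480, boxes));
// sharp('original.jpg').composite([{ input: svgBuffer, top: 0, left: 0 }]).toBuffer();
```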
I can share an example of the default yolo labels. This is just a text file, but the idea is the same in txt, csv, xml, json, etc. It can technically be any type of file, depending on what architecture you are using, but the above are the most common.
How do you determine which class the provided data relates to?
This is the contents of the file (for anyone else reading the issue):
45 0.479492 0.688771 0.955609 0.5955
45 0.736516 0.247188 0.498875 0.476417
50 0.637063 0.732938 0.494125 0.510583
45 0.339438 0.418896 0.678875 0.7815
49 0.646836 0.132552 0.118047 0.0969375
49 0.773148 0.129802 0.0907344 0.0972292
49 0.668297 0.226906 0.131281 0.146896
49 0.642859 0.0792187 0.148063 0.148062
@mattseddon the first number corresponds to a dictionary containing the classes.
It's something like:
...
44: "dog",
45: "person",
46: "car",
...
This is found in a dataset configuration file (specifically for YOLO), which is data.yaml.
There is often some configuration similar to this whether it be a dictionary in a training script, a data configuration file, or sometimes the labels are hard coded in the labels file.
I will say that the above is YOLO-specific; it is more often just the actual label instead of a number corresponding to a dictionary.
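For illustration, a sketch of parsing those YOLO label lines ("class cx cy w h", with center coordinates and sizes normalized to [0, 1]) into pixel-space corner coordinates. The classNames map stands in for the data.yaml class dictionary and is hypothetical:

```javascript
// Sketch: parse YOLO-format label lines into pixel-space boxes.
// Each line is "class cx cy w h", where cx/cy are the box center and
// w/h the box size, all normalized to [0, 1].
// The classNames map is a hypothetical stand-in for data.yaml.
function parseYoloLabels(text, imgWidth, imgHeight, classNames = {}) {
  return text
    .trim()
    .split('\n')
    .map((line) => {
      const [cls, cx, cy, w, h] = line.trim().split(/\s+/).map(Number);
      return {
        label: classNames[cls] ?? String(cls),
        x1: (cx - w / 2) * imgWidth,
        y1: (cy - h / 2) * imgHeight,
        x2: (cx + w / 2) * imgWidth,
        y2: (cy + h / 2) * imgHeight,
      };
    });
}
```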
I was just coming here to revisit (was busy for the past month) this and create some issues in the data and render repos, but it looks like you guys have maybe taken another look. Should I go ahead and create some additional issues and start looking into this, or is this in progress already?
@BradyJ27, feel free to do that, thanks. I've started to look into how Studio and VSCode are going to render these images but I'm currently not looking into the dvc/dvc-render side of things.
While researching the UX, I took into account that while both Studio and VSCode use React for the frontend, Studio has a Python backend and VSCode has a NodeJS backend. So far, I've come up with two ideas for how the clients (VSCode/Studio) could handle this.
- Rely on the client backend to create images with the needed bounding boxes. The frontend would render these images.
Both NodeJS and Python have multiple image manipulation libraries that we could use for creating images with bounding boxes. Matt has already mentioned sharp for NodeJS.
Studio and VSCode have different backends, so we would have to go about creating images in different ways. This would make keeping things consistent across products more difficult.
- Send the box coordinates to the frontend and have the frontend render the bounding boxes onto an image using SVGs or HTML canvas (I believe this is what W&B does).
Since both Studio and VSCode use React in the frontend, it will be easier to have consistent plots in both clients. React also has some libraries for Canvas (KonvaJS, FabricJS) and SVGs that would simplify the solution compared to using just vanilla APIs.
The solution for rendering the bounding boxes will probably be a bit more complicated than using the methods that backend libraries offer.
What do we think?
It would be nice to have some sort of interactive plots where the user can toggle on/off different objects based on labels.
We will probably want some level of interactivity like this at some point, so I think it makes sense to go with option 2.
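A minimal sketch of what option 2 could look like: one SVG combining the original image with per-class <g> groups of <rect> elements, so toggling a class only flips that group's visibility. The box shape and attribute names here are assumptions, not the actual implementation:

```javascript
// Sketch of option 2: the frontend renders the original image plus one
// <g> group of <rect> elements per class label inside a single SVG.
// Toggling a label just hides/shows its group. Box shape
// ({label, x1, y1, x2, y2} in pixels) and data-label attribute are assumed.
function overlaySvg(imageHref, width, height, boxes, hiddenLabels = new Set()) {
  // Group boxes by class label so each label gets one toggleable <g>.
  const byLabel = new Map();
  for (const b of boxes) {
    if (!byLabel.has(b.label)) byLabel.set(b.label, []);
    byLabel.get(b.label).push(b);
  }
  const groups = [...byLabel.entries()]
    .map(([label, group]) => {
      const rects = group
        .map(
          (b) =>
            `<rect x="${b.x1}" y="${b.y1}" width="${b.x2 - b.x1}" ` +
            `height="${b.y2 - b.y1}" fill="none" stroke="red" stroke-width="2"/>`
        )
        .join('');
      const visibility = hiddenLabels.has(label) ? ' visibility="hidden"' : '';
      return `<g data-label="${label}"${visibility}>${rects}</g>`;
    })
    .join('');
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<image href="${imageHref}" width="${width}" height="${height}"/>${groups}</svg>`
  );
}
```

In a React component, the toggle state would live in the component and only the `hiddenLabels` set would change, so no image needs to be regenerated.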
Started working on implementing this and, after trying both HTML Canvas and SVGs, decided on using SVGs to render the plots since they are easier to create and will be more performant, especially when it comes to resizing the plots.
Next, I started working on the UI design for the togglable boxes. Here is what I have so far (created in storybook):
Looking at Studio, either version could fit there as well:
What do we think? cc @shcheklein @iterative/vs-code
Looks cool, @julieg18!
Do we want to toggle classes in all revision plots for a specific image path at once or have the toggles per single plot? I tried designs for both for now. There's also the option of toggling classes across all images in the webview at once.
My 2 cents: I think we should toggle all images per path at once, for now.
What colors are we going to be using for the bounding boxes? I just chose red and blue for now but I'm assuming we want a pre-set of more muted colors?
Let's take a look at how YOLO generates colors/boxes and take it from there?
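As a placeholder until YOLO's palette is checked, one simple deterministic scheme is to space hues by the golden angle per class id. This is an assumption for illustration, not YOLO's actual color scheme:

```javascript
// Sketch: deterministic per-class colors. Spacing hues by the golden
// angle keeps neighboring class ids visually distinct; this is an
// assumption, not YOLO's actual palette.
function classColor(classId) {
  const hue = Math.round((classId * 137.508) % 360); // golden-angle spacing
  return `hsl(${hue}, 70%, 50%)`;
}
```

The advantage of any deterministic scheme is that the same class gets the same color in every plot, revision, and client.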
Is the HTML produced by the CLI (i.e. plots diff) out of scope for this?
I don't think CLI support is a requirement unless it's helpful to consolidate the VS Code and Studio implementation (similar to images per step).
Are we referring to the DVC CLI being able to create these plots with bounding boxes?
If so, and if it is doable for the CLI to create the bounding box plot SVGs, that could help with consolidation since Studio and VSCode would only need to implement the logic for toggling boxes. Currently, both Studio and VSCode need to create the SVG elements from the image src and bounding box coordinates as well as the toggle logic.
In computer vision, specifically object detection, it is common for a pipeline to output images with bounding boxes displaying the area of interest for specific objects. When the images are relatively small or packed with multiple objects, they can be hard to view.
It would be nice to have some sort of interactive plots where the user can toggle on/off different objects based on labels.
This may require some dvc or dvc-render changes first, but I'm opening this here because it would be beneficial to have it implemented within the VS Code extension.
Related issues:
https://github.com/iterative/dvc/issues/10198