Closed: AlexandreKempf closed this 6 months ago
Thanks @AlexandreKempf! Can you show an example of how to use it?
Why use a separate method rather than make it part of `log_image`? How do we ensure it's connected to the image?
@dberenbaum and @shcheklein, I'll put together a YOLO example, as you asked, and post some of it here for discussion when it is done.
@dberenbaum Concerning the `log_image` extension, I asked myself the same question initially.

Pros of using `log_bounding_boxes`:

* I'm afraid we will flood `log_image` with arguments. First, the bounding boxes, then the polygons/mask representation, then the segmentation... each takes a large `json`-like structure and additional information on how to parse it (the `format="tlbr"` for instance).
* In many cases, you want to see object detection on the validation set (a fixed set of images that are not augmented). This means that you want to see the same image, but you want to see how the bounding boxes evolve as the training goes. If we use `log_image` for that, we'll have to save the same image several times. Using a second method, we are free of this problem. It is not a strong argument, as we could save the image every time also and it would still work.
* It will simplify documentation for the bounding box logging. It will be on another page, so it will be easier to read.

Pros of using `log_image`:

* Users might discover the feature by looking at the `log_image` signature. I personally know that I don't read the docs of the tools I use frequently. So, this was my favorite way to discover new features ^^
* Users will usually call `log_bounding_boxes` just after `log_image`, so why not merge them?
I have no hard feelings about either of these implementations, but I really wanted to start the task. Let me know if you see any additional arguments and whether we (@dberenbaum and @shcheklein) can come to an agreement.
@AlexandreKempf Without having looked deeply through the PR yet, I'm still fuzzy on how we associate the bounding boxes with the images. An example (whether yolo or something else) will go a long way here and help show the pros and cons and we can decide what works.
@dberenbaum Here is what I had in mind:
Example with `log_bounding_boxes`:

```python
from ultralytics import YOLO

from dvclive import Live

model = YOLO("yolov8n.pt")

with Live() as live:
    image_path = "https://ultralytics.com/images/bus.jpg"
    image_name = "image_bus"
    live.log_image(image_name, image_path)

    results = model(image_path)

    format = "tlbr"
    boxes = results[0].boxes.xyxy.numpy()
    classes = results[0].boxes.cls.numpy()
    class_names = [results[0].names[class_index] for class_index in classes]
    scores = results[0].boxes.conf.numpy()
    live.log_bounding_boxes(image_name, boxes, class_names, scores, format=format)
    # or some dict processing, then `live.log_bounding_boxes(image_name, image_path, boxes)`
```
Example with `log_image`:

```python
from ultralytics import YOLO

from dvclive import Live

model = YOLO("yolov8n.pt")

with Live() as live:
    image_path = "https://ultralytics.com/images/bus.jpg"
    image_name = "image_bus"

    results = model(image_path)

    format = "tlbr"
    boxes = results[0].boxes.xyxy.numpy()
    classes = results[0].boxes.cls.numpy()
    class_names = [results[0].names[class_index] for class_index in classes]
    scores = results[0].boxes.conf.numpy()
    live.log_image(image_name, image_path, boxes, class_names, scores, format=format)
    # or some dict processing, then `live.log_image(image_name, image_path, boxes)`
```
Image and bounding boxes can be matched by `image_name` in the `Live` object. Once the Python session is over, we can still match them by their path, because they should have the same path but different suffixes (like you described here).
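A minimal sketch of that pairing convention (the exact paths are illustrative, assuming only that the image and its annotations differ by suffix):

```python
from pathlib import Path

# Illustrative paths only: the image and its annotation file share the
# same path and differ only by suffix, per the convention described above.
image = Path("dvclive/plots/images/image_bus.png")
annotations = image.with_suffix(".json")

print(annotations)  # dvclive/plots/images/image_bus.json
```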
Sidenote to @shcheklein. From what I understood of the YOLO W&B logger, they never log the bboxes and the image separately. They just save images with the bounding boxes already drawn on top (in the pixels, I mean), and ultralytics constructs these images. So technically, we could be using this technique already :) Also, the ultralytics documentation on how W&B can display/hide bounding boxes based on their labels is probably not accurate, since they don't have the bounding box information with ultralytics.
@AlexandreKempf sorry, I meant the Comet ML in this case https://github.com/ultralytics/ultralytics/blob/main/ultralytics/utils/callbacks/comet.py#L220 . It's the most complete logger for YOLO atm AFAIR.
> ...and ultralytics constructs these images. So technically, we could be using this technique already :)

we do this already, yep
They are using this call:

```python
experiment.log_image(image_path, name=image_path.stem, step=curr_step, annotations=annotation)
```

And I like it, tbh. It's simple, it's clear what is happening. I also like that they are using `annotations` - there is a path to expand it beyond just bounding boxes.
Not a strong opinion, but discussed yesterday with @AlexandreKempf that this approach also has its downsides:

- There is no way to know what format to include in `annotations` without going to their docs
- You will have to write additional code to structure your data in that format

I'm working on that, but I won't push until I have something satisfying on the VS Code plots. To keep you updated, I went for a solution that should satisfy all of us:
`log_image(name, img, bboxes)`

The format expected for `bboxes` is:

```json
{
  "boxes": [[1, 2, 3, 4], [5, 6, 7, 8], [10, 11, 12, 13]],
  "labels": ["cat", "dog", "boat"],
  "scores": [0.1, 0.3, 0.8],
  "format": "tlbr"
}
```
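A minimal sanity check on that structure (a sketch of the invariant, not DVCLive's actual validation code):

```python
bboxes = {
    "boxes": [[1, 2, 3, 4], [5, 6, 7, 8], [10, 11, 12, 13]],
    "labels": ["cat", "dog", "boat"],
    "scores": [0.1, 0.3, 0.8],
    "format": "tlbr",
}

# the three per-box lists must stay aligned: one entry per bounding box
assert len(bboxes["boxes"]) == len(bboxes["labels"]) == len(bboxes["scores"])
```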
For the argument name (`bboxes` or `annotations`) we need to pick one. It won't change the current PR, but it will affect the following ones on segmentation masks. I have a slight preference for `annotations`, because to add the segmentation masks we won't need to duplicate the labels information, and it follows the ultralytics and torchvision APIs more closely.
Example using `bboxes`:

```python
log_image(name, img, bboxes={"boxes": ..., "labels": ...}, masks={"masks": ..., "labels": ...})
```

Example using `annotations`:

```python
log_image(name, img, annotations={"boxes": ..., "labels": ..., "masks": ...})
```
> There is no way to know what format to include in `annotations` without going to their docs

true, but it is the same for bbox - I would have to go to the docs to see what is expected

> You will have to write additional code to structure your data in that format

yep, here I have no idea how complicated it is compared to that approach

so, no opinion on my end, just a thing to consider ...
@AlexandreKempf hey, is it ready to be reviewed, or is it still a draft? (Can we update the title and description when you think it's ready to be reviewed?)
@shcheklein we are working with @julieg18 to have a working version from DVClive to VScode. But I believe the DVClive side of things is ready to review.
Attention: Patch coverage is 94.28571%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 95.31%. Comparing base (2c7c378) to head (901ab78).

| Files | Patch % | Lines |
|---|---|---|
| src/dvclive/plots/annotations.py | 88.88% | 7 Missing and 1 partial :warning: |
Some questions I'm still unclear on:
> Is there a working version in VS Code? What about handling this in DVC? How can experiments be compared in VS Code and Studio with this info?
We are currently debugging one with @julieg18, but we are close to getting something working perfectly well. I'll upload a video of the final result by the end of the day to demo how it works. Also, I'm going to open the PR in DVC to add the feature code.
> What does a yolo or torchvision example look like with this method?
torchvision & lightning integration looks like this for the user:

```python
import numpy as np
import pytorch_lightning as pl


class LightningModule(pl.LightningModule):
    # ... define `__init__` and `training_step`

    def validation_step(self, batch, batch_idx):
        imgs, targets = batch
        # inference on validation images
        preds = self.forward(imgs)
        # log images with bounding boxes
        if batch_idx == 0:
            live = self.logger.experiment
            for index, img in enumerate(imgs[:15]):
                prediction = preds[index]
                live.log_image(
                    f"val_images/{index}/{self.current_epoch}.png",
                    convert_image_to_np_array(img),
                    annotations={
                        "boxes": prediction["boxes"].cpu().numpy().astype(int),
                        "labels": [
                            self.class_names[i]
                            for i in prediction["labels"].cpu().numpy()
                        ],
                        "scores": np.around(prediction["scores"].cpu().numpy(), 3),
                        "format": "ltrb",
                    },
                )
```
Note that this saves each image "A", "B", "C" into a structure that looks like this:

```
images/
    A/
        0.png
        0.json
        1.png
        1.json
        ...
    B/
    C/
```

Where 0, 1, ... are the validation step numbers (= epoch number if we validate at every epoch). This enables the step slider in VS Code and helps to see how the model learns.
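The layout above can be sketched as a path template (names and counts are illustrative, following the tree shown):

```python
# one sub-directory per logged image name, and one png/json pair per
# validation step (epoch), following the tree above
image_names = ["A", "B", "C"]
steps = range(2)

files = [
    f"images/{name}/{step}.{ext}"
    for name in image_names
    for step in steps
    for ext in ("png", "json")
]

print(files[:2])  # ['images/A/0.png', 'images/A/0.json']
```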
For the YOLO integration, it should look like this for the user:
```python
from ultralytics import YOLO

from dvclive import Live

model = YOLO("yolov8n.pt")

with Live() as live:
    image_path = "https://ultralytics.com/images/bus.jpg"
    image_name = "image_bus"

    results = model(image_path)

    # log image with bounding boxes
    format = "tlbr"
    boxes = results[0].boxes.xyxy.numpy()
    labels_idx = results[0].boxes.cls.numpy()
    labels = [results[0].names[idx] for idx in labels_idx]
    scores = results[0].boxes.conf.numpy()
    live.log_image(
        image_name,
        image_path,
        annotations={
            "boxes": boxes,
            "labels": labels,
            "scores": scores,
            "format": format,
        },
    )
```
> Should we require all fields or should some be optional?
I wondered the same thing. I guess the first iteration should require all the fields, and we can always relax some hard constraints as we move on. It is much easier to do it this way than the other way around: if we start with optional fields and realize it is a mistake, it will be harder to revert and require the fields. Options I could see, from most useful to least useful (IMHO):
I'm fine to move forward with this approach, but let's document the pros and cons once more so we can easily review the thought process in the future:
Pros of using `log_bounding_boxes`:

> * I'm afraid we will flood `log_image` with arguments. First, the bounding boxes, then the polygons/mask representation, then the segmentation... each takes a large `json`-like structure and additional information on how to parse it (the `format="tlbr"` for instance).
This is mitigated by using a catchall `annotations` kwarg.
> * In many cases, you want to see object detection on the validation set (a fixed set of images that are not augmented). This means that you want to see the same image, but you want to see how the bounding boxes evolved as the training goes. With the slider interface provided by our front end, I wanted us to be able to show the different epochs for the same image. That would be a nice feature for object detection users. If we use `log_image` for that, we'll have to save the same image several times. Using a second method, we are free of this problem. It is not a strong argument, as we could save the image every time also and it will still work.
I don't see an easy way to do this regardless of which method we use, but maybe I'm missing something. We would need some way to capture bounding boxes per step.
> * It will simplify documentation for the bounding box logging. It will be on another page, so it will be easier to read.
This is still a concern that we can revisit once @AlexandreKempf has drafted a docs PR.
> - There is no way to know what format to include in `annotations` without going to their docs

I think this would be easier in `log_bbox()` since the IDE could show the individual kwargs for `boxes`, `labels`, etc. We can partially mitigate this with types and docstrings (see here).
> - You will have to write additional code to structure your data in that format
Doesn't look like it makes much difference.
@dberenbaum @shcheklein @skshetry
In the end, I used Pydantic for the validation of user inputs. I would love to have this behavior:
```python
class BBox(BaseModel):
    boxes: ...
    labels: ...
    scores: ...
    box_format: ...


def my_sexy_function(bbox: BBox):
    ...
```

and call it with a dict:

```python
my_sexy_function({"boxes": ..., "labels": ..., "scores": ..., "box_format": ...})
```
But I'm not sure it is possible with mypy.
So, to stay consistent with what we said, @dberenbaum, I created a TypedDict so that users can see what fields are needed and their types. If they still make a mistake, Pydantic errors should be enough to guide them to the correct input. I realize it is a bit ugly to have both the TypedDict and the Pydantic model. I strongly believe that DVCLive should improve the user experience (more understandable errors and warnings, better types and docstrings...) even if it comes with a maintenance cost on our end.
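To illustrate the TypedDict-plus-validation pattern, here is a sketch: `BBoxDict`, `BBox`, and `parse_bbox` are hypothetical names, and a stdlib dataclass stands in for the Pydantic model so the snippet is self-contained.

```python
from dataclasses import dataclass
from typing import List, TypedDict


# TypedDict gives IDEs and type checkers the field names/types for a
# plain dict argument...
class BBoxDict(TypedDict):
    boxes: List[List[int]]
    labels: List[str]
    scores: List[float]
    box_format: str


# ...while a validating model rejects inconsistent input at runtime
# (the PR uses a Pydantic model; a dataclass stands in for it here).
@dataclass
class BBox:
    boxes: List[List[int]]
    labels: List[str]
    scores: List[float]
    box_format: str

    def __post_init__(self):
        if not (len(self.boxes) == len(self.labels) == len(self.scores)):
            raise ValueError("boxes, labels and scores must have the same length")


def parse_bbox(bbox: BBoxDict) -> BBox:
    return BBox(**bbox)


parse_bbox(
    {"boxes": [[1, 2, 3, 4]], "labels": ["cat"], "scores": [0.9], "box_format": "tlbr"}
)
```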
Note: in the latest implementation, `Annotations.could_log` is not called. I'm thinking of a better way to integrate it into the code.
Come back to this later
Add bounding boxes
Context and motivations
Related PRs in other repos:
How to use
The `format` field is used to specify the coordinate system for the bounding box. For example, "xywh" means center-x (horizontal), center-y (vertical), width, height. It is one of the 4 supported formats.
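As an illustration of what the `format` field encodes, here is a hypothetical helper converting "xywh" boxes to corner coordinates ("ltrb": left, top, right, bottom); the format names follow this PR, and the arithmetic is the standard center-to-corner conversion:

```python
def xywh_to_ltrb(box):
    """Convert (center-x, center-y, width, height) to (left, top, right, bottom)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


print(xywh_to_ltrb([10, 10, 4, 6]))  # [8.0, 7.0, 12.0, 13.0]
```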
:warning: The current PR doesn't support relative coordinates (coordinates between 0 and 1), but I believe that is easy to add in a future PR.
How it works
The example above will save an image in "dvclive/plots/images/path.png" with the `numpy_img` content. It will also create a JSON file alongside the image, "dvclive/plots/images/path.json". The JSON file content will look like this:

Other than that, the PR is ready to review :+1: