roboflow / notebooks

Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.
https://roboflow.com/models

Filtering bounding boxes in image annotation using SAM and Grounding DINO #110

Closed · sivaji123256 closed 1 year ago

sivaji123256 commented 1 year ago

Hi @hansent @tonylampada @yeldarby @RobertoNovelo, thanks for the great work on image annotation. I was trying to filter the bounding boxes by area in the output of the detections, which is in supervision's Detections format. I was able to convert it into a NumPy array, but how do I convert that NumPy array back into a supervision Detections object? Here is a small code snippet:

# detect objects
detections = grounding_dino_model.predict_with_classes(
    image=image,
    classes=enhance_class_name(class_names=CLASSES),
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)

#print(type(detections))

# naive approach: turn the Detections object into a NumPy array
# and filter it with a loop
detections = np.array(detections)
det1 = []
for k in range(len(detections)):
    x1, y1, x2, y2 = detections[k][0]
    x_diff = abs(x2 - x1)
    y_diff = abs(y2 - y1)
    if x_diff < 250 and y_diff < 150:
        det1.append(detections[k])

detections = np.array(det1)

# annotate image with detections
box_annotator = sv.BoxAnnotator()
labels = [
    #f"{CLASSES[class_id]} {confidence:0.2f}"
    f"{CLASSES[class_id]} {confidence:0.2f}" if class_id is not None else f"{'other'} {confidence:0.2f}"
    for _, _, confidence, class_id, _ 
    in detections]
annotated_frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)

Any suggestions would be highly useful.

github-actions[bot] commented 1 year ago

👋 Hello @sivaji123256, thank you for leaving an issue on Roboflow Notebooks.

🐞 Bug reports

If you are filing a bug report, please be as detailed as possible. This will help us more easily diagnose and resolve the problem you are facing. To learn more about contributing, check out our Contributing Guidelines.

If you require support with custom code that is not part of Roboflow Notebooks, please reach out on the Roboflow Forum or on the GitHub Discussions page associated with this repository.

💬 Get in touch

Do you have more questions about Roboflow that we haven't responded to yet? Feel free to ask them on the Roboflow Discuss forum. Our developer advocates and community team actively respond to questions there.

To ask questions about Notebooks, head over to the GitHub Discussions section of this repository.

SkalskiP commented 1 year ago

Hi, @sivaji123256 👋🏻!

I think you can do it much more simply. You have two options.

  1. If you want to filter by area, use the area property (here is the docs link):

detections = detections[detections.area > AREA_TRESHOLD]

  2. If you want to filter by box dimensions, the solution is a bit more hacky but quite concise:

w = detections.xyxy[:, 2] - detections.xyxy[:, 0]
h = detections.xyxy[:, 3] - detections.xyxy[:, 1]
detections = detections[(w > WIDTH_TRESHOLD) & (h > HEIGHT_TRESHOLD)]
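The boolean-mask indexing that Detections supports works the same way as NumPy array indexing, so the dimension filter can be illustrated with plain NumPy (a minimal sketch with made-up boxes and the 250/150 px limits from the question; the real code would index the Detections object directly):

```python
import numpy as np

# Hypothetical boxes in xyxy format (x1, y1, x2, y2),
# standing in for detections.xyxy
xyxy = np.array([
    [0, 0, 100, 80],    # 100 x 80 box
    [10, 10, 400, 50],  # 390 x 40 box, too wide for a 250 px cap
    [5, 5, 30, 25],     # 25 x 20 box
], dtype=float)

# Column-wise box dimensions
w = xyxy[:, 2] - xyxy[:, 0]
h = xyxy[:, 3] - xyxy[:, 1]

# Keep boxes narrower than 250 px and shorter than 150 px,
# mirroring the x_diff / y_diff check from the original loop
mask = (w < 250) & (h < 150)
filtered = xyxy[mask]

print(filtered.shape)  # (2, 4): the 390-px-wide box is dropped
```

With a Detections object, `detections[mask]` applies the same mask to xyxy, confidence, and class_id at once, so everything stays in sync for the annotator.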

Put that filtering right after the grounding_dino_model.predict_with_classes call:

detections = grounding_dino_model.predict_with_classes(
    image=image,
    classes=enhance_class_name(class_names=CLASSES),
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)

<<HERE>>

box_annotator = sv.BoxAnnotator()
labels = [
    f"{CLASSES[class_id]} {confidence:0.2f}" if class_id is not None else f"{'other'} {confidence:0.2f}"
    for _, _, confidence, class_id, _ 
    in detections]
annotated_frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)
SkalskiP commented 1 year ago

I'm closing the issue. Feel free to reopen if you have more questions.