roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License

ultralytics_stream_example does not work #1152

Closed tgbaoo closed 6 months ago

tgbaoo commented 6 months ago

Search before asking

Bug

I have run the ultralytics_stream_example file for the time-in-zone example, but no tracking result is shown. Besides the deprecated decorator warning, there are two bugs:

  1. When passing frame.image to Ultralytics to get the detection result, it must be frame[0].image instead. (fixed)
  2. When passing detections to the custom sink's on_prediction method, an error occurs. (not fixed yet)

Please check them out.

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

LinasKo commented 6 months ago

Hi @tgbaoo :wave:

Thank you for reporting the issue!

I see you're willing to contribute a PR to help us out. I'll assign the issue to you - let me know if you need any help :wink:

tgbaoo commented 6 months ago

@LinasKo Please check my report and help me with the second bug, which still needs to be fixed!

LinasKo commented 6 months ago

There's a little bit in your explanation that I don't understand. Could you please share a Colab showing where the error appears?

tgbaoo commented 6 months ago

Sure @LinasKo, here is more information about my problem. The first bug I mentioned above should be fixed like this:

    def inference_callback(frame: VideoFrame) -> sv.Detections:
        results = model(frame[0].image, verbose=True, conf=confidence, device=device)
        results_0 = results[0]
        print("result: ", results_0)
        detections = sv.Detections.from_ultralytics(results_0).with_nms(threshold=iou)
        print("[ultralytics] detections: ", detections)
        print("[ultralytics] detections class_id attr: ", detections.class_id)

        return detections
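As a side note, a small helper could make the callback tolerant of both shapes (a hedged sketch: the single-frame-vs-list behavior appears to vary across inference versions, and `first_frame` is a hypothetical name, not part of the library):

```python
from typing import Any


def first_frame(frame: Any):
    """Return one frame whether the pipeline passed a single VideoFrame or a list.

    Hypothetical helper: some inference versions hand the callback a single
    frame object, others a one-element list of frames.
    """
    if isinstance(frame, (list, tuple)):
        return frame[0]
    return frame
```

With this, the callback body could read `results = model(first_frame(frame).image, ...)` regardless of which shape the pipeline delivers.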

Full code from ultralytics_stream_example.py for more context:

class CustomSink:
    def __init__(self, zone_configuration_path: str, classes: List[int]):
        self.classes = [classes] if isinstance(classes, int) else classes
        self.tracker = sv.ByteTrack(minimum_matching_threshold=0.8)
        self.fps_monitor = sv.FPSMonitor()
        self.polygons = load_zones_config(file_path=zone_configuration_path)
        self.timers = [ClockBasedTimer() for _ in self.polygons]
        self.zones = [
            sv.PolygonZone(
                polygon=polygon,
                triggering_anchors=(sv.Position.CENTER,),
            )
            for polygon in self.polygons
        ]

    def on_prediction(self, detections: sv.Detections, frame: VideoFrame) -> None:
        self.fps_monitor.tick()
        fps = self.fps_monitor.fps

        detections = detections[find_in_list(detections.class_id, self.classes)]
        detections = self.tracker.update_with_detections(detections)
        annotated_frame = frame.image.copy()
        annotated_frame = sv.draw_text(
            scene=annotated_frame,
            text=f"{fps:.1f}",
            text_anchor=sv.Point(40, 30),
            background_color=sv.Color.from_hex("#A351FB"),
            text_color=sv.Color.from_hex("#000000"),
        )
        for idx, zone in enumerate(self.zones):
            annotated_frame = sv.draw_polygon(
                scene=annotated_frame, polygon=zone.polygon, color=COLORS.by_idx(idx)
            )
            detections_in_zone = detections[zone.trigger(detections)]
            time_in_zone = self.timers[idx].tick(detections_in_zone)
            custom_color_lookup = np.full(detections_in_zone.class_id.shape, idx)
            annotated_frame = COLOR_ANNOTATOR.annotate(
                scene=annotated_frame,
                detections=detections_in_zone,
                custom_color_lookup=custom_color_lookup,
            )
            labels = [
                f"#{tracker_id} {int(time // 60):02d}:{int(time % 60):02d}"
                for tracker_id, time in zip(detections_in_zone.tracker_id, time_in_zone)
            ]
            annotated_frame = LABEL_ANNOTATOR.annotate(
                scene=annotated_frame,
                detections=detections_in_zone,
                labels=labels,
                custom_color_lookup=custom_color_lookup,
            )
        cv2.imshow("Processed Video", annotated_frame)
        cv2.waitKey(1)


def main(
    rtsp_url: str,
    zone_configuration_path: str,
    weights: str,
    device: str,
    confidence: float,
    iou: float,
    classes: List[int],
) -> None:
    model = YOLO(weights)

    def inference_callback(frame: VideoFrame) -> sv.Detections:
        results = model(frame[0].image, verbose=True, conf=confidence, device=device)
        results_0 = results[0]
        print("result: ", results_0)
        detections = sv.Detections.from_ultralytics(results_0).with_nms(threshold=iou)
        print("[ultralytics] detections: ", detections)
        print("[ultralytics] detections class_id attr: ", detections.class_id)
        return detections
    sink = CustomSink(zone_configuration_path=zone_configuration_path, classes=classes)
    pipeline = InferencePipeline.init_with_custom_logic(
        video_reference=rtsp_url,
        on_video_frame=inference_callback,
        on_prediction=sink.on_prediction,
    )

    pipeline.start()
    try:
        pipeline.join()
    except KeyboardInterrupt:
        pipeline.terminate()
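For reference, on_prediction filters detections with `find_in_list(detections.class_id, self.classes)`. Its real implementation lives in the example's utils module; this is only a hedged sketch of its apparent contract (an assumption based on how it is used above): return a boolean mask selecting the class ids in `classes`, with an empty list meaning "keep everything".

```python
from typing import List

import numpy as np


def find_in_list(array: np.ndarray, search_list: List[int]) -> np.ndarray:
    """Boolean mask of which elements of `array` appear in `search_list`.

    Sketch of the helper imported by the script; an empty search list
    keeps all detections.
    """
    if not search_list:
        return np.ones(array.shape, dtype=bool)
    return np.isin(array, search_list)
```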

The log I get when running the code:

SupervisionWarnings: __call__ is deprecated: FPSMonitor.__call__ is deprecated and will be removed in supervision-0.22.0. Use FPSMonitor.fps instead.
0: 384x640 4 persons, 1 truck, 1 handbag, 1 tv, 942.1ms
Speed: 8.0ms preprocess, 942.1ms inference, 8.5ms postprocess per image at shape (1, 3, 384, 640)
result:  ultralytics.engine.results.Results object with attributes:
boxes: ultralytics.engine.results.Boxes object
keypoints: None
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
obb: None
orig_img: array([[[ 91,  91,  91],
        [111, 111, 111],
        [115, 115, 115],
        ...,
        [135, 135, 135],
        [125, 125, 125],
        [ 53,  53,  53]],
       [[ 85,  85,  85],
        [103, 103, 103],
        [110, 110, 110],
        ...,
        [135, 135, 135],
        [125, 125, 125],
        [ 53,  53,  53]],
       [[ 76,  76,  76],
        [ 91,  91,  91],
        [101, 101, 101],
        ...,
        [135, 135, 135],
        [125, 125, 125],
        [ 53,  53,  53]],
       ...,
       [[ 89, 102,  96],
        [ 89, 102,  96],
        [ 89, 102,  96],
        ...,
        [233, 194, 180],
        [233, 194, 180],
        [109,  70,  56]],
       [[ 89, 102,  96],
        [ 89, 102,  96],
        [ 89, 102,  96],
        ...,
        [233, 194, 180],
        [233, 194, 180],
        [109,  70,  56]],
       [[ 89, 102,  96],
        [ 89, 102,  96],
        [ 89, 102,  96],
        ...,
        [233, 194, 180],
        [233, 194, 180],
        [109,  70,  56]]], dtype=uint8)
orig_shape: (720, 1280)
path: 'image0.jpg'
probs: None
save_dir: 'runs\\detect\\predict'
speed: {'preprocess': 8.001565933227539, 'inference': 942.1312808990479, 'postprocess': 8.53729248046875}
[ultralytics] detections:  Detections(xyxy=array([[     695.06,      61.616,      795.59,      310.53],
       [     878.28,      132.54,      959.91,      369.58],
       [     1035.3,      27.613,      1105.2,      191.51],
       [     59.808,      385.06,      335.94,         643],
       [     519.25,      150.87,      593.72,      385.25],
       [     1099.2,      348.75,      1154.4,      431.54],
       [     612.75,      110.03,      695.27,       217.1]], dtype=float32), mask=None, confidence=array([    0.89195,     0.82298,     0.69308,     0.60449,      0.5467,     0.38043,     0.33792], dtype=float32), class_id=array([ 0,  0,  0,  7,  0, 26, 62]), tracker_id=None, data={'class_name': array(['person', 'person', 'person', 'truck', 'person', 'handbag', 'tv'], dtype='<U7')})
[ultralytics] detections class_id attr:  [ 0  0  0  7  0 26 62]

[04/29/24 18:45:46] WARNING  Error in results dispatching - 'tuple' object has no attribute 'class_id'                                            inference_pipeline.py:892
SupervisionWarnings: __call__ is deprecated: FPSMonitor.__call__ is deprecated and will be removed in supervision-0.22.0. Use FPSMonitor.fps instead.
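The dispatch error above suggests that newer inference releases hand on_prediction a batched sequence (a tuple) rather than a single detections object. A hedged sketch of a workaround (`unbatch_sink` is a hypothetical name, and the one-element-batch shape is an assumption read off the log, not a documented API):

```python
def unbatch_sink(on_prediction):
    """Wrap a single-item sink so it accepts one-element batches.

    Hypothetical adapter: if the pipeline dispatches predictions and frames
    as sequences, unpack the first element before calling the old-style sink.
    """
    def wrapper(predictions, frame):
        if isinstance(predictions, (list, tuple)):
            predictions = predictions[0]
        if isinstance(frame, (list, tuple)):
            frame = frame[0]
        return on_prediction(predictions, frame)
    return wrapper
```

It would be used as `on_prediction=unbatch_sink(sink.on_prediction)` when constructing the pipeline.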
SkalskiP commented 6 months ago

Hi @tgbaoo 👋🏻 I just ran ultralytics_stream_example.py and it runs without any issue. This is how I run it:

python ultralytics_stream_example.py \
--zone_configuration_path "data/checkout/config.json" \
--rtsp_url "rtsp://localhost:8554/live0.stream" \
--weights "yolov8x.pt" \
--device "mps" \
--classes 0 \
--confidence_threshold 0.5 \
--iou_threshold 0.5

It's true that deprecation warnings appear for every frame, which can be quite annoying, but the code still functions correctly.

SupervisionWarnings: __call__ is deprecated: `FPSMonitor.__call__` is deprecated and will be removed in `supervision-0.22.0`. Use `FPSMonitor.fps` instead.
tgbaoo commented 6 months ago

Dear @LinasKo and @SkalskiP, maybe the problem comes from the new version of roboflow inference, or from the new PR I cloned?

I run on Windows with CPU, with the same CLI setup as yours.

I have checked the roboflow inference documentation and see different implementations of the custom sink before and after version 0.9.18.

Screenshot_20240430_022014.jpg

Screenshot_20240430_022008.jpg

Sorry, I am a newbie in computer vision. From your expertise and experience, which direction should I investigate to debug this code?

Thanks for your passion and your patience in helping me out.

SkalskiP commented 6 months ago

Hi @tgbaoo 👋🏻 you discovered exactly what I did over the past 30 minutes. It looks like this issue is related to the inference 0.9.22 release, which introduced breaking changes in the API. I just merged the PR that pins the version of inference to a maximum of 0.9.21, so if you delete your Python environment and install again from the current version of the develop branch, you should be fine.

You can probably also:

pip uninstall inference
pip install "inference==0.9.21"
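After reinstalling, a quick way to confirm the pin took effect is to query the installed version from the standard library (a small sketch; `installed_version` is a helper name of my own, not part of either package):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version string for `package`, or a marker if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"
```

Running `installed_version("inference")` should then report a version no newer than 0.9.21.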

I'm closing the issue, but if you have more questions, let us know.

tgbaoo commented 6 months ago

I must give a big thanks to @LinasKo and @SkalskiP, you guys are so 🔥🔥🔥 You just saved my day 🚀🚀🚀