roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License

YOLOv8 + ByteTrack integration issues #1320

Open ddrisco11 opened 4 days ago

ddrisco11 commented 4 days ago

Question

Hello! I'm currently building a program to detect deep-sea creatures in submarine video. I am using YOLOv8 to make detections and ByteTrack to assign object IDs to those detections. My output includes both an annotated video (based exclusively on YOLO output) and a CSV file of all distinct detections (determined as distinct by ByteTrack). I am having an issue where certain creatures are annotated in the video output, i.e. detected by YOLO, but are then omitted from the CSV output, i.e. not assigned a tracking ID by ByteTrack. Please help! Thanks!

Additional

import cv2
import supervision as sv
from supervision import ByteTrack
from tqdm import tqdm
from ultralytics import YOLO

# get_location_data, time_to_seconds, get_location_at_time, process_frame,
# save_detection_image, seconds_to_time_str, and SOURCE_VIDEO_PATH are
# defined elsewhere in the script.

def process_video(video_path: str, output_path: str, model_path: str, location_path: str,
                  start_time: str, time_col: int, lat_col: int, lon_col: int,
                  depth_col: int, salinity_col: int, oxygen_col: int, altitude_col: int,
                  confidence_threshold: float, iou_threshold: float,
                  track_activation_threshold: float, minimum_matching_threshold: float,
                  lost_track_buffer: int, frame_rate: int, min_box_area: int,
                  aspect_ratio_thresh: float):
    """Process the video to track objects and save tracking data."""
    model = YOLO(model_path)
    tracker = ByteTrack(
        track_activation_threshold=track_activation_threshold,
        minimum_matching_threshold=minimum_matching_threshold,
        lost_track_buffer=lost_track_buffer
    )
    location_data = get_location_data(location_path, time_col, lat_col, lon_col,
                                      depth_col, salinity_col, oxygen_col, altitude_col)
    start_time_seconds = time_to_seconds(start_time)

    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path.replace('.csv', '.mp4'), fourcc, fps, (width, height))

    tracking_info = {}
    pbar = tqdm(total=frame_count, desc='Processing frames', leave=True, mininterval=10)

    frame_index = 0
    cached_boxes = None
    cached_labels = None

    try:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            current_time = start_time_seconds + (frame_index / fps)
            lat, lon, depth, salinity, oxygen, altitude = get_location_at_time(location_data, current_time)

            if frame_index % 5 == 0:  # Run detection on every 5th frame only
                results = process_frame(frame, model, confidence_threshold, iou_threshold)
                cached_boxes = results.boxes.xyxy.numpy()  # Convert boxes to a numpy array
                names = model.names  # Class names
                labels = results.boxes.cls.numpy().astype(int)  # Convert to integer labels

                cached_labels = [
                    f"{names[label]} {round(confidence, 2)}"
                    for label, confidence in zip(labels, results.boxes.conf.numpy())
                ]

            # Draw bounding boxes using cached detections and labels
            annotated_frame = frame.copy()
            if cached_boxes is not None and cached_labels is not None:
                drawn_boxes = set()  # Boxes already drawn on this frame
                for box, label in zip(cached_boxes, cached_labels):
                    x1, y1, x2, y2 = map(int, box)  # Box coordinates
                    class_name = label.split()[0]  # Class name from the label

                    # Skip boxes that have already been drawn
                    if (x1, y1, x2, y2) not in drawn_boxes:
                        # Red rectangle (BGR: (0, 0, 255)) with thicker lines (thickness=3)
                        cv2.rectangle(annotated_frame, (x1, y1), (x2, y2), (0, 0, 255), 3)
                        # Label text in red above the box
                        cv2.putText(annotated_frame, class_name, (x1, y1 - 10),
                                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
                        drawn_boxes.add((x1, y1, x2, y2))

            # Write the frame to the output video
            out.write(annotated_frame)

            if frame_index % 5 == 0:
                detections = sv.Detections.from_ultralytics(results)
                detections = tracker.update_with_detections(detections)

                for index in range(len(detections.class_id)):
                    object_id = detections.tracker_id[index]
                    class_name = model.names[int(detections.class_id[index])]
                    confidence = detections.confidence[index]

                    if object_id not in tracking_info:
                        image_path = save_detection_image(frame, detections[index], object_id,
                                                          current_time, SOURCE_VIDEO_PATH)
                        tracking_info[object_id] = {
                            'Class': class_name,
                            'Confidence': confidence,
                            'Start Time': seconds_to_time_str(int(current_time)),
                            'End Time': seconds_to_time_str(int(current_time)),
                            'Latitude': lat,
                            'Longitude': lon,
                            'Depth': depth,
                            'Salinity': salinity,
                            'Oxygen': oxygen,
                            'Altitude': altitude,
                            'Image Path': image_path,
                            'All Classes': [class_name]
                        }
                    else:
                        tracking_info[object_id]['End Time'] = seconds_to_time_str(int(current_time))
                        tracking_info[object_id]['Latitude'] = lat
                        tracking_info[object_id]['Longitude'] = lon
                        tracking_info[object_id]['Depth'] = depth
                        tracking_info[object_id]['Salinity'] = salinity
                        tracking_info[object_id]['Oxygen'] = oxygen
                        tracking_info[object_id]['Altitude'] = altitude
                        tracking_info[object_id]['All Classes'].append(class_name)

            pbar.update(1)
            frame_index += 1
    finally:
        # Assumed cleanup: the original snippet was truncated at this point
        cap.release()
        out.release()
        pbar.close()

rolson24 commented 4 days ago

Hi @ddrisco11!

This could be caused by several things. Could you upload a test video and the model weights to Google Drive and share the link so we can reproduce this? My initial thought is that the creatures are being detected inconsistently and the object tracker is struggling to re-find tracks once they have been lost for several frames, but there is no way to verify this without running your model on your test video.

Thanks

ddrisco11 commented 3 days ago

Hi @rolson24 , thanks for the response! Here is a link to a short training video, the full code, and the model weights I am using. Please let me know if you need anything else. https://drive.google.com/drive/folders/11u0m7Koew1D7lPEZngvfSR762Rvz0BNr?usp=drive_link

rolson24 commented 3 days ago

Thanks for the code, model weights, and video. I have done a few tests, and it looks like there are a few things going on.

First off, it looks like there is a small bug in supervision that makes the tracker_ids skip several numbers. For now, this can be worked around by setting minimum_consecutive_frames=2, as sketched below. This may be part of your confusion.
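For reference, a minimal sketch of that workaround; the other parameter values shown are just supervision's defaults:

import supervision as sv

tracker = sv.ByteTrack(
    track_activation_threshold=0.25,
    minimum_matching_threshold=0.8,
    lost_track_buffer=30,
    minimum_consecutive_frames=2,  # workaround for tracker_ids skipping numbers
)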

The other part is that the tracker relies on high-confidence detections to decide whether to create a new track. Most of the detections from your model have confidence values below 0.3, whereas a well-performing model generally produces confidence values around 0.8. Detection confidence strongly affects how the tracker performs, because the tracker uses it as a measure of how likely the object is to be detected again in the next frame, and thus whether it should become a tracked object. To increase the confidence values of your detections you will need more training data. I would recommend adding image augmentations to your existing training data, and considering a more powerful foundation model like DETIC with Roboflow autodistill to automatically label images, then training your smaller yolov8 model on those labeled images.
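As a quick sanity check of this on your own data (the paths below are placeholders for your weights and a sample frame), you can print the confidence distribution your model produces:

import numpy as np
import supervision as sv
from ultralytics import YOLO

# Placeholder paths: substitute your trained weights and a frame from your video
model = YOLO("yolov8_weights.pt")
result = model("sample_frame.jpg")[0]
detections = sv.Detections.from_ultralytics(result)

# If most values land below ~0.3, the tracker will struggle to start new tracks
print("mean confidence:", detections.confidence.mean())
print("fraction below 0.3:", np.mean(detections.confidence < 0.3))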

The final thing you can try is reducing minimum_matching_threshold. This parameter sets the minimum score required to match an existing track to a new detection; it essentially combines the detection's confidence and how much the track and the detection overlap into one number. By reducing the threshold, you allow the tracker to follow lower-confidence detections, but you also allow detections that merely happen to overlap an existing track belonging to a different object to be matched to it. In other words, reducing minimum_matching_threshold risks tracks switching between different objects in exchange for tracking lower-confidence detections. This is unlikely to be enough on its own, though, and I would first recommend improving the performance of your object detector. If you do want to experiment with it, it is a one-line change, as sketched below.
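Illustrative only; 0.6 is an example value to tune on your footage, lowered from the 0.8 default, combined here with the earlier workaround:

tracker = sv.ByteTrack(
    minimum_matching_threshold=0.6,   # example value, tune per video
    minimum_consecutive_frames=2,     # workaround from the first point above
)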

ddrisco11 commented 1 day ago

@rolson24 thanks for taking the time to help out! This was my first time asking a question on GitHub and I very much appreciate the support.