levan92 / deep_sort_realtime

A really more real-time adaptation of deep sort
MIT License
164 stars, 50 forks

Deep sort remembering bad state #43

Closed by filipp01 1 year ago

filipp01 commented 1 year ago

I have a Flask API that takes a base64 image from the client, converts it to JPEG, and runs YOLOv5 detection on it. Its purpose is to detect humans/bottles. I tested it on a bottle first to see how it would behave.

saveImage = ''

data = request.get_json()
base64_str = data.get('image', '')
if not base64_str:
    return jsonify({'error': 'Invalid or missing image data.'}), 400

try:
    image_data = base64.b64decode(base64_str)
    image = Image.open(BytesIO(image_data))

    results = model(image)

    # render() draws the detection boxes onto results.ims in place
    rezultati = results.render()

    frame1 = np.ones((1080, 1088, 3), dtype=np.uint8) * 255

    bounding_boxes = results.xyxy[0].cpu().numpy()
    converted_detections2 = []

    bounding_box_deepsort = []

    for detection in results.xyxy[0].cpu().numpy():  # Iterate through all detections in the first image
        # Each detection contains [x1, y1, x2, y2, confidence, class]
        x1, y1, x2, y2, confidence, class_id = detection
        #print(f"Bounding box coordinates: x1={x1}, y1={y1}, x2={x2}, y2={y2}")
        arrayCords = [x1, y1, x2-x1, y2-y1]
        print(detection)
        bounding_box_deepsort.append((arrayCords, confidence, class_id))

    print(bounding_box_deepsort)
    #print(bounding_boxes)

    img_base64 = ''
    newImage = ''
    for img in results.ims:
        buffered = BytesIO()
        img_base64 = Image.fromarray(img)
        img_base64.save(buffered, format="JPEG")
        saveImage = base64.b64encode(buffered.getvalue()).decode('utf-8')
        buffered.seek(0)
        newImage = Image.open(buffered)

    tracks = tracker.update_tracks(
        bounding_box_deepsort, frame=frame1
    )
    draw = ImageDraw.Draw(newImage)

    for track in tracks:
        tlwh = track.to_tlwh()

        #print("id: ", track.track_id)
        #print("tracks: ", tlwh)
        x1, y1, w, h = tlwh
        x2, y2 = x1 + w, y1 + h
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)

    buffered = BytesIO()
    newImage.save(buffered, format="JPEG")
    saveImage = base64.b64encode(buffered.getvalue()).decode('utf-8')

    print("---------------------------------------")
except Exception as e:
    print(str(e))
    print(jsonify({'error': f'An error occurred while processing the image: {str(e)}'}))
    return jsonify({'error': f'An error occurred while processing the image: {str(e)}'}), 500

return jsonify({'image': saveImage}), 200

The way I set up the tracker is as follows:

GPU = torch.cuda.is_available() and not os.environ.get("USE_CPU")
embedder = "mobilenet"
embeds = None
today = datetime.now().date()
tracker = DeepSort(max_age=30)

The issue is that when I place a bottle in front of me, the bounding boxes I get are the following:

[image: 1]

After I move my camera away from the bottle, it still shows the bounding box, like this:

[image: 1a]

This also happens when the camera looks at our street: at some point, people turn into "ghosts", meaning the DeepSort tracker shows a bounding box even though nobody is there. For example, when a truck passes in front of them and it loses focus for a second, DeepSort shows 2 objects (1 real human and 1 "ghost"). I don't understand why this is happening. Is it related to the settings, and how can I fix it?

I should mention that my camera feed runs at only 2-3 FPS, in case that matters.

Best regards

levan92 commented 1 year ago

You should use only confirmed tracks, which you can check with track.is_confirmed() (see README).
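A minimal sketch of that filter, assuming each track object exposes is_confirmed(), track_id, and to_ltrb() as described in the README. FakeTrack and confirmed_boxes are hypothetical names used only so the snippet runs without a camera or model; in the real application the list would come from tracker.update_tracks(...).

```python
from dataclasses import dataclass


@dataclass
class FakeTrack:
    """Hypothetical stand-in for a deep_sort_realtime track (illustration only)."""
    track_id: str
    confirmed: bool
    ltrb: tuple

    def is_confirmed(self):
        return self.confirmed

    def to_ltrb(self):
        return self.ltrb


def confirmed_boxes(tracks):
    """Return (track_id, box) pairs for confirmed tracks, skipping tentative ones."""
    return [(t.track_id, t.to_ltrb()) for t in tracks if t.is_confirmed()]


tracks = [
    FakeTrack("1", True, (10, 20, 110, 220)),
    FakeTrack("2", False, (300, 40, 360, 200)),  # tentative: should not be drawn
]
print(confirmed_boxes(tracks))
```

Only the boxes returned by confirmed_boxes would then be passed to the drawing code.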

Also, when a confirmed track has no corresponding raw detection, it keeps producing Kalman-predicted bounding box states until max_age is reached. You can set these parameters when you initialise the DeepSort object.
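Since max_age is counted in tracker updates rather than seconds, how long an unmatched track can linger depends on the frame rate. A rough back-of-the-envelope sketch, assuming one update_tracks call per frame; the helper name max_linger_seconds is hypothetical and not part of the library:

```python
def max_linger_seconds(max_age, fps):
    """Rough upper bound on how long an unmatched track survives,
    assuming one tracker update per frame."""
    return max_age / fps


for fps in (2, 3, 30):
    seconds = max_linger_seconds(30, fps)
    print(f"max_age=30 at {fps} FPS -> up to about {seconds:.1f} s")
```

Under that assumption, a value that clears lost tracks within about a second at 30 FPS can keep them around for many seconds at 2-3 FPS, so a smaller max_age may be worth trying on a slow feed.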

filipp01 commented 1 year ago

[image]

I'm now using track.is_confirmed() and a max_age of 30. It's better: once I move my camera away, the bounding boxes disappear after 1-2 seconds, unlike the 5-6 seconds before. I'm now using the following code to skip boxes:

if not track.is_confirmed() or track.time_since_update > 5: continue

Now, as you can see, the issue is still that sometimes a bounding box just becomes a ghost. This happens a lot when more people are on the street and it loses focus. Could this be because my YOLO is only running at 1-2 FPS right now? Do you think it would be more precise at a higher FPS, or is something else the problem? @levan92 Thanks for the response!

levan92 commented 1 year ago

Sorry for the late response. Yup, DeepSORT tends to work better when the FPS is higher. If you do not want the Kalman-predicted states, you can either call track.to_ltrb(orig=True, orig_strict=True) to always get boxes that come from your detection model, or filter out tracks where track.time_since_update > 0 (meaning no associated detection this round).
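The two options above can be combined in one loop. This is a sketch, not the library's own code: FakeTrack is a hypothetical stand-in whose to_ltrb mimics the behaviour described in this comment (with orig=True and orig_strict=True it is assumed to return None when no detection matched this round), and detection_backed_boxes is an illustrative helper name.

```python
class FakeTrack:
    """Hypothetical stand-in for a deep_sort_realtime track (illustration only)."""

    def __init__(self, track_id, time_since_update, detection_ltrb, predicted_ltrb):
        self.track_id = track_id
        self.time_since_update = time_since_update
        self._detection_ltrb = detection_ltrb  # None when no detection matched
        self._predicted_ltrb = predicted_ltrb

    def is_confirmed(self):
        return True

    def to_ltrb(self, orig=False, orig_strict=False):
        # Assumed behaviour: orig=True prefers the detector's box, and
        # orig_strict=True yields None instead of the Kalman-predicted box.
        if orig:
            if self._detection_ltrb is not None:
                return self._detection_ltrb
            if orig_strict:
                return None
        return self._predicted_ltrb


def detection_backed_boxes(tracks):
    """Keep only confirmed tracks that were matched to a detection this round."""
    boxes = []
    for track in tracks:
        if not track.is_confirmed() or track.time_since_update > 0:
            continue
        box = track.to_ltrb(orig=True, orig_strict=True)
        if box is not None:
            boxes.append((track.track_id, box))
    return boxes


tracks = [
    FakeTrack("1", 0, (10, 20, 110, 220), (12, 22, 112, 222)),
    FakeTrack("2", 3, None, (300, 40, 360, 200)),  # "ghost": prediction only
]
print(detection_backed_boxes(tracks))
```

The trade-off is that a briefly occluded person loses their box for those frames, even though the track itself (and its ID) is kept internally until max_age expires.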