nwojke / deep_sort

Simple Online Realtime Tracking with a Deep Association Metric
GNU General Public License v3.0
5.34k stars 1.49k forks

Incorrect tracking on video file #13

Closed obendidi closed 7 years ago

obendidi commented 7 years ago

Hello, I've managed to get the tracker running with video as input, using YOLOv2 to generate box detections, but I got really bad tracking results. Here is a snippet of the code I'm using:

metric = nn_matching.NearestNeighborDistanceMetric("cosine", 0.2, 100)
tracker = Tracker(metric)
encoder = generate_detections.create_box_encoder(
    "resources/networks/mars-small128.ckpt-68577")

camera = cv2.VideoCapture(file)
while camera.isOpened():
    _, frame = camera.read()
    if frame is None:
        print('\nEnd of Video')
        break
    h, w, _ = frame.shape
    thick = int((h + w) // 300)

    # ---- YOLO part to generate detections ----
    detections = []
    scores = []
    boxes = self.sess.run(self.out, feed_dict)
    for b in boxes:
        left, right, top, bot, mess, max_indx, confidence = boxResults(b)
        detections.append(np.array([left, top, right - left, bot - top]))
        scores.append(float('%.2f' % confidence))
    detections = np.array(detections)
    # ------------------------------------------

    features = encoder(frame, detections)
    detections = [Detection(bbox, score, feature) for bbox, score, feature in
                  zip(detections, scores, features)]

    # Run non-maxima suppression.
    boxes = np.array([d.tlwh for d in detections])
    scores = np.array([d.confidence for d in detections])
    indices = prep.non_max_suppression(boxes, nms_max_overlap, scores)
    detections = [detections[i] for i in indices]

    tracker.predict()
    tracker.update(detections)

    for track, det in zip(tracker.tracks, detections):
        bbox = track.to_tlbr()
        cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])),
                      (int(bbox[2]), int(bbox[3])), (255, 255, 255), thick)

        bbox = det.to_tlbr()
        cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])),
                      (int(bbox[2]), int(bbox[3])), (255, 0, 0), thick)
        cv2.putText(frame, str(track.track_id),
                    (int(bbox[0]), int(bbox[1]) - 12), 0, 1e-3 * h,
                    (255, 0, 0), thick // 3)
    cv2.imshow('', frame)

Here are the results I got after testing: the IDs were not stable, and the white boxes (generated by the tracker) were always thin (width = 0): video in google drive (blue boxes are YOLO detections, white boxes are the tracker output). Can someone please help me find where the problem is, and a possible solution? Thank you!

obendidi commented 7 years ago

I've found that track.is_confirmed() returns False most of the time; that is probably the source of the problem.

nwojke commented 7 years ago

Hi bendidi,

In your visualization you iterate over tracks and detections in a single loop:

for track,det in zip(tracker.tracks,detections):
    # Visualization

but the number of detections and tracks will usually not be the same. Also, the i-th detection at time k is by no means associated with the i-th track at time k. Use two separate loops to visualize detections and tracks, as done here. You might also find the OpenCV visualization in application_util useful for your purposes; that class takes care of visualizing only confirmed tracks.
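A minimal sketch of that separate-loop visualization (the `draw_box` callable is a placeholder for whatever drawing call you use, e.g. a thin wrapper around cv2.rectangle; the track/detection attributes follow the deep_sort classes):

```python
def visualize(frame, tracker, detections, draw_box):
    """Draw detections and tracks in two independent loops.

    `draw_box` is a placeholder for the actual drawing call
    (e.g. a wrapper around cv2.rectangle).
    """
    for det in detections:
        # Detector output: draw every detection box.
        draw_box(frame, det.to_tlbr(), color=(255, 0, 0))
    for track in tracker.tracks:
        # Tracker output: skip tentative tracks and tracks that were
        # not updated in the current frame.
        if not track.is_confirmed() or track.time_since_update > 0:
            continue
        draw_box(frame, track.to_tlbr(), color=(255, 255, 255))
```

Because the loops are independent, a mismatch between the number of tracks and the number of detections no longer causes boxes to be dropped or mislabeled.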

If this is not just a visualization error, then I'd guess something is wrong with the Kalman state. Double-check that the width coming out of the filter really is 0, as suggested by your visualization, and check the velocities (e.g., print the object states). If that doesn't help, it would be useful if you could dump the detections that you input to the tracker.

Edit: Also, double-check the bounding box format coming out of the detector is correct.

obendidi commented 7 years ago

I've added the visualization of the detections just to make sure that the bounding boxes coming out of the detector are correct. About the format: the boxes from the detector are (left, right, top, bot) == (xmin, xmax, ymin, ymax), and I convert them to (left, top, right - left, bot - top) == (x, y, w, h).
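For reference, that conversion can be sketched as follows (the function name is illustrative; casting to float anticipates the fix discussed later in this thread, since integer boxes break the Kalman-filter arithmetic):

```python
import numpy as np

def detector_box_to_tlwh(left, right, top, bot):
    """Convert a detector box given as (xmin, xmax, ymin, ymax) to the
    (top-left x, top-left y, width, height) format deep_sort expects.
    Cast to float so downstream filter arithmetic is not truncated."""
    return np.array([left, top, right - left, bot - top], dtype=float)
```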

The width coming out of the tracker really is 0; I've checked with prints:

('is_confirmation', False)
('is_tentative', True)
('is_deleted', False)
('frame', (1024, 1280, 3))
('tracking', array([641,  40,   0,  97]))

('is_confirmation', False)
('is_tentative', True)
('is_deleted', False)
('frame', (1024, 1280, 3))
('tracking', array([615, 197,   0, 126]))

('is_confirmation', False)
('is_tentative', True)
('is_deleted', False)
('frame', (1024, 1280, 3))
('tracking', array([859, 486,   0, 234]))

('is_confirmation', False)
('is_tentative', True)
('is_deleted', False)
('frame', (1024, 1280, 3))
('tracking', array([531, 536,   0, 257]))

('is_confirmation', False)
('is_tentative', True)
('is_deleted', False)
('frame', (1024, 1280, 3))
('tracking', array([1002,  580,    0,  157]))

('is_confirmation', False)
('is_tentative', True)
('is_deleted', False)
('frame', (1024, 1280, 3))
('tracking', array([1084,  725,    0,  204]))

('is_confirmation', True)
('is_tentative', False)
('is_deleted', False)
('frame', (1024, 1280, 3))
('tracking', array([ 988.55564239,  672.82243722,  117.2925128 ,  117.2925128 ]))

frame is the shape of the image, and tracking is the output box of the tracker in (x, y, w, h) format.

Thank you for your help !

chengangfzu commented 7 years ago

thank you for your work!


nwojke commented 7 years ago

My next step in debugging this would be to set the --max_cosine_distance parameter to a very large number, to see how the Kalman filter performs on its own (whether it still produces zero-width tracks and no confirmed tracks).

bhavikajalli commented 7 years ago

I am facing the same problem. Let me know if you figure it out.

nwojke commented 7 years ago

Can one of you dump the detections that you input to the tracker (including the appearance descriptor) in the format described on the project page in native numpy format? The behavior is hard to debug without a working example.

lg-code-repo commented 7 years ago

I also got a width of 0 when using YOLOv2 detections. The bbox in your detection result must be a float type; the default is int. I got a correct result when I changed this, hope it helps. But I get a very slow speed with YOLOv2 detection.

nwojke commented 7 years ago

Right, integer types may break the computations inside the Kalman filter. I have committed a fix that forces floating point types here; a pull from the current master should fix this.
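On the user side, the same workaround can be sketched as a cast applied before the boxes reach the tracker (the helper name is illustrative):

```python
import numpy as np

def to_float_boxes(boxes):
    """Force detector boxes to floating point before they reach the
    tracker; integer arrays silently truncate the Kalman-filter math,
    which is what produced the zero-width tracks above."""
    return np.asarray(boxes, dtype=np.float64)
```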

obendidi commented 7 years ago

Thank you @kinginsky @nwojke, that was indeed the problem; I had to cast the detections to float. The result I got is better than the SORT algorithm, but not by much. I also noticed another problem (I don't know whether it's normal or not): the tracking boxes' width and height are half those of YOLO's detections. You can find the output video I tested on here. PS: YOLO detections are in red, deep_sort tracks in white.

Does this influence the tracking, or is it unrelated? And if so, what are possible ways to improve the tracking? Thank you again :)

nwojke commented 7 years ago

If your boxes have wrong size then there is a good chance that the appearance descriptor is also computed from the wrong bounding boxes. This could degrade performance.

obendidi commented 7 years ago

Any idea why this is happening? I've put my code here if you're interested.

obendidi commented 7 years ago

It seems that there is a problem in this line.

it should be :

ret[2:] = ret[:2] + ret[2:]

instead of :

ret[2:] = ret[:2] + ret[2:] / 2

I get correct tracking now, thanks for the help :+1:
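For reference, the corrected conversion can be written as a standalone sketch of what Track.to_tlbr() should compute:

```python
import numpy as np

def tlwh_to_tlbr(tlwh):
    """(top-left x, top-left y, width, height) ->
    (min x, min y, max x, max y). The buggy version added only half of
    the width/height, which halved the drawn boxes; the fix adds the
    full extent."""
    ret = np.asarray(tlwh, dtype=float).copy()
    ret[2:] = ret[:2] + ret[2:]
    return ret
```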

nwojke commented 7 years ago

Good catch. It seems Track.to_tlbr() is not used inside the deep_sort code itself; the visualization in application_util uses Track.to_tlwh(), so this went unnoticed. I have committed a fix.

ghost commented 7 years ago

Does it run in near realtime?

nwojke commented 7 years ago

@harshmunshi03 the tracker runs in real time if you have a decent graphics card for feature generation.

groverpr commented 6 years ago

@bendidi I tried darkflow with deep_sort for tracking, using your repo. For a video with a decent number of people in the scene (20-30), it performs even worse than YOLOv2 with simple IoU-based clustering. How were your final results?

tonmoyborah commented 6 years ago

@bendidi I looked at your warehouse video, it seems there are a lot of ID switches even without any occlusion. Did you find a solution for this?

Qidian213 commented 6 years ago

I combined deep_sort and YOLOv3; it can track in real time using my laptop's camera:

https://github.com/Qidian213/deep_sort_yolov3