ifzhang / FairMOT

[IJCV-2021] FairMOT: On the Fairness of Detection and Re-Identification in Multi-Object Tracking
MIT License

Multiple boxes on one person after switching to a lighter backbone #110

Open shupinghu opened 4 years ago

shupinghu commented 4 years ago

Hello, I have replaced the DLA-34 backbone with a lighter backbone so that we can run MOT faster. After retraining it on the given training dataset, multiple boxes are shown on one person when running demo.py (this happens when a crowd walks through the camera view and people farther from the camera are completely occluded by people nearby; it does not happen when a single person or a sparse crowd passes the camera). Do you know what the problem is? Maybe NMS would help? I could not find the NMS step used in track/demo.py; could you please tell me where it is?

ifzhang commented 4 years ago

We do not use NMS because we use a max pooling operation to select the top heatmap centers. You can use a larger max pooling kernel such as 5x5 or 7x7. You can also use a higher --conf_thres, which will help. It is quite strange to get this many boxes. Do you have an upsampling operation in your backbone? I think our network performs well when the backbone's output feature map is 1/4 the resolution of the input image.
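For reference, the pooling-based peak selection can be sketched roughly like this (a plain-Python stand-in for the max-pooling "pseudo-NMS" used in CenterNet-style decoders; `heatmap_nms` is an illustrative name, not FairMOT's API): a heatmap cell survives only if it equals the maximum of its kernel x kernel neighborhood.

```python
def heatmap_nms(heat, kernel=3):
    """Keep only local maxima of a 2-D heatmap (list of lists of floats).

    A cell is kept if it equals the max of its kernel x kernel neighborhood,
    mimicking max-pooling-based peak selection. Ties keep both cells.
    """
    pad = kernel // 2
    h, w = len(heat), len(heat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the neighborhood, clipped at the heatmap borders.
            neighborhood = [
                heat[ny][nx]
                for ny in range(max(0, y - pad), min(h, y + pad + 1))
                for nx in range(max(0, x - pad), min(w, x + pad + 1))
            ]
            if heat[y][x] == max(neighborhood):
                out[y][x] = heat[y][x]
    return out


heat = [[0.1, 0.9, 0.8],
        [0.2, 0.3, 0.1],
        [0.0, 0.0, 0.5]]
peaks = heatmap_nms(heat, kernel=3)
# The 0.8 next to the 0.9 is suppressed; the isolated 0.5 survives.
```

With a larger `kernel` the suppression radius grows, which is why a 5x5 or 7x7 kernel can merge duplicate peaks on one person, but may also swallow a genuinely nearby second person.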

shupinghu commented 4 years ago

> We do not use NMS because we use a max pooling operation to select the top heatmap centers. You can use a larger max pooling kernel such as 5x5 or 7x7. You can also use a higher --conf_thres, which will help.

Thanks for your quick reply. I do use an upsampling operation in my backbone, and the backbone's output feature map is 1/4 the resolution of the input image. I can only find the "post_process" operation in multitracker.py line 194; could you tell me where and how the max pooling kernel is used?

ifzhang commented 4 years ago

https://github.com/ifzhang/FairMOT/blob/bcc0e3f9812e8ab75e855a58e1c0c5cd0138fe01/src/lib/models/decode.py#L9

shupinghu commented 4 years ago

> https://github.com/ifzhang/FairMOT/blob/bcc0e3f9812e8ab75e855a58e1c0c5cd0138fe01/src/lib/models/decode.py#L9

Hi, I have tried this method on the trained model (I assume I do not need to retrain after changing the max pooling kernel size), but the situation has not improved. I noticed that in the CenterNet code, NMS can optionally be enabled in post-processing; have you preserved that part?

songbo0925 commented 4 years ago

I have met the same problem. How did you solve it?

shupinghu commented 4 years ago

> I have met the same problem. How did you solve it?

I simply added an external NMS module to solve it, but I still don't know what caused the problem in the first place.
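As a sketch of what such an external, box-level NMS module could look like (standard greedy NMS on `(x1, y1, x2, y2)` boxes; the names `iou` and `nms` here are illustrative, not the code actually used above):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thres=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    all remaining boxes that overlap it above iou_thres.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thres]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_thres=0.5)
# The second box heavily overlaps the first and is suppressed.
```

In practice one would run this on the decoded detections before they enter the association step, e.g. right after the post-processing in multitracker.py.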

songbo0925 commented 4 years ago

> I simply added an external NMS module to solve it, but I still don't know what caused the problem in the first place.

Thanks for your reply.

Thanks for your reply. I think it may be because the network models each pedestrian as a point, and the pseudo-NMS on the output feature map (a max pooling with a very small 3x3 kernel) does not suppress multiple points that represent the same person. However, if we simply enlarge the kernel, then when two pedestrians are close to each other one of them can easily be suppressed, which is not what we want. Therefore it seems necessary to add another NMS step based on the IoU between bounding boxes. Can you share your code for this additional NMS?

In addition, I'd like to ask whether you have run your training-set videos through demo.py. When I run my training-set videos through demo.py this phenomenon does not appear, but it does appear when I run the test videos. I wonder whether the model has overfitted?

shupinghu commented 4 years ago


You can refer to the CenterNet code to add the NMS module.

I also suspect that incomplete fitting of the network caused this problem, but the printed loss has already converged, so my guess is that the lighter backbone may not have enough capacity for this task.

songbo0925 commented 4 years ago


> You can refer to the CenterNet code to add the NMS module.
>
> I also suspect that incomplete fitting of the network caused this problem, but the printed loss has already converged, so my guess is that the lighter backbone may not have enough capacity for this task.

Maybe, but I'm using the DLA-34 backbone.

KevinKai-0717 commented 3 years ago


> You can refer to the CenterNet code to add the NMS module.

Hello, I'd really like to ask: in which file should the NMS operation be added?

Maping1026 commented 3 years ago

@shupinghu and @songbo0925 Hello, I referred to the CenterNet code and added the NMS module in src\lib\trains\mot.py, but it doesn't work for my large-object detection and tracking. Can you share your NMS code? Thank you very much~