xg-chu / CrowdDet

[CVPR 2020] Detection in Crowded Scenes: One Proposal, Multiple Predictions
MIT License

Label assignment in multi-classes prediction #13

Open zehuichen123 opened 4 years ago

zehuichen123 commented 4 years ago

Hi, I am trying to implement CrowdDet on my own, and I wonder what your strategy is for multi-class detection. In fpn_roi_target.py, I notice that you simply take the labels of the top-2 IoU ground-truth boxes as the targets for each proposal, so could the two targets belong to different classes?
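For readers following along, here is a minimal sketch of the top-2 IoU assignment described in the question. It is not the code from fpn_roi_target.py: the function names, the `fg_thresh` value of 0.5, and the use of label 0 for background are all assumptions made for illustration. It only shows that, with no class constraint, the two targets of one proposal can carry different labels.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top2_targets(proposal, gt_boxes, gt_labels, fg_thresh=0.5):
    """Hypothetical helper: return two (label, iou) targets for one proposal.

    The two ground-truth boxes with the highest IoU are selected. A target
    whose IoU is below fg_thresh is relabelled as background (assumed to be
    label 0), and missing targets are padded with background.
    """
    scored = sorted(
        ((iou(proposal, g), lab) for g, lab in zip(gt_boxes, gt_labels)),
        key=lambda t: t[0],
        reverse=True,
    )
    targets = [(lab if ov >= fg_thresh else 0, ov) for ov, lab in scored[:2]]
    while len(targets) < 2:
        targets.append((0, 0.0))
    return targets


# One proposal overlapping two ground-truth boxes of different classes.
proposal = (0, 0, 10, 10)
gt_boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
gt_labels = [1, 2, 1]
targets = top2_targets(proposal, gt_boxes, gt_labels)
labels = [lab for lab, _ in targets]  # the two labels differ here
```

Whether mixed-class targets are desirable is exactly the open question in this thread; the sketch only makes the situation concrete.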

xg-chu commented 4 years ago

Multi-class detection will be implemented in a few days. Please wait for the new implementation.

zehuichen123 commented 4 years ago

Could you briefly describe your strategy so I can give it a try on my own? Thanks!

zehuichen123 commented 4 years ago

@Purkialo Another question: for the refinement module, did you repeat the bbox data 4 times for COCO, or only once? For the CrowdHuman dataset you repeated it 4 times, which turns it into a 20-dim array. I tried concatenating a single 400-dim (80 * (4 + 1)) feature to the original RoI feature, but ended up with a NaN loss :(

xg-chu commented 4 years ago

We recommend that you use only the simple version of EMD, without the refinement module. If you want to use the refinement module, I think you only need to concat the features, without repeating them.

LearnerZhou commented 4 years ago

Hi, like @zehuichen123, I am quite confused about why you repeated the bbox data 4 times in the refinement module. Hoping for your answer. Thanks a lot!

zehuichen123 commented 4 years ago

Here is my guess: the original feature dim is 1024, and if we only append the coordinates to the RoI feature, only a (4 + 1)-dim vector is appended, which has little influence on the refinement result (5 vs. 1024). So repeating it multiple times may work better; I think 4 here is an empirical value.
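The dimension arithmetic behind this guess can be sketched as follows. This is only an illustration with plain Python lists, not the repository's refinement module; `refine_input` is a hypothetical helper, and the 1024-dim RoI feature and 4x repetition are the values quoted in this thread.

```python
def refine_input(roi_feature, pred, repeat=1):
    """Hypothetical helper: append a prediction vector (4 box values + 1 score)
    to an RoI feature, tiling the prediction `repeat` times first."""
    return list(roi_feature) + list(pred) * repeat


roi_feature = [0.0] * 1024           # RoI feature size quoted in the thread
pred = [0.1, 0.2, 0.3, 0.4, 0.9]     # 4 box values + 1 score (dummy numbers)

once = refine_input(roi_feature, pred, repeat=1)    # 1024 + 5  = 1029 dims
tiled = refine_input(roi_feature, pred, repeat=4)   # 1024 + 20 = 1044 dims

# Fraction of the concatenated vector that comes from the prediction.
share_once = len(pred) / len(once)        # 5 / 1029, roughly 0.5%
share_tiled = 4 * len(pred) / len(tiled)  # 20 / 1044, roughly 1.9%
```

Tiling raises the prediction's share of the input roughly fourfold, which is consistent with the guess that 4 is an empirical choice rather than a derived one.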

LearnerZhou commented 4 years ago

Wow, that sounds reasonable. Thanks for your kind help!

xg-chu commented 4 years ago

> Here is my guess: the original feature dim is 1024, and if we only append the coordinates to the RoI feature, only a (4 + 1)-dim vector is appended, which has little influence on the refinement result (5 vs. 1024). So repeating it multiple times may work better; I think 4 here is an empirical value.

👍

taofuyu commented 2 years ago

So, can one anchor predict two different classes?

abhigoku10 commented 2 years ago

@Purkialo has prediction of multiple labels for a single bounding box been implemented? Can you share the code? @taofuyu I am also interested in this. Have you seen any other repos like this?

taofuyu commented 2 years ago

> @Purkialo has prediction of multiple labels for a single bounding box been implemented? Can you share the code? @taofuyu I am also interested in this. Have you seen any other repos like this?

I think one anchor can predict multiple classes, but it is not necessary. NMS is applied class by class, so if objects of different classes are predicted in one set, Set-NMS becomes ineffective.
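To make the interaction concrete, here is a minimal sketch of greedy NMS with the Set-NMS rule (a box is never suppressed by another box from the same proposal) and an optional class-wise mode. It is not the repository's implementation: the dict keys (`box`, `score`, `set_id`, `label`), the flags `use_set` and `per_class`, and the 0.5 threshold are assumptions for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def set_nms(dets, iou_thresh=0.5, use_set=True, per_class=False):
    """Greedy NMS over detections sorted by descending score.

    use_set:   skip suppression between boxes that share a set_id
               (i.e. boxes predicted by the same proposal).
    per_class: only compare boxes that have the same label.
    """
    keep = []
    for d in sorted(dets, key=lambda x: x["score"], reverse=True):
        suppressed = False
        for k in keep:
            if per_class and k["label"] != d["label"]:
                continue  # class-wise NMS never compares different classes
            if use_set and k["set_id"] == d["set_id"]:
                continue  # Set-NMS rule: same proposal, do not suppress
            if iou(k["box"], d["box"]) > iou_thresh:
                suppressed = True
                break
        if not suppressed:
            keep.append(d)
    return keep


def make_dets(label_b):
    """Two overlapping boxes from proposal 0, plus a duplicate from proposal 1."""
    return [
        {"box": (0, 0, 10, 10), "score": 0.9, "set_id": 0, "label": 1},
        {"box": (1, 0, 11, 10), "score": 0.8, "set_id": 0, "label": label_b},
        {"box": (0, 0, 10, 10), "score": 0.7, "set_id": 1, "label": 1},
    ]


def scores(dets):
    return [d["score"] for d in dets]


# Same class in one set: the set rule decides whether the second box survives.
same_with_set = scores(set_nms(make_dets(1), use_set=True))
same_without_set = scores(set_nms(make_dets(1), use_set=False))

# Different classes in one set under class-wise NMS: the set rule changes nothing.
diff_with_set = scores(set_nms(make_dets(2), use_set=True, per_class=True))
diff_without_set = scores(set_nms(make_dets(2), use_set=False, per_class=True))
```

In this toy case the set rule only matters when both boxes of a set share a class; once they differ, class-wise NMS never compares them, which matches the point made above.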

abhigoku10 commented 2 years ago

@taofuyu Yes, the NMS will be invalid, but what I am looking for is multi-label classification with a single bounding box. I want the NMS operation to run between different anchor boxes and their classes, not within a single anchor.