kemaloksuz / RankSortLoss

Official PyTorch Implementation of Rank & Sort Loss for Object Detection and Instance Segmentation [ICCV2021]
Apache License 2.0

RankSort loss computation over batch #6

Closed xonobo closed 3 years ago

xonobo commented 3 years ago

At this line the predictions are flattened, discarding the batch dimension. vectorize_labels does the same. I guess the ranking and sorting of positives/negatives are therefore done across all image instances in the batch. Is this intentional? I think computing the RankSort loss image-wise would be more meaningful in the batch case.
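
For reference, a minimal sketch of the flattening I mean (the tensor names and shapes here are hypothetical, not the repo's actual variables):

```python
import torch

# Hypothetical shapes: B images, A anchors/locations, C classes.
B, A, C = 2, 100, 80
cls_scores = torch.randn(B, A, C)          # per-image predictions
labels = torch.randint(0, C + 1, (B, A))   # per-anchor class targets

# Flattening merges the batch dimension, so positives and negatives
# from different images end up in a single ranking pool.
flat_scores = cls_scores.reshape(-1, C)    # (B*A, C)
flat_labels = labels.reshape(-1)           # (B*A,)
```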

kemaloksuz commented 3 years ago

Yes, it was an intentional design choice. The main reason is that previous work on ranking-based losses suggests that the more examples a ranking-based loss has, the better the approximation. In particular, a previous work (https://arxiv.org/pdf/1912.03500.pdf -- see Section 4.3) introduced a "score memory", in which examples from previous batches are kept in order to increase the number of examples and obtain a better approximation. Furthermore, if ranking-based losses (AP, aLRP or RS Losses) were called for each image in the batch, training time would probably increase (this needs to be validated), since they are more expensive than score-based losses (e.g. cross-entropy). Note that even with one call per batch, it takes 1.5 times longer to train models on average.

There are different possible choices in the implementation: Rank&Sort can be computed over all instances in the batch (as we did, for a better approximation and more efficient training), over single images (as you suggest), over the instances in each FPN level of each image, or even per ground truth, etc. However, we have not validated this design choice thoroughly, and I am not sure which one is more meaningful. Why do you think the image-wise implementation is more meaningful?
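
To make the first two options concrete, here is a hedged sketch of the two call patterns. The `ranking_loss` below is a simple pairwise-hinge stand-in, not the actual RS Loss, and all names and shapes are assumptions for illustration:

```python
import torch

def ranking_loss(scores: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Stand-in pairwise hinge, only to make the call patterns below
    runnable; the real RS Loss computes ranking and sorting errors
    per positive, which this does not reproduce."""
    pos = scores[targets == 1]
    neg = scores[targets == 0]
    if pos.numel() == 0 or neg.numel() == 0:
        return scores.sum() * 0.0
    # Penalize each negative scored above (or within a margin of) a positive.
    return torch.relu(1.0 + neg.unsqueeze(0) - pos.unsqueeze(1)).mean()

B, A = 2, 100                                  # images, anchors per image
scores = torch.randn(B, A)                     # predicted scores
targets = (torch.rand(B, A) > 0.9).long()      # sparse positives

# (a) Batch-wise: one call over all instances in the batch, so the
# ranking sees B*A examples and the approximation improves.
loss_batch = ranking_loss(scores.reshape(-1), targets.reshape(-1))

# (b) Image-wise: one call per image, so positives are only ranked
# against negatives from the same image.
loss_image = torch.stack(
    [ranking_loss(scores[i], targets[i]) for i in range(B)]
).mean()
```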

xonobo commented 3 years ago

Thanks for the explanation. In the paper, the loss is said to be distributed over the other negative/positive pairs. Assuming the images in a batch are independent, it seems meaningful to compute the primary terms image-wise to respect this independence. However, the better approximation and the speed gain may justify your approach. Sorry for the late response; I was on vacation.

By the way, congratulations and thank you for this nice work :)