It depends on what you want; for most scenarios, IoU is enough for evaluation. However, if you want more ground-truth coverage (GTC), in other words recall, then you should focus on GTC. As for your next question: neither is actually very effective compared to the state of the art; this is verified in the paper (they are just a bit better than the RANDOM baseline). From my perspective, the number of possible states in this detection game is simply too large for pure RL to work effectively. There are methods that combine RL with supervised ML models, but that's another story. I don't think RL is a good approach in a supervised setting, since you have a better understanding of the underlying structure (it is not a reward-driven game). The field is still active, but I can only recommend some earlier papers (Reinforcement Learning for Visual Object Detection; Deep reinforcement learning using compositional representations for performing instructions; Active Object Localization with Deep Reinforcement Learning), as I left this area a long time ago.
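To make the IoU-vs-GTC distinction concrete, here is a minimal sketch of the two metrics, assuming axis-aligned boxes in (x1, y1, x2, y2) format (the function names are illustrative, not from this repo):

```python
# Minimal sketch of IoU vs. ground-truth coverage (GTC) for axis-aligned
# boxes in (x1, y1, x2, y2) format. Function names are illustrative only.

def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection_area(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((ix1, iy1, ix2, iy2))

def iou(pred, gt):
    """Intersection over union: penalizes both missed GT area and over-prediction."""
    inter = intersection_area(pred, gt)
    union = box_area(pred) + box_area(gt) - inter
    return inter / union if union > 0 else 0.0

def gtc(pred, gt):
    """Ground-truth coverage (recall-like): fraction of the GT box covered
    by the prediction. An oversized prediction trivially maximizes this."""
    gt_area = box_area(gt)
    return intersection_area(pred, gt) / gt_area if gt_area > 0 else 0.0
```

For example, with pred = (0, 0, 2, 2) and gt = (1, 1, 3, 3), iou gives 1/7 ≈ 0.14 while gtc gives 0.25; a huge prediction box can push GTC to 1.0 while IoU collapses, which is why GTC alone only matters when recall is the priority.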
Thank you for your detailed reply and the recommended papers. I'm sorry to hear you've left this area; I also feel it is difficult to make progress in this direction. I tried simply training a VGG16 with RL on CIFAR-10 and could only reach about 45% accuracy, which is much worse than supervised learning. Regardless, I still hope to see greater development in this direction in the future.
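For reference, this is roughly the kind of setup I tried — a rough REINFORCE sketch assuming PyTorch/torchvision, with placeholder hyperparameters rather than my exact script:

```python
# Rough sketch of classification-as-RL with REINFORCE (placeholder
# hyperparameters, not my exact script). Assumes PyTorch + torchvision.
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.vgg16(num_classes=10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

baseline = 0.0  # running-mean reward baseline to reduce gradient variance
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    logits = model(images)
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                # sampled class label = action
    rewards = (actions == labels).float()  # reward 1 if correct, else 0
    # REINFORCE: increase log-prob of actions that beat the baseline
    loss = -(dist.log_prob(actions) * (rewards - baseline)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    baseline = 0.99 * baseline + 0.01 * rewards.mean().item()
```

Since the reward is a single bit per image instead of the full cross-entropy gradient, it is perhaps not surprising that this plateaus far below supervised training.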
Hi, qq456cvb: I trained the model both from scratch and from a pre-trained backbone, but when I tried to compare their performance, I couldn't understand the meaning of the curves on TensorBoard. Which curve represents the metric we care about most, i.e., IoU or IoU+GTC+CD?
Also, how effective were these setups in your experiments: training from scratch, with a pretrained backbone, or with a frozen pretrained backbone?
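In case it clarifies what I'm asking, this is the kind of explicit scalar tagging that would make the curves unambiguous to me — a hypothetical snippet (the tag names are my guesses, not necessarily what this repo uses):

```python
# Hypothetical TensorBoard logging; tag names and numbers are placeholders.
from torch.utils.tensorboard import SummaryWriter

eval_results = [(0.42, 0.55, 0.30), (0.45, 0.58, 0.28)]  # (IoU, GTC, CD)

writer = SummaryWriter(log_dir="runs/from_scratch")
for step, (iou_val, gtc_val, cd_val) in enumerate(eval_results):
    writer.add_scalar("eval/IoU", iou_val, step)
    writer.add_scalar("eval/GTC", gtc_val, step)
    writer.add_scalar("eval/CD", cd_val, step)
    writer.add_scalar("eval/IoU+GTC+CD", iou_val + gtc_val + cd_val, step)
writer.close()
```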
Thanks.