> Instead of using the top-1 object score, I would expect to use the scores of all object classes, so that each possible triplet would have a score for ⟨object1, rel, object2⟩. This would result in quite a large number of scores for just one candidate object pair: 150 × 50 × 150 = 1,125,000. Then if there are, say, 20 such pairs in the image, we would have to rank about 20 million scores to get the top-k.
> Even though this will slow down evaluation, it might be useful, because some object classes are very close to each other (like "man" and "person"), so I expect recall to be better if all object classes are considered.
We wanted to standardize the evaluation :) But you're totally right: there's some redundancy in the label space, and fixing that would probably improve the performance of both models.
Hi Rowan,
I've noticed that to evaluate sgcls, both Neural Motifs and Message Passing only consider the top-1 object prediction: https://github.com/rowanz/neural-motifs/blob/master/lib/rel_model.py#L536 https://github.com/rowanz/neural-motifs/blob/master/lib/rel_model_stanford.py#L187
Is there any particular reason for that, or did you just borrow this way of evaluation from another repo, like this one: https://github.com/danfeiX/scene-graph-TF-release?
Instead of using the top-1 object score, I would expect to use the scores of all object classes, so that each possible triplet would have a score for ⟨object1, rel, object2⟩. This would result in quite a large number of scores for just one candidate object pair: 150 × 50 × 150 = 1,125,000. Then if there are, say, 20 such pairs in the image, we would have to rank about 20 million scores to get the top-k. Even though this will slow down evaluation, it might be useful, because some object classes are very close to each other (like "man" and "person"), so I expect recall to be better if all object classes are considered.
Thanks, Boris
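For concreteness, the all-classes scoring proposed in the thread could be sketched as below. This is a minimal NumPy sketch, not code from either repo: the array names, the random softmax-like probabilities standing in for model outputs, and k = 100 are all illustrative assumptions.

```python
import numpy as np

# Hypothetical label-space sizes from the discussion:
# 150 object classes, 50 predicate (relation) classes.
N_OBJ, N_REL = 150, 50

rng = np.random.default_rng(0)

# Stand-in softmax outputs for ONE candidate box pair: class distributions
# for the subject box, the object box, and the predicate between them.
subj_probs = rng.dirichlet(np.ones(N_OBJ))   # shape (150,)
obj_probs = rng.dirichlet(np.ones(N_OBJ))    # shape (150,)
rel_probs = rng.dirichlet(np.ones(N_REL))    # shape (50,)

# Score every <subject_class, predicate, object_class> triplet for this pair
# as a product of the three class probabilities: the outer product gives
# 150 * 50 * 150 = 1,125,000 scores, matching the count in the thread.
triplet_scores = np.einsum('s,r,o->sro', subj_probs, rel_probs, obj_probs)

# Top-k triplets for this pair, via argpartition (avoids a full 1.1M-element
# sort) followed by an exact descending sort of the k survivors.
k = 100
flat = triplet_scores.ravel()
flat_idx = np.argpartition(flat, -k)[-k:]
flat_idx = flat_idx[np.argsort(flat[flat_idx])[::-1]]

# Recover (subject_class, predicate, object_class) indices for each survivor.
top_triplets = np.unravel_index(flat_idx, triplet_scores.shape)
```

With several candidate pairs per image, the same per-pair scores would be pooled and ranked jointly to get the image-level top-k, which is where the "about 20 million scores" figure comes from.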