My evaluation code is the same as Xu et al.'s, which is the prior work I compare against in the paper. You might be right that different people use different sets of evaluation metrics, which makes comparing approaches difficult (we noted the same in our supplemental material).
Thanks for your excellent work and the nice code. While reading your evaluation code in sg_eval.py, I noticed something different from other people's code for visual relationship detection.
This code section seems to calculate recall@k for each image individually, and then obtain the final recall@k by averaging over all the tested images. I would summarize the steps as follows (see the sketch below):
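To make sure I am reading it right, here is a minimal sketch of the per-image averaging I mean. The function names and the triplet representation are my own assumptions, not code taken from sg_eval.py:

```python
import numpy as np

def recall_at_k_per_image(pred_triplets, gt_triplets, k):
    # Predictions are assumed sorted by confidence; a triplet is
    # any hashable (subject, predicate, object) representation.
    top_k = pred_triplets[:k]
    hits = sum(1 for gt in gt_triplets if gt in top_k)
    return hits / max(len(gt_triplets), 1)

def macro_recall_at_k(all_preds, all_gts, k):
    # Average the per-image recall@k over the whole test set,
    # which is what I believe sg_eval.py is doing.
    per_image = [recall_at_k_per_image(p, g, k)
                 for p, g in zip(all_preds, all_gts)]
    return float(np.mean(per_image))
```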
However, the first VRD paper, "Visual Relationship Detection with Language Priors" (Lu et al.), calculates recall differently: as far as I understand, it counts the correctly detected relationships among each image's top-k predictions over the whole test set, and divides by the total number of ground-truth relationships.
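In other words, that computation looks dataset-level (micro-averaged) rather than per-image (macro-averaged). A sketch of my reading of that protocol, under the same assumed triplet representation as above:

```python
def micro_recall_at_k(all_preds, all_gts, k):
    # Pool hits and ground-truth counts over the whole test set,
    # then divide once at the end (my reading of Lu et al.'s protocol).
    total_hits = 0
    total_gt = 0
    for preds, gts in zip(all_preds, all_gts):
        top_k = preds[:k]
        total_hits += sum(1 for gt in gts if gt in top_k)
        total_gt += len(gts)
    return total_hits / max(total_gt, 1)
```

The two quantities generally differ: the macro average weights every image equally, while the micro average weights each image by its number of ground-truth relationships, so numbers from the two protocols are not directly comparable.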
I am not sure whether I understand your code correctly, or whether I am missing something. Could you tell me whether your code works the same way? If it is different, why? Thanks a lot.