rowanz / neural-motifs

Code for Neural Motifs: Scene Graph Parsing with Global Context (CVPR 2018)
https://rowanzellers.com/neuralmotifs
MIT License

evaluation code for recall @k #66

Closed wtliao closed 5 years ago

wtliao commented 5 years ago

Thanks for your excellent work and the nice code. When reading your evaluation code in sg_eval.py, I noticed that it differs from other people's code for visual relationship detection.

#  Line114-119 in sg_eval.py
for k in result_dict[mode + '_recall']:
     match = reduce(np.union1d, pred_to_gt[:k])
     rec_i = float(len(match)) / float(gt_rels.shape[0])
     result_dict[mode + '_recall'][k].append(rec_i)

This code section seems to calculate recall@k for each image separately, and then obtain the final recall@k by averaging over all tested images.

# Line 37-40 in sg_eval.py
def print_stats(self):
        print('======================' + self.mode + '============================')
        for k, v in self.result_dict[self.mode + '_recall'].items():
            print('R@%i: %f' % (k, np.mean(v)))

I summarize the steps as follows:

  1. Compute recall@k for each image; denote these as R = [r1, r2, r3, ..., rN].
  2. Final recall performance = (r1 + r2 + ... + rN) / N = np.mean(R) (see the sketch below).
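For concreteness, here is a minimal sketch of that per-image ("macro") averaging, assuming pred_to_gt_per_image[i] holds the pred_to_gt list for image i (as built in sg_eval.py) and num_gt_per_image[i] is its number of ground-truth relationships; the function and variable names are mine, not from the repo:

# Hypothetical sketch of the per-image averaging described above (my own names)
from functools import reduce
import numpy as np

def macro_recall_at_k(pred_to_gt_per_image, num_gt_per_image, k=50):
    per_image_recalls = []
    for pred_to_gt, num_gt in zip(pred_to_gt_per_image, num_gt_per_image):
        # union of all GT relations hit within the top-k predictions of this image
        match = reduce(np.union1d, pred_to_gt[:k], np.array([]))
        per_image_recalls.append(len(match) / float(num_gt))  # r_i
    # final number reported: mean of the per-image recalls
    return np.mean(per_image_recalls)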

However, the first VRD paper, "Visual Relationship Detection with Language Priors", calculates the recall as follows:

  1. Count the correctly detected relationships among the top-k results for each image, denoted as correct_rel = [c1, ..., cN]. The number of ground-truth relationships per image is denoted as gt_rel = [g1, ..., gN].
  2. recall@k = (c1 + c2 + ... + cN) / (g1 + ... + gN); see the sketch below.

Some other works with published code also calculate recall@k this way, such as MSDN (lines 362-376) and FactorizableNet (lines 123-154).
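By contrast, here is a minimal sketch of the aggregate ("micro") recall used in the VRD paper, under the same assumptions and with my own names, not taken from any of these repos:

# Hypothetical sketch of the VRD-style aggregate recall (my own names)
from functools import reduce
import numpy as np

def micro_recall_at_k(pred_to_gt_per_image, num_gt_per_image, k=50):
    total_correct = 0
    total_gt = 0
    for pred_to_gt, num_gt in zip(pred_to_gt_per_image, num_gt_per_image):
        # union of all GT relations hit within the top-k predictions of this image
        match = reduce(np.union1d, pred_to_gt[:k], np.array([]))
        total_correct += len(match)  # c_i
        total_gt += num_gt           # g_i
    # recall@k = (c1 + ... + cN) / (g1 + ... + gN)
    return total_correct / float(total_gt)

The two numbers can differ noticeably when images have very different numbers of ground-truth relationships, since the micro version weights images by their GT count while the macro version weights every image equally.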

I am not sure whether I understand your code correctly, or whether I am missing something. Could you tell me whether your code works the same way? If it is different, why? Thanks a lot.

rowanz commented 5 years ago

My evaluation code is the same as Xu et al.'s, which is the past work that I compare against in the paper. You might be right that different people use different sets of evaluation metrics, which makes comparing different approaches difficult (we noted the same in our supplemental material).