[Open] walkingwindy opened this issue 3 years ago
```python
def __get_relative_prob(self, all_close_nei, back_nei_probs):
    relative_probs = tf.reduce_sum(
        tf.where(
            all_close_nei,
            x=back_nei_probs,
            y=tf.zeros_like(back_nei_probs),
        ),
        axis=1)
    relative_probs /= tf.reduce_sum(back_nei_probs, axis=1)
    return relative_probs
```
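For reference, here is a runnable NumPy sketch of the same masking-and-normalizing computation (the shapes and values are hypothetical, assuming `back_nei_probs` is `[batch, K]` and `all_close_nei` is a boolean mask of the same shape):

```python
import numpy as np

# Hypothetical shapes: batch of 2 examples, K = 3 background neighbors each.
all_close_nei = np.array([[True, False, True],
                          [False, True, False]])
back_nei_probs = np.array([[0.2, 0.3, 0.5],
                           [0.1, 0.6, 0.3]])

# Sum probability mass over the close neighbors only (mask out the rest).
masked = np.where(all_close_nei, back_nei_probs, np.zeros_like(back_nei_probs))
relative_probs = masked.sum(axis=1)                      # shape: (2,)

# Normalize by the total probability mass over all background neighbors.
relative_probs = relative_probs / back_nei_probs.sum(axis=1)  # shape: (2,)

print(relative_probs)  # -> [0.7 0.6]
```

Note that both reductions use the default `keepdims=False`, so the numerator and denominator are both 1-D vectors of length `batch` and the division is element-wise.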
Ah, I found a TensorFlow version of the same loss function, where keepdims is left at its default (None) in both the numerator and the denominator. So, is there a mistake in the PyTorch version?
Thanks for noticing this. However, I believe this issue will not influence the results: the PyTorch version automatically broadcasts the dimensions, so the difference should not matter. As we have said, the PyTorch version has been verified, and the trained models should be very similar to those from the TensorFlow version.
Thanks for replying. In your implementation, the numerator has shape [batch_size] and the denominator has shape [batch_size, 1], so broadcasting produces a [batch_size, batch_size] result instead of [batch_size, 1]. In my opinion, the loss then effectively becomes $\frac{1}{N}\sum_{j=1}^{N}\left(-\log\frac{P(C_i \cap B_i \mid v_i)}{P(B_j \mid v_i)}\right)$, where N is the batch size; note the mismatched indices i and j, which is different from Equation 3 in Section 3.2 of the paper.
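The broadcasting behavior described above can be demonstrated in a few lines of NumPy (which follows the same broadcasting rules as PyTorch; N is a hypothetical batch size):

```python
import numpy as np

N = 4                              # hypothetical batch size
numerator = np.ones(N)             # shape (N,)   -- from keepdim=False
denominator = np.ones((N, 1))      # shape (N, 1) -- from keepdim=True

ratio = numerator / denominator
print(ratio.shape)  # -> (4, 4): every numerator i is divided by every denominator j
```

Any subsequent mean or sum over this [N, N] tensor therefore averages over all (i, j) pairs rather than only the matched diagonal entries.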
Hello, I'm curious about the function __get_relative_prob in the class LocalAggregationLossModule. Specifically, why do you set keepdim=True in the torch.sum applied to back_nei_probs, but leave keepdim=False (the default) in the torch.sum applied to relative_probs? I think these settings make the dimensions of the numerator and denominator differ when computing Equation 3 in Section 3.2 of the paper: the numerator has shape [batch_size] while the denominator has shape [batch_size, 1]. Is my understanding correct? And why did you set keepdim=True, or what happens if we set keepdim=False for both the numerator and the denominator?
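For concreteness, a minimal sketch of the fix being asked about (this is not the repository's actual code; the function name, mask, and shapes are assumed). Using the default keepdims=False in both reductions keeps the numerator and denominator at shape [batch_size], so the division stays element-wise. NumPy is used here as a stand-in since its keepdims mirrors torch.sum's keepdim:

```python
import numpy as np

def get_relative_prob(all_close_nei, back_nei_probs):
    # keepdims=False (the default) in BOTH sums, so numerator and
    # denominator are both 1-D vectors of length batch_size.
    numerator = np.where(all_close_nei, back_nei_probs, 0.0).sum(axis=1)
    denominator = back_nei_probs.sum(axis=1)
    return numerator / denominator  # shape: [batch_size], no accidental broadcast

# Hypothetical example: batch of 2, K = 2 background neighbors.
mask = np.array([[True, False], [True, True]])
probs = np.array([[0.4, 0.6], [0.25, 0.75]])
print(get_relative_prob(mask, probs))  # -> [0.4 1.0]
```

With keepdim=True in only one of the two sums, the same division would instead broadcast to [batch_size, batch_size], which is the mismatch discussed in this thread.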