jefflai108 / Contrastive-Predictive-Coding-PyTorch

Contrastive Predictive Coding for Automatic Speaker Verification
MIT License

The implementation of loss might be wrong #7

Closed aayushP closed 4 years ago

aayushP commented 4 years ago

https://arxiv.org/pdf/1807.03748.pdf If you look at equation 4 from the paper, the log-softmax should be over N-1 negative samples and 1 positive sample. In your implementation, the N-1 negative samples are actually self.time_step - 1. Taking log_softmax over the batch dimension seems wrong. We switched it to log_softmax over time, and training is more stable and accuracy has gone up on our toy dataset. However, that is only a partial fix.
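
For reference, equation 4 (the InfoNCE loss) in the paper is

$$\mathcal{L}_N = -\,\mathbb{E}_X\!\left[\log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)}\right],$$

where $X = \{x_1, \dots, x_N\}$ contains one positive sample drawn from $p(x_{t+k} \mid c_t)$ and $N-1$ negative samples drawn from the proposal distribution $p(x_{t+k})$.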

jefflai108 commented 4 years ago

hi @aayushP

You may have interpreted the equation wrong. The expectation is taken over the mini-batch, which is composed of N-1 negative samples and 1 positive sample. In my case, the negative samples are simply the other samples in the same mini-batch, hence the softmax over the "batch" dimension. We do this for every time step t = 0, 1, ..., so there is another for loop, which amounts to another expectation on top of the first one. This is not written explicitly in eq. 4, but I believe this should be the case.
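
A minimal sketch of what I mean (not the exact repo code; the function name, tensor names, and shapes are just for illustration): for each prediction step the other items in the mini-batch serve as the N-1 negatives, so the log_softmax runs over the batch dimension, and the outer loop over time steps gives the second expectation.

```python
import torch
import torch.nn.functional as F

def info_nce_over_batch(z, pred):
    """z:    (time_step, batch, dim)  encoded future samples z_{t+k}
       pred: (time_step, batch, dim)  predictions W_k c_t for each step k
       (names and shapes are assumptions for illustration)"""
    time_step, batch, _ = z.shape
    nce = 0.0
    for k in range(time_step):                    # second expectation: over time steps
        scores = torch.mm(pred[k], z[k].t())      # (batch, batch) scores f_k
        log_probs = F.log_softmax(scores, dim=1)  # softmax over the mini-batch (negatives)
        nce += torch.diag(log_probs).sum()        # positives sit on the diagonal
    return -nce / (batch * time_step)             # average negative log-likelihood

# usage
z = torch.randn(12, 8, 256)
pred = torch.randn(12, 8, 256)
loss = info_nce_over_batch(z, pred)
```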