@yjxiong Can you explain why the paper describes a different loss function rather than the standard cross-entropy?
From the implementation, I can see you use standard cross-entropy over the averaged predictions of the K weight-sharing networks. Am I missing something here?
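For reference, here is the quick check I had in mind. It is a minimal sketch that assumes the paper's loss is written as L(y, G) = -Σ_i y_i (G_i - log Σ_j exp G_j), where G is the per-class average of the K segment scores (the averaging consensus is my assumption about the setup). All function names below are hypothetical. If those assumptions hold, the "different" loss is just softmax cross-entropy written out, and the two agree numerically:

```python
import math


def segmental_consensus(segment_scores):
    """Average class scores over the K segments (assumed consensus: mean)."""
    k = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [sum(s[c] for s in segment_scores) / k for c in range(num_classes)]


def paper_style_loss(y, g):
    """Assumed paper form: L(y, G) = -sum_i y_i * (G_i - log sum_j exp(G_j))."""
    log_z = math.log(sum(math.exp(gj) for gj in g))
    return -sum(yi * (gi - log_z) for yi, gi in zip(y, g))


def cross_entropy(y, g):
    """Standard cross-entropy on softmax(G), i.e. what the implementation does."""
    z = sum(math.exp(gj) for gj in g)
    probs = [math.exp(gi) / z for gi in g]
    # Skip zero-label terms; they contribute nothing to the sum.
    return -sum(yi * math.log(p) for yi, p in zip(y, probs) if yi > 0)


if __name__ == "__main__":
    # Hypothetical scores: K = 3 segments, 4 classes; ground truth is class 0.
    scores = [[1.0, 0.5, -0.2, 0.0],
              [0.8, 0.1, 0.3, -0.5],
              [1.2, -0.3, 0.0, 0.4]]
    y = [1, 0, 0, 0]
    g = segmental_consensus(scores)
    print(paper_style_loss(y, g), cross_entropy(y, g))
```

If this reading is right, the only difference is notation, but I may be misreading the paper's definition of G.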