Hi,
I am looking through your code and I see that the supervised implementation uses the MSE loss instead of the cross-entropy loss mentioned in the paper. There is no condition that selects the cross-entropy loss in the supervised case, so the MSE loss is always used. Is this a mistake, or is it done on purpose?
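To make the question concrete, here is a rough sketch of the kind of branch I expected to find. Everything in it is hypothetical and not taken from your repository: the `supervised` flag, `select_loss`, and both loss functions are names I made up, and I wrote it in plain Python rather than your framework.

```python
import math


def mse_loss(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy_loss(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def select_loss(supervised):
    """Hypothetical switch: cross-entropy when supervised, MSE otherwise."""
    return cross_entropy_loss if supervised else mse_loss


# Supervised case: class logits scored against an integer label.
supervised_loss = select_loss(supervised=True)([2.0, 0.5, -1.0], 0)

# Other case: real-valued outputs scored against real-valued targets.
other_loss = select_loss(supervised=False)([1.0, 2.0], [1.0, 4.0])
```

As far as I can tell, the code currently behaves as if the flag were always false.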