lminer closed this 5 years ago
The reshape is there because Softmax is applied only along the final dimension, whereas it should normally be applied along the channel dimension.
Nowadays, Softmax supports multidimensional input and defaults to the last axis, but earlier versions did not.
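To make the point concrete, here is a minimal NumPy sketch rather than the code from the repository. It assumes a channels-last feature map of shape (height, width, channels). The helper `softmax_2d` is a hypothetical stand-in for an older softmax that only accepts 2D input, and `softmax_nd` stands in for a newer one that takes an axis. Flattening the spatial dimensions puts the channels on the final axis of a 2D array, so both routes should give the same per-pixel class probabilities.

```python
import numpy as np


def softmax_2d(x):
    """Hypothetical stand-in for an older softmax that only handles 2D input."""
    if x.ndim != 2:
        raise ValueError("softmax_2d expects a 2D array")
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def softmax_nd(x, axis=-1):
    """Hypothetical stand-in for a newer softmax that accepts any rank."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
# Assumed layout: (height, width, channels), i.e. channels last.
feature_map = rng.normal(size=(4, 5, 3))
h, w, c = feature_map.shape

# Old route: flatten the spatial dimensions so each row is one pixel's
# channel vector, apply the 2D softmax, then restore the original shape.
probs_via_reshape = softmax_2d(feature_map.reshape(h * w, c)).reshape(h, w, c)

# New route: apply softmax directly along the channel (last) axis.
probs_direct = softmax_nd(feature_map, axis=-1)

print(np.allclose(probs_via_reshape, probs_direct))
```

With a softmax that accepts an axis argument, the reshape is no longer needed for channels-last data, which matches the note above about newer versions.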
Thank you! Great explanation.
I was wondering if you had a sense of why you need to reshape here before applying the softmax.