pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Softmax, LogSoftmax are over-parameterized #78450

Open chiragnagpal opened 2 years ago

chiragnagpal commented 2 years ago

🚀 The feature, motivation and pitch

The current implementation of Softmax and LogSoftmax for a K-class output takes an [N x K] input tensor. This is over-parameterized and leads to issues of identifiability: adding the same constant to every logit in a row leaves the output distribution unchanged. It would be worthwhile to include an implementation that only requires an input tensor of [N x (K-1)] dimensionality.
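For illustration, a minimal sketch of the shift invariance behind the identifiability issue, using the standard `torch.nn.functional` API:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)   # [N, K] logits for K = 3 classes
shifted = logits + 100.0     # add the same constant to every logit

# Softmax is invariant to a per-row constant shift, so infinitely many
# [N, K] logit tensors map to the same probability vectors.
print(torch.allclose(F.softmax(logits, dim=-1),
                     F.softmax(shifted, dim=-1)))  # True
```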

Alternatives

No response

Additional context

No response

cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345

jbschlosser commented 2 years ago

Hey @chiragnagpal, thanks for the request! Is the idea to accept an input of shape (N, K-1) and treat the final element as an implicit zero to produce a softmax output of shape (N, K)?

If so, I'm not sure we want to support this directly - we'd prefer to stick with the mathematical definition of softmax. I'm not aware of any precedent in other frameworks for supporting this, and it should be relatively easy to work around by adding the zero element explicitly.