Torch's softplus takes a `beta` parameter that controls how closely the function approximates ReLU: it computes `(1/beta) * log(1 + exp(beta * x))`, which tends toward ReLU as beta grows. I propose we raise an error if this parameter is set to anything other than 1. The other option would be to support it in our SoftPlus implementation, but from a quick scan, Keras and Flux don't expose this parameter, so I don't think we need to support it either.
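A minimal sketch of what I have in mind, assuming a conversion hook along these lines (the `SoftPlus` stand-in and `convert_softplus` names are hypothetical, not our actual API):

```python
import torch


class SoftPlus:
    """Stand-in for our SoftPlus layer (hypothetical name).

    Computes log(1 + exp(x)), i.e. Torch's softplus with beta fixed at 1.
    """

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.softplus(x)  # beta defaults to 1


def convert_softplus(module: torch.nn.Softplus) -> SoftPlus:
    # Torch computes (1/beta) * log(1 + exp(beta * x)); anything other than
    # beta=1 has no equivalent in our layer, so refuse to convert it.
    if module.beta != 1:
        raise ValueError(
            f"torch.nn.Softplus(beta={module.beta}) is not supported; "
            "only beta=1 maps onto our SoftPlus implementation."
        )
    return SoftPlus()
```

With this, `convert_softplus(torch.nn.Softplus())` succeeds, while `convert_softplus(torch.nn.Softplus(beta=2))` raises a clear error instead of silently producing a layer with different behavior.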