Strawl opened 1 month ago
The new sLSTM implementation is missing the stabilizer state m, which makes it very prone to exploding gradients.
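For reference, here is a minimal sketch of the stabilizer as I understand it from the xLSTM paper. It assumes scalar states and exponential input and forget gates, so log i_t and log f_t are just the gate pre-activations. `stabilized_gates` is a hypothetical helper for illustration, not code from this repository. The stabilizer m_t = max(log f_t + m_{t-1}, log i_t) keeps both exponents at or below zero, so neither gate can overflow:

```python
import math


def stabilized_gates(i_pre: float, f_pre: float, m_prev: float) -> tuple[float, float, float]:
    """Stabilized exponential input/forget gates for one sLSTM step.

    Assumption: exponential gates, i.e. log i_t = i_pre and log f_t = f_pre.
    Returns (i_stab, f_stab, m_t).
    """
    # Stabilizer state: running max of the log-space gate contributions.
    m_t = max(f_pre + m_prev, i_pre)
    # Both exponents are <= 0 by construction, so both gates lie in [0, 1].
    i_stab = math.exp(i_pre - m_t)
    f_stab = math.exp(f_pre + m_prev - m_t)
    return i_stab, f_stab, m_t
```

With a pre-activation of 1000, `stabilized_gates(1000.0, 0.5, 0.0)` returns an input gate of 1.0 and a forget gate that underflows harmlessly to 0.0, whereas computing `math.exp(1000.0)` directly raises `OverflowError` in Python.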
Also, it looks like the normalizer state isn't implemented either?
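On the normalizer: in the paper's formulation the hidden state is h_t = o_t * c_t / n_t, where n_t = f_t * n_{t-1} + i_t accumulates the same gate weights as the cell state c_t. This is also what makes the stabilizer safe: rescaling c and n by the same factor exp(-m_t) leaves their ratio unchanged. Below is a minimal scalar sketch of one step carrying both states, assuming exponential input/forget gates and a sigmoid output gate; `slstm_step` and `sigmoid` are hypothetical names, not this repository's API.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def slstm_step(z_pre, i_pre, f_pre, o_pre, state):
    """One scalar sLSTM step with normalizer n and stabilizer m.

    Assumption: exponential input/forget gates (log i = i_pre, log f = f_pre).
    state is (c, n, m); returns (h, new_state).
    """
    c_prev, n_prev, m_prev = state
    z = math.tanh(z_pre)                  # cell input in [-1, 1]
    o = sigmoid(o_pre)                    # output gate
    m = max(f_pre + m_prev, i_pre)        # stabilizer state
    i = math.exp(i_pre - m)               # stabilized input gate, <= 1
    f = math.exp(f_pre + m_prev - m)      # stabilized forget gate, <= 1
    c = f * c_prev + i * z                # cell state
    n = f * n_prev + i                    # normalizer state
    h = o * (c / n)                       # normalized hidden state
    return h, (c, n, m)
```

Starting from state (0.0, 0.0, 0.0), this step stays finite even with an input-gate pre-activation of 800. For moderate inputs it produces the same h as the unstabilized recurrence, since c and n are both just scaled by exp(-m_t). Because c / n is a weighted average of values in [-1, 1], |h| never exceeds 1.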