zhongkaifu / Seq2SeqSharp

Seq2SeqSharp is a fast, flexible, tensor-based deep neural network framework written in .NET (C#). Its highlighted features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM and others), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

Setting FocalLossGamma = 2 causes weight corruption at the beginning of seq2seq model training #73

Open · zsogitbe opened this issue 8 months ago

zsogitbe commented 8 months ago

Description of the bug: Setting FocalLossGamma = 2 when training a sequence-to-sequence model causes weight corruption at the very beginning of training, and training stops (the corruption is caught by the recently added weight-corruption check). The error is not random: it always occurs at the start of training. With FocalLossGamma set to 0, the error does not occur.
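For reference, focal loss is commonly defined as FL(p) = -(1 - p)^γ · log(p), where p is the probability assigned to the correct token. With γ = 0 it reduces to ordinary cross-entropy, which is consistent with training behaving normally at FocalLossGamma = 0.

The sketch below is a minimal Python illustration of the loss and its derivative with respect to p. It is not Seq2SeqSharp's C# implementation. The function names and the `eps` clamp are assumptions made for illustration, and the sketch does not identify the cause of this bug. It is only a reference for checking that the loss and its gradient stay finite for γ = 0 and γ = 2.

```python
import math

def focal_loss(p: float, gamma: float, eps: float = 1e-7) -> float:
    """Focal loss for the probability p of the correct token (hypothetical helper).

    FL(p) = -(1 - p)^gamma * log(p); gamma = 0 gives plain cross-entropy.
    p is clamped to [eps, 1 - eps] so log() never receives 0 (assumed safeguard).
    """
    p = min(max(p, eps), 1.0 - eps)
    return -((1.0 - p) ** gamma) * math.log(p)

def focal_loss_grad(p: float, gamma: float, eps: float = 1e-7) -> float:
    """Derivative of FL with respect to p (hypothetical helper).

    dFL/dp = gamma * (1 - p)^(gamma - 1) * log(p) - (1 - p)^gamma / p
    """
    p = min(max(p, eps), 1.0 - eps)
    return gamma * (1.0 - p) ** (gamma - 1.0) * math.log(p) - (1.0 - p) ** gamma / p

def all_finite(values) -> bool:
    """True if no value is NaN or infinite."""
    return all(math.isfinite(v) for v in values)

if __name__ == "__main__":
    # Compare gamma = 0 (cross-entropy) with gamma = 2 at extreme and typical probabilities.
    for gamma in (0.0, 2.0):
        losses = [focal_loss(p, gamma) for p in (0.0, 0.5, 1.0)]
        grads = [focal_loss_grad(p, gamma) for p in (0.0, 0.5, 1.0)]
        print(gamma, losses, grads, all_finite(losses + grads))
```

If values like these stay finite in isolation but the framework still reports corruption, the non-finite values would have to originate elsewhere in the training step. That might be the backward pass through softmax or the optimizer update. This is only a suggestion for narrowing the search, not a confirmed diagnosis.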