zhongkaifu / Seq2SeqSharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). It has many highlighted features, such as automatic differentiation, different network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), multimodal models for text and images, and more.

Exception: 'The weight '.LayerNorm' has been released, you cannot access it.' #59

Closed piedralaves closed 1 year ago

piedralaves commented 1 year ago

Hi

When we set the parameter "encoderType" to "BiLSTM", an exception arises:

'The weight '.LayerNorm' has been released, you cannot access it.'

In fact, when we use "Transformer" as both encoder and decoder, everything works fine. However, when we try to set the parameter to "AttentionLSTM" as decoder or "BiLSTM" as encoder, the exception arises.

What does the exception mean?

Thanks a lot

G

zhongkaifu commented 1 year ago

Hi @piedralaves ,

It's a bug related to weight release in some operators (used only by LSTM-type models) during training. I have already fixed it, and you can check out this file from the repo: https://github.com/zhongkaifu/Seq2SeqSharp/blob/master/Seq2SeqSharp/Tools/ComputeGraphTensor.cs
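To make the exception message itself clearer: during training, frameworks of this kind free a tensor's buffer once the compute graph no longer needs it, and any later access to the released weight raises exactly this kind of error. A minimal Python sketch of the pattern (the class, names, and message are hypothetical illustrations, not Seq2SeqSharp's actual implementation):

```python
class WeightTensor:
    """Sketch of a weight whose buffer can be released after the graph is done with it."""

    def __init__(self, name, data):
        self.name = name
        self._data = data
        self._released = False

    def release(self):
        # Free the underlying buffer; any later access is a bug in the caller.
        self._data = None
        self._released = True

    @property
    def data(self):
        if self._released:
            raise RuntimeError(
                f"The weight '{self.name}' has been released, you cannot access it.")
        return self._data


w = WeightTensor(".LayerNorm", [1.0, 2.0])
print(w.data)      # ok before release
w.release()
# w.data           # would now raise: "The weight '.LayerNorm' has been released..."
```

The bug described above amounts to an operator releasing a weight too early, so a later (legitimate) access hit this guard.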

Thanks,
Zhongkai Fu

piedralaves commented 1 year ago

Dear Zhongkai, is it a big change? I cannot see what you have done. I tried substituting ComputeGraphTensor.cs, but many errors arose. Sorry.

zhongkaifu commented 1 year ago

It's a minor change. Here is the diff: https://github.com/zhongkaifu/Seq2SeqSharp/commit/e3de9c6fb83caff1d9aff103f7937509871e605f#diff-4101e3779b113596c6c988619cf726c9a04d15f795e3f7ec886ae9ad96d4ec89

You can check these diffs and modify your local file.

piedralaves commented 1 year ago

Are the changes only in ComputeGraphTensor.cs?

zhongkaifu commented 1 year ago

Yes, only in ComputeGraphTensor.cs.

piedralaves commented 1 year ago

Thanks.

clm33 commented 1 year ago

The code that you posted solved the problem except for the case in which the encoder is BiLSTM and the decoder is Transformer. The other combinations work perfectly fine, so we were wondering whether you know what the issue may be with the BiLSTM encoder + Transformer decoder combination. The error that arises is the following:

Exception: 'Output tensor must have the same number of elements as the input. Size = 3720 300 , New Size = 186 20 600 '
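A quick element-count check on the sizes in the message is suggestive (the numbers below are taken directly from the exception above; the interpretation of the dims is an assumption):

```python
# Sizes quoted by the exception.
old_size = 3720 * 300        # input tensor: 1,116,000 elements
new_size = 186 * 20 * 600    # requested reshape: 2,232,000 elements

# The leading dims agree (186 * 20 == 3720), but the last dim is
# 600 == 2 * 300 -- exactly a doubled hidden size, which hints at
# the BiLSTM forward/backward concatenation discussed below.
assert 186 * 20 == 3720
assert new_size == 2 * old_size
```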

I attach the log so that you can explore it if you want to.

I have searched through the posts to find whether someone had the same issue and found one in which the problem was that "HiddenSize" should be divisible by "MultiHeadNum", but that has not solved the issue.

Again, we would appreciate it a lot if you could shed some light on this issue.

Seq2SeqConsole_Train_2023_02_08_12h_04m_26s.log

zhongkaifu commented 1 year ago

Hi @clm33 ,

The reason is that the BiLSTM output concatenates the forward and backward hidden states at the top of the network, so its dimension becomes "2 * hidden_dim", which differs from the dimension expected by the decoder, and the decoder fails.
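The shape effect of concatenating versus adding the two directions can be sketched with NumPy (the shapes assume seq_len=186, batch=20, hidden_dim=300 from the log above; this is an illustration, not Seq2SeqSharp code):

```python
import numpy as np

hidden_dim = 300
fwd = np.zeros((186, 20, hidden_dim))   # forward-direction LSTM output
bwd = np.zeros((186, 20, hidden_dim))   # backward-direction LSTM output

# Concatenating on the last axis doubles the hidden dimension...
concat_out = np.concatenate([fwd, bwd], axis=-1)   # shape (186, 20, 600)

# ...while element-wise addition keeps it unchanged.
add_out = fwd + bwd                                # shape (186, 20, 300)

# A Transformer decoder built with HiddenSize=300 only matches the "add" output.
```

This is why switching the BiLSTM output from concatenation to addition, as described in the next comment, makes the "BiLSTM + Transformer" combination line up dimensionally.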

I just made a check-in that changes the BiLSTM output from "concatenate" to "add" mode, so "BiLSTM + Transformer" works now. Let me know if you have any questions.

Thanks,
Zhongkai Fu

piedralaves commented 1 year ago

Hi Zhongkai: Could you specify the change, please?

zhongkaifu commented 1 year ago

It's all in this commit: https://github.com/zhongkaifu/Seq2SeqSharp/commit/7723cc10c2501384381db3de27722e3ffdc283cf