Closed piedralaves closed 1 year ago
Hi @piedralaves ,
It's a bug related to weights release on some operators (only used by LSTM type models) during training. I already fixed it and you can check out this file from the repo: https://github.com/zhongkaifu/Seq2SeqSharp/blob/master/Seq2SeqSharp/Tools/ComputeGraphTensor.cs
Thanks Zhongkai Fu
Dear Zhongkai, is it a big change? I cannot see what you have done. I tried to substitute ComputeGraphTensor.cs, but many errors arose. Sorry.
It's a minor change. Here is the diffs: https://github.com/zhongkaifu/Seq2SeqSharp/commit/e3de9c6fb83caff1d9aff103f7937509871e605f#diff-4101e3779b113596c6c988619cf726c9a04d15f795e3f7ec886ae9ad96d4ec89
You can check these diffs and modify your local file.
Are the changes only in ComputeGraphTensor.cs?
Yes, only in ComputeGraphTensor.cs.
Thanks.
The code that you posted solved the problem except for the case in which the encoder is BiLSTM and the decoder is Transformer. The other combinations work perfectly fine, so we were wondering whether you know what the issue might be with the BiLSTM encoder + Transformer decoder combination. The error that arises is the following:
Exception: 'Output tensor must have the same number of elements as the input. Size = 3720 300 , New Size = 186 20 600 '
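For context, the two sizes in the exception differ by exactly a factor of two, which a quick element count makes visible (a minimal sketch; the shapes are taken from the exception text above, everything else is illustrative):

```python
# The exception reports an input of shape (3720, 300) and a requested
# reshape to (186, 20, 600). A reshape is only legal when the total
# element counts match.
old_shape = (3720, 300)
new_shape = (186, 20, 600)

def numel(shape):
    # Total number of elements in a tensor of the given shape.
    n = 1
    for d in shape:
        n *= d
    return n

print(numel(old_shape))  # 1116000
print(numel(new_shape))  # 2232000 -- exactly 2x, since 600 = 2 * 300
```

Note that 186 * 20 = 3720, so the only mismatch is the last dimension being doubled (600 instead of 300), which hints at a concatenated bidirectional output meeting a decoder built for a single hidden dimension.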
I attach the log so that you can explore it if you want to.
I have searched through the posts to find whether someone had the same issue and found one in which the problem was that the "HiddenSize" parameter should be divisible by "MultiHeadNum", but that has not solved the issue.
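For reference, the usual multi-head attention constraint is that the hidden size must split evenly across the attention heads. A minimal illustrative check (the function name is an assumption for this sketch, not part of Seq2SeqSharp):

```python
def check_head_split(hidden_size, multi_head_num):
    # Each attention head operates on hidden_size // multi_head_num
    # dimensions, so the division must be exact.
    if hidden_size % multi_head_num != 0:
        raise ValueError(
            f"HiddenSize {hidden_size} is not divisible by "
            f"MultiHeadNum {multi_head_num}")
    return hidden_size // multi_head_num

print(check_head_split(512, 8))  # 64 dims per head
```

If the division is not exact, the per-head projections cannot be reassembled into the original hidden dimension, which typically surfaces as a shape error during training.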
Again, we would greatly appreciate it if you could shed some light on this issue.
Hi @clm33 ,
The reason is that the BiLSTM concatenates the forward and backward hidden layers at the top of the network, so its output dimension becomes "2 * hidden_dim", which differs from the dimension the decoder expects, and the decoder fails.
I just made a check-in changing the BiLSTM output from "concatenate" to "add" mode, so "BiLSTM + Transformer" is working now. Let me know if you have any questions.
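The difference between the two modes can be sketched with plain arrays (illustrative only, not the actual Seq2SeqSharp code; shapes are hypothetical):

```python
# Forward and backward LSTM outputs: seq_len steps of hidden dims each.
seq_len, hidden = 20, 300
fwd = [[0.0] * hidden for _ in range(seq_len)]
bwd = [[0.0] * hidden for _ in range(seq_len)]

# "concatenate" mode: the last dimension doubles to 2 * hidden,
# which no longer matches a decoder built for `hidden` dims.
concat_out = [f + b for f, b in zip(fwd, bwd)]
print(len(concat_out[0]))  # 600

# "add" mode: element-wise sum keeps the dimension at `hidden`,
# so the Transformer decoder's expected size is preserved.
add_out = [[x + y for x, y in zip(f, b)] for f, b in zip(fwd, bwd)]
print(len(add_out[0]))  # 300
```

The element-wise sum loses the explicit separation of the two directions but keeps the interface dimension unchanged, which is what makes the BiLSTM encoder compatible with the Transformer decoder here.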
Thanks Zhongkai Fu
Hi Zhongkai: Could you specify the change, please?
It's all in this commit: https://github.com/zhongkaifu/Seq2SeqSharp/commit/7723cc10c2501384381db3de27722e3ffdc283cf
Hi
When we set the parameter "encoderType" to "BiLSTM" an exception arises:
'The weight '.LayerNorm' has been released, you cannot access it.'
In fact, when we use "Transformer" for both the encoder and the decoder, everything works fine. However, when we set the decoder to "AttentionLSTM" or the encoder to "BiLSTM", the exception arises.
What does the exception mean?
Thanks a lot
G