Open chist opened 3 years ago
@chist I've run into the same problem. Did you figure out a solution?
I can convert both single-layer and multi-layer LSTMs to TensorFlow (onnx 1.7.0, onnx-tf 1.7.0). However, the outputs of ONNX and onnx-tf do not match for a multi-layer LSTM, whereas they do match for a single-layer LSTM.
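One way to quantify such a mismatch is to compare the two backends' outputs element-wise. The helper below is a minimal sketch that assumes each backend returns its outputs as a list of NumPy arrays in the same order; `compare_outputs` and its tolerance defaults are illustrative and not part of onnx or onnx-tf.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-4, atol=1e-5):
    """Return (index, max_abs_diff, all_close) for each pair of outputs."""
    report = []
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        ref = np.asarray(ref)
        cand = np.asarray(cand)
        if ref.shape != cand.shape:
            # A shape mismatch is reported as an infinite difference.
            report.append((i, float("inf"), False))
            continue
        max_diff = float(np.max(np.abs(ref - cand))) if ref.size else 0.0
        close = bool(np.allclose(ref, cand, rtol=rtol, atol=atol))
        report.append((i, max_diff, close))
    return report
```

Passing the reference backend's outputs first and the onnx-tf outputs second gives one row per output, which makes it easy to see whether the sequence output or only the final hidden and cell states diverge.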
@chist have you figured out a solution?
I probably hit the same error when trying to load several instances of an LSTM model in one session. I believe the problem is that rnn_cell is a global class variable, so when you create several instances of LSTM, rnn_cell gets initialized only once.
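The effect of a class-level variable can be shown in plain Python. This is only a sketch of the suspected mechanism: the class and method names are hypothetical and do not mirror onnx-tf's actual code.

```python
class SharedCellHandler:
    # Class-level attribute: a single dict shared by every instance.
    rnn_cell = {}

    def get_cell(self, hidden_size):
        # The cell is created only on the first call; later calls reuse it,
        # even when a different hidden_size is requested.
        if "cell" not in SharedCellHandler.rnn_cell:
            SharedCellHandler.rnn_cell["cell"] = {"hidden_size": hidden_size}
        return SharedCellHandler.rnn_cell["cell"]


class PerInstanceHandler:
    def __init__(self):
        # Instance-level attribute: each handler owns its own cell.
        self.rnn_cell = None

    def get_cell(self, hidden_size):
        if self.rnn_cell is None:
            self.rnn_cell = {"hidden_size": hidden_size}
        return self.rnn_cell


first = SharedCellHandler().get_cell(256)
second = SharedCellHandler().get_cell(128)
print(first is second)        # True
print(second["hidden_size"])  # 256
```

With the per-instance variant, two handlers would return two independent cells, which is what separate LSTM instances need.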
I suspect this is not the expected behavior. @chinhuang007 Can you take a look please?
The bug still occurs. I cannot run an exported model with a multi-layer LSTM.
Error:
InvalidArgumentError: Matrix size-incompatible: In[0]: [219,512], In[1]: [336,1024]
[[{{node LSTM_2ab5f684/rnn/while/body/_58/LSTM_ad6f857e/rnn/while/rnn/multi_rnn_cell/cell_0/lstm_cell/BiasAdd}}]] [Op:__inference___call___2981]
Function call stack:
__call__
We can reproduce this bug in a Colab environment (on both GPU and CPU). Here is the code to reproduce it:
!pip install onnx-tf
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
import onnx
import onnx_tf
from onnx_tf.backend import prepare
import numpy as np
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
G = nn.LSTM(input_size=80,
            hidden_size=256,
            num_layers=3,  # bug at 3
            dropout=0,
            bidirectional=False,
            batch_first=True)
G = G.to(device)
aus = torch.Tensor(np.zeros((219,18,80))).to(device)
torch.onnx.export(G,
                  args=(aus),
                  f="audio_G.onnx",
                  input_names=["au"],
                  output_names=["a", "b", "c"],
                  opset_version=12)
model = onnx.load('audio_G.onnx')
tf_rep = prepare(model)
o,w,e = tf_rep.run((aus.cpu()))
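The shapes in the error message above are consistent with a kernel built for the first layer being applied to the input of a later layer. The sketch below assumes the TensorFlow LSTMCell layout, in which the cell multiplies concat([x, h]) by a fused kernel of shape [input_depth + hidden_size, 4 * hidden_size]; `lstm_kernel_shapes` is a hypothetical helper written for this illustration.

```python
def lstm_kernel_shapes(input_size, hidden_size, num_layers):
    """Expected (rows, cols) of the fused kernel for each stacked layer.

    Assumes each cell multiplies concat([x, h]) by a kernel of shape
    [input_depth + hidden_size, 4 * hidden_size].
    """
    shapes = []
    layer_input = input_size
    for _ in range(num_layers):
        shapes.append((layer_input + hidden_size, 4 * hidden_size))
        # Upper layers consume the previous layer's hidden output.
        layer_input = hidden_size
    return shapes


shapes = lstm_kernel_shapes(input_size=80, hidden_size=256, num_layers=3)
print(shapes)  # [(336, 1024), (512, 1024), (512, 1024)]
```

Under that assumption, In[0] of [219,512] is a batch of 219 rows with 256 + 256 features (the input to a second or third layer), while In[1] of [336,1024] is the kernel sized for the first layer (80 + 256 rows).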
Describe the bug
I need to convert my PyTorch model to TensorFlow. For this purpose I use the PyTorch -> ONNX -> TensorFlow approach.
However, I get the following error message when trying to run a prepared tensorflow model:
ValueError: Dimensions must be equal, but are 6 and 7 for 'rnn/multi_rnn_cell/cell_0/lstm_cell/MatMul' (op: 'MatMul') with input shapes: [50,6], [7,12].
The problem appears when the initial PyTorch LSTM module has more than one layer:
If there is only one layer, everything works fine.
To Reproduce
Here's the full code:
ONNX model file
https://drive.google.com/file/d/1kaCryP-My7_I4Gd2BS37uKUzYB6MkHaG/view?usp=sharing
Python, ONNX, ONNX-TF, Tensorflow version
(this is the only configuration I managed to export with)
Additional context
As I understand it, such problems usually arise when the same cells are reused for different LSTM layers instead of new ones being created: https://stackoverflow.com/a/48796202
Moreover, the problem doesn't disappear if I split multilayer LSTM into several one-layer LSTMs.
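The cell-reuse pitfall can be illustrated without TensorFlow. In this sketch, `Cell` is a hypothetical stand-in whose kernel is sized on first use, and the sizes (input 4, hidden 3) are chosen only for illustration; the real model's sizes are not shown in this report.

```python
class Cell:
    """Stand-in for an RNN cell whose kernel is sized on first use."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self.kernel_shape = None

    def build(self, input_depth):
        # Once built, the kernel shape is never recomputed.
        if self.kernel_shape is None:
            self.kernel_shape = (input_depth + self.hidden_size,
                                 4 * self.hidden_size)
        return self.kernel_shape


def build_stack(cells, input_size):
    """Build each layer in order and collect the kernel shapes."""
    shapes, depth = [], input_size
    for cell in cells:
        shapes.append(cell.build(depth))
        depth = cell.hidden_size
    return shapes


shared = [Cell(3)] * 2               # two references to ONE cell
fresh = [Cell(3) for _ in range(2)]  # two independent cells

print(build_stack(shared, 4))  # [(7, 12), (7, 12)]
print(build_stack(fresh, 4))   # [(7, 12), (6, 12)]
```

With the shared cell, the second layer receives a 6-wide input (3 + 3) but keeps the 7-row kernel built for the first layer, which is the kind of 6-versus-7 mismatch a MatMul would reject.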
So I believe it is connected to lines 34–37 of rnn_mixin.py: https://github.com/onnx/onnx-tensorflow/blob/c63d4351c7752a769cdc9a1bfcf79ffd140e0e6a/onnx_tf/handlers/backend/rnn_mixin.py#L34-L37
I should also note that the problem is specific to the onnx-tensorflow module, because conversion from PyTorch to ONNX and back (using onnxruntime) works as expected.