Import ONNX LSTM converted from PyTorch

SamHSlva commented 2 years ago

Ubuntu 18, Python 3.6.13, OpenCV

Detailed description

I am trying to import a simple LSTM network converted from PyTorch to ONNX. The model loads and runs correctly in both PyTorch and ONNX Runtime, but importing it into OpenCV fails with the following error:

Node [LSTM]:(81) parse error: OpenCV(4.5.4) /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp:463: error: (-5:Bad argument) Blob 79 not found in const blobs in function 'getBlob'

I submitted this question on the OpenCV forum, and a moderator suggested that I post it here.

Steps to reproduce

import torch
import torch.nn as nn
import cv2

class LayerLSTM(nn.Module):
    def __init__(self):
        super(LayerLSTM, self).__init__()
        self.rnns = nn.LSTM(156, 512, 1, batch_first=False)

    def forward(self, x, hx, cx):
        x, (hx, cx) = self.rnns(x, (hx,cx))
        return x

model = LayerLSTM()

with torch.no_grad():
    x = torch.randn(1, 1, 156)
    hx = torch.randn(1, 1, 512)
    cx = torch.randn(1, 1, 512)
    out = model(x, hx, cx)
torch.onnx.export(model, (x, hx, cx), '/home/user/DVS_Original/deep-stabilization/dvs/onnx_model/sample_lstm.onnx', verbose=True, input_names=['x', 'hx', 'cx'], output_names=['output'])
# sess = onnxruntime.InferenceSession('/home/user/DVS_Original/deep-stabilization/dvs/onnx_model/sample_lstm.onnx')
# out_on = sess.run(None, {'x': x.cpu().numpy(), 'hx': hx.numpy(), 'cx': cx.numpy()})
# print(out.numpy() - out_on[0])

net = cv2.dnn.readNetFromONNX('/home/user/DVS_Original/deep-stabilization/dvs/onnx_model/sample_lstm.onnx')

Issue submission checklist

[x] I report the issue, it's not a question
[x] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc. and have not found a solution
[x] I updated to the latest OpenCV version and the issue is still there
[x] There is reproducer code and related data files: videos, images, onnx, etc.

asmorkalov commented 2 years ago

@SamHSlva thanks for the report. Please provide more information about OpenCV, your system setup, and the model you are trying to load, including the cv.getBuildInformation() output. The ONNX tensors can be filled with random values or zeros. See https://github.com/opencv/opencv/wiki/OpenCV-Debugging-Facilities for advanced options.

SamHSlva commented 2 years ago

Hi, thank you for the reply. I created this toy example to simplify my larger problem; the error is essentially the same. The full PyTorch model and the conversion script are as follows:

import torch
import torch.nn as nn
import cv2

class LayerLSTM(nn.Module):
    def __init__(self):
        super(LayerLSTM, self).__init__()
        self.rnns = nn.LSTM(156, 512, 1, batch_first=False)

    def forward(self, x, hx, cx):
        x, (hx, cx) = self.rnns(x, (hx,cx))
        return x

model = LayerLSTM()

with torch.no_grad():
    x = torch.randn(1, 1, 156)
    hx = torch.randn(1, 1, 512)
    cx = torch.randn(1, 1, 512)
    out = model(x, hx, cx)
torch.onnx.export(model, (x, hx, cx), 'toy_lstm.onnx', verbose=True, input_names=['x', 'hx', 'cx'], output_names=['output'])
# sess = onnxruntime.InferenceSession('/home/user/DVS_Original/deep-stabilization/dvs/onnx_model/sample_lstm.onnx')
# out_on = sess.run(None, {'x': x.cpu().numpy(), 'hx': hx.numpy(), 'cx': cx.numpy()})
# print(out.numpy() - out_on[0])

net = cv2.dnn.readNetFromONNX('toy_lstm.onnx')

The output of this script is the following:

/home/user/anaconda3/envs/DFS_2/bin/python /home/user/DVS_Original/deep-stabilization/pytorch_LSTM_toy.py
/home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py:2174: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model. 
  "or define the initial states (h0/c0) as inputs of the model. ")
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
graph(%x : Float(1, 1, 156, strides=[156, 156, 1], requires_grad=0, device=cpu),
      %hx : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=0, device=cpu),
      %cx : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=0, device=cpu),
      %49 : Float(1, 2048, 156, strides=[319488, 156, 1], requires_grad=0, device=cpu),
      %50 : Float(1, 2048, 512, strides=[1048576, 512, 1], requires_grad=0, device=cpu),
      %51 : Float(1, 4096, strides=[4096, 1], requires_grad=0, device=cpu)):
  %7 : Tensor? = prim::Constant() # /home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/nn/modules/rnn.py:710:0
  %28 : Float(1, 1, 1, 512, strides=[512, 512, 512, 1], device=cpu), %29 : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=1, device=cpu), %30 : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=1, device=cpu) = onnx::LSTM[hidden_size=512](%x, %49, %50, %51, %7, %hx, %cx) # /home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/nn/modules/rnn.py:710:0
  %output : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=1, device=cpu) = onnx::Squeeze[axes=[1]](%28) # /home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/nn/modules/rnn.py:710:0
  return (%output)

[ERROR:0] global /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp (718) handleNode DNN/ONNX: ERROR during processing node with 7 inputs and 3 outputs: [LSTM]:(28)
Traceback (most recent call last):
  File "/home/user/DVS_Original/deep-stabilization/pytorch_LSTM_toy.py", line 27, in <module>
    net = cv2.dnn.readNetFromONNX('toy_lstm.onnx')
cv2.error: OpenCV(4.5.4) /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp:739: error: (-2:Unspecified error) in function 'handleNode'
> Node [LSTM]:(28) parse error: OpenCV(4.5.4) /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp:463: error: (-5:Bad argument) Blob hx not found in const blobs in function 'getBlob'
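
As a cross-check on the graph dump above: the seven inputs of the LSTM node follow the ONNX LSTM operator's order (X, W, R, B, sequence_lens, initial_h, initial_c), and the shapes of %49, %50 and %51 match the W, R and B layouts in the ONNX operator specification. The sketch below recomputes those shapes for input size 156 and hidden size 512. The helper name `onnx_lstm_shapes` is made up for illustration, and a single-layer, single-direction LSTM is assumed.

```python
def onnx_lstm_shapes(input_size, hidden_size, seq_len=1, batch=1, num_directions=1):
    """Expected tensor shapes for the inputs of an ONNX LSTM node.

    Layouts follow the ONNX LSTM operator spec; the function name itself is
    hypothetical and not part of any library.
    """
    gates = 4 * hidden_size  # input, output, forget and cell gates stacked
    return {
        "X": (seq_len, batch, input_size),
        "W": (num_directions, gates, input_size),
        "R": (num_directions, gates, hidden_size),
        "B": (num_directions, 2 * gates),  # input bias and recurrent bias concatenated
        "initial_h": (num_directions, batch, hidden_size),
        "initial_c": (num_directions, batch, hidden_size),
    }


# Shapes copied from the verbose export log in this thread.
from_log = {
    "X": (1, 1, 156),          # %x
    "W": (1, 2048, 156),       # %49
    "R": (1, 2048, 512),       # %50
    "B": (1, 4096),            # %51
    "initial_h": (1, 1, 512),  # %hx
    "initial_c": (1, 1, 512),  # %cx
}

expected = onnx_lstm_shapes(input_size=156, hidden_size=512)
mismatches = {k: (expected[k], from_log[k]) for k in expected if expected[k] != from_log[k]}
print("mismatches:", mismatches)  # prints: mismatches: {}
```

The shapes are consistent, which suggests the exported tensors are well-formed and that the failure concerns where the importer looks for hx ("not found in const blobs") rather than tensor sizes.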

This is the output of the requested command:

General configuration for OpenCV 4.5.4 =====================================
  Version control:               4.5.4-dirty

  Platform:
    Timestamp:                   2021-11-19T16:25:28Z
    Host:                        Linux 5.11.0-1021-azure x86_64
    CMake:                       3.22.0
    CMake generator:             Unix Makefiles
    CMake build tool:            /bin/gmake
    Configuration:               Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (15 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (30 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      NO
    C++ standard:                11
    C++ Compiler:                /usr/lib/ccache/compilers/c++  (ver 10.2.1)
    C++ flags (Release):         -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/lib/ccache/compilers/cc
    C flags (Release):           -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -L/root/ffmpeg_build/lib  -Wl,--gc-sections -Wl,--as-needed  
    Linker flags (Debug):        -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -L/root/ffmpeg_build/lib  -Wl,--gc-sections -Wl,--as-needed  
    ccache:                      YES
    Precompiled headers:         NO
    Extra dependencies:          /lib64/libopenblas.so Qt5::Core Qt5::Gui Qt5::Widgets Qt5::Test Qt5::Concurrent /lib64/libpng.so /lib64/libz.so dl m pthread rt
    3rdparty dependencies:       libprotobuf ade ittnotify libjpeg-turbo libwebp libtiff libopenjp2 IlmImf quirc ippiw ippicv

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python3 stitching video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java python2 ts
    Applications:                -
    Documentation:               NO
    Non-free algorithms:         NO

  GUI:                           QT5
    QT:                          YES (ver 5.15.0 )
      QT OpenGL support:         NO
    GTK+:                        NO
    VTK support:                 NO

  Media I/O: 
    ZLib:                        /lib64/libz.so (ver 1.2.7)
    JPEG:                        libjpeg-turbo (ver 2.1.0-62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         /lib64/libpng.so (ver 1.5.13)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES
      avcodec:                   YES (58.91.100)
      avformat:                  YES (58.45.100)
      avutil:                    YES (56.51.100)
      swscale:                   YES (5.7.100)
      avresample:                NO
    GStreamer:                   NO
    v4l/v4l2:                    YES (linux/videodev2.h)

  Parallel framework:            pthreads

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   /tmp/pip-req-build-w88qv8vs/_skbuild/linux-x86_64-3.6/cmake-build/3rdparty/ippicv/ippicv_lnx/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                /tmp/pip-req-build-w88qv8vs/_skbuild/linux-x86_64-3.6/cmake-build/3rdparty/ippicv/ippicv_lnx/iw
    VA:                          NO
    Lapack:                      YES (/lib64/libopenblas.so)
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.5.1)

  OpenCL:                        YES (no extra features)
    Include path:                /tmp/pip-req-build-w88qv8vs/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 /opt/_internal/cpython-3.6.15/bin/python (ver 3.6.15)
    Libraries:                   libpython3.6m.a (ver 3.6.15)
    numpy:                       /tmp/pip-build-env-ut15bkrp/overlay/lib/python3.6/site-packages/numpy/core/include (ver 1.13.3)
    install path:                python/cv2/python-3.6

  Python (for build):            /bin/python2.7

  Java:                          
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    /tmp/pip-req-build-w88qv8vs/_skbuild/linux-x86_64-3.6/cmake-install
-----------------------------------------------------------------

asmorkalov commented 2 years ago

Model diagnostic tool output:

./opencv_model_diagnostics -m=./toy_lstm.onnx 
[ERROR:0] global /home/alexander/Projects/opencv/modules/dnn/src/onnx/onnx_importer.cpp (700) handleNode DNN/ONNX: Potential problem during processing node with 7 inputs and 3 outputs: [LSTM]:(28)
OpenCV(4.5.4-dev) /home/alexander/Projects/opencv/modules/dnn/src/onnx/onnx_importer.cpp:463: error: (-5:Bad argument) Blob hx not found in const blobs in function 'getBlob'

asmorkalov commented 2 years ago

The issue is reproduced with OpenCV 4.5.4 on Ubuntu 18.04. OpenCV expects hx to be a constant tensor, not an input tensor.
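
One way to see which LSTM inputs are constants and which are runtime inputs is to compare the node's input names against the graph's initializers and declared inputs. The following is a minimal sketch, assuming the `onnx` Python package is installed; `classify_inputs` and `report_lstm_inputs` are illustrative names, not OpenCV or ONNX APIs.

```python
def classify_inputs(node_inputs, initializer_names, graph_input_names):
    """Label each input name of a node, preserving the node's input order."""
    labelled = []
    for name in node_inputs:
        if name == "":
            label = "omitted"        # optional input left out by the exporter
        elif name in initializer_names:
            label = "constant"       # stored in the model file (may also be listed as an input)
        elif name in graph_input_names:
            label = "graph input"    # must be supplied at run time
        else:
            label = "intermediate"   # produced by another node in the graph
        labelled.append((name, label))
    return labelled


def report_lstm_inputs(path):
    """Print how every LSTM node in the model at `path` receives its inputs."""
    import onnx  # imported here so classify_inputs can be used without onnx installed

    model = onnx.load(path)
    initializers = {tensor.name for tensor in model.graph.initializer}
    graph_inputs = {value.name for value in model.graph.input}
    for node in model.graph.node:
        if node.op_type == "LSTM":
            print("LSTM node:", node.name or "(unnamed)")
            for name, label in classify_inputs(node.input, initializers, graph_inputs):
                print("  %-10s %s" % (name or "''", label))


# Example with names taken from the export log in this thread.
example = classify_inputs(["x", "49", "", "hx"], {"49"}, {"x", "hx"})
print(example)
```

Running `report_lstm_inputs("toy_lstm.onnx")` on the attached model should show hx and cx labelled as graph inputs, which is the case the 4.5.4 importer rejects.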

asmorkalov commented 2 years ago

toy_lstm.onnx.zip

leochan2009 commented 2 years ago

Any progress? I have the same issue with an LSTM model converted from PyTorch to ONNX.

asmorkalov commented 2 years ago

WIP:

asmorkalov commented 1 year ago

Status for current 4.x branch:

./bin/opencv_model_diagnostics -m=toy_lstm.onnx 
[ERROR:0@0.017] global /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp (1033) handleNode DNN/ONNX: Potential problem during processing node with 7 inputs and 3 outputs: [LSTM]:(onnx_node_output_0!28) from domain='ai.onnx'
OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp:1582: error: (-215:Assertion failed) shape(blob) == blobShape in function 'lstm_extractConsts'

[ERROR:0@0.018] global /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp (1045) handleNode DNN/ONNX: Layer of type LSTM(LSTM) cannot be created with parameters depth : 5
has_dynamic_shapes : 0
hidden_size : 512
. Error: OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/recurrent_layers.cpp:162: error: (-215:Assertion failed) blobs.size() >= 3 in function 'LSTMLayerImpl'

asmorkalov commented 1 year ago

@Abdurrahheem The parser issue is still there:

./opencv_model_diagnostics -m=toy_lstm.onnx 
[ERROR:0@0.016] global onnx_importer.cpp:1039 handleNode DNN/ONNX: Potential problem during processing node with 7 inputs and 3 outputs: [LSTM]:(onnx_node_output_0!28) from domain='ai.onnx'
OpenCV(4.7.0-dev) /home/alexander/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp:1623: error: (-215:Assertion failed) shape(blob) == blobShape in function 'lstm_extractConsts'

[ERROR:0@0.016] global onnx_importer.cpp:1051 handleNode DNN/ONNX: Layer of type LSTM(LSTM) cannot be created with parameters depth : 5
has_dynamic_shapes : 0
hidden_size : 512
. Error: OpenCV(4.7.0-dev) /home/alexander/Projects/OpenCV/opencv-master/modules/dnn/src/layers/recurrent_layers.cpp:162: error: (-215:Assertion failed) blobs.size() >= 3 in function 'LSTMLayerImpl'

Abdurrahheem commented 1 year ago

I am not able to reproduce this locally on the lstm_fix_initialization branch. Are you sure you ran the command on that branch and not on 4.x?

Abdurrahheem commented 1 year ago

You may be seeing this error because of the toy_lstm.onnx file you are using. Here is its ONNX graph:

And here is the graph generated by the Python script mentioned above:

Abdurrahheem commented 1 year ago

This type of ONNX LSTM graph (where the weight matrices are defined as inputs to the LSTM layer) is currently not supported by OpenCV, for this reason

AND

This type of graph (where the hidden-state matrices are defined as inputs to the LSTM layer) is fully supported; support was added in this PR. Please use export_params=True when exporting your model from PyTorch to ONNX to avoid the first scenario.
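
A minimal sketch of the suggested export call follows, reusing the toy model from this thread. It is an illustration under assumptions rather than a verified fix: `export_kwargs` and `export_toy_lstm` are hypothetical helpers, and PyTorch is imported lazily so the helpers can be defined without it. Note that `export_params` is documented as defaulting to True in `torch.onnx.export`, so setting it mainly makes the intent explicit.

```python
def export_kwargs():
    """Keyword arguments for torch.onnx.export, as suggested in the thread."""
    return {
        "export_params": True,  # store the weights inside the .onnx file
        "input_names": ["x", "hx", "cx"],
        "output_names": ["output"],
    }


def export_toy_lstm(path="toy_lstm.onnx"):
    """Export the toy LSTM from this thread; requires PyTorch."""
    import torch
    import torch.nn as nn

    class LayerLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            self.rnns = nn.LSTM(156, 512, 1, batch_first=False)

        def forward(self, x, hx, cx):
            out, _ = self.rnns(x, (hx, cx))
            return out

    model = LayerLSTM().eval()
    x = torch.randn(1, 1, 156)    # (seq_len, batch, input_size)
    hx = torch.randn(1, 1, 512)   # (num_layers, batch, hidden_size)
    cx = torch.randn(1, 1, 512)
    torch.onnx.export(model, (x, hx, cx), path, **export_kwargs())
    return path
```

Calling `export_toy_lstm()` and then passing the returned path to `cv2.dnn.readNetFromONNX` would show whether the installed OpenCV build accepts the resulting graph.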

asmorkalov commented 1 year ago

Test for the mentioned case: https://github.com/opencv/opencv/pull/23545
