jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning
Apache License 2.0

cuda failure 700 with nasrnn, and onnx unrecognized attribute with resnext50 #75

Closed. lilpenguin closed this issue 3 years ago

lilpenguin commented 3 years ago

Hi, I installed TASO from source with CUDA 10.1 and cuDNN 7.6.5. The installation was inside an Anaconda environment with onnx-1.8.0, python-3.6.12 and protobuf-3.14.0 installed. I was able to run resnet50 and bert under /examples, but for nasrnn I got the following error:

$ python examples/nasrnn.py
Cuda failure: 700
/path/to/taso/src/cudnn/element_kernel.cu:242
Aborting...

In addition, resnext50 seemed to finish correctly, but after the new graph was output, an ONNX error appeared at the end (the full output was linked in the original post):

Traceback (most recent call last):
  File "examples/resnext50.py", line 46, in <module>
    onnx.checker.check_model(onnx_model)
  File "/path/to/.conda/envs/onnx/lib/python3.6/site-packages/onnx/checker.py", line 102, in check_model
    C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Unrecognized attribute: split for operator Split

==> Context: Bad node spec: input: "Conv247_fwd0" output: "Split248_fwd0" output: "Split248_fwd1" name: "Split248" op_type: "Split" attribute { name: "axis" i: 1 type: INT } attribute { name: "split" ints: 256 ints: 128 type: INTS }

I ran the examples on both a Tesla P100 and a Tesla V100, and the errors occurred on both.

Can you please help me solve these issues? Thank you for your help!

xh-yuan commented 3 years ago

The "Unrecognized attribute: split for operator Split" error might be solved by downgrading your onnx version to 1.6.0.
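
A likely reason the downgrade helps: in ONNX opset 13 (the default opset of onnx 1.8.0), Split takes `split` as an optional input rather than an attribute, while onnx 1.6.0 defaults to opset 11, where the attribute is still valid. The sketch below only encodes that rule in plain Python. It assumes the exported model is stamped with the installed release's default opset, which this thread does not confirm, and `split_attribute_accepted` is a hypothetical helper, not part of onnx or TASO.

```python
# ONNX release -> default ai.onnx opset, per the ONNX versioning table.
# Only the releases relevant to this thread are listed.
DEFAULT_OPSET = {"1.6.0": 11, "1.7.0": 12, "1.8.0": 13}

# Opset 13 turned Split's `split` attribute into an optional input.
SPLIT_ATTRIBUTE_REMOVED_IN = 13


def split_attribute_accepted(onnx_release: str) -> bool:
    """Hypothetical helper: True if a Split node carrying a `split`
    attribute is valid at this release's default opset.

    Assumption: the exported model declares the installed release's
    default opset, so the checker validates against that version.
    """
    return DEFAULT_OPSET[onnx_release] < SPLIT_ATTRIBUTE_REMOVED_IN


for release in ("1.6.0", "1.8.0"):
    print(release, split_attribute_accepted(release))
```

If that assumption holds, any onnx release whose default opset is below 13 should accept the node quoted in the traceback, which is consistent with 1.6.0 being suggested here.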

lilpenguin commented 3 years ago

Downgrading onnx to 1.6.0 worked, thanks! I also switched to the provided Docker container, and the CUDA error disappeared as well.