plemeri / InSPyReNet

Official PyTorch implementation of Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (ACCV 2022)
MIT License

Unable to convert the model to onnx #5

Closed rakesh-reddy95 closed 1 year ago

rakesh-reddy95 commented 1 year ago

I'm unable to convert the model to ONNX; it seems the L1 loss is not supported in opset 15 of ONNX. Have you faced any such issue? Also, may I know how you converted the model to TorchScript (.pt) from the PyTorch checkpoint (.pth)?
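For reference, the usual .pth-to-TorchScript route is `torch.jit.trace` followed by `save`. A minimal sketch with a hypothetical stand-in module (`TinyNet` is not part of this repository; in practice you would build InSPyReNet and load its checkpoint):

```python
import torch
import torch.nn as nn

# Stand-in model; in practice you would construct InSPyReNet and load the
# .pth checkpoint with model.load_state_dict(torch.load("latest.pth")).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.sigmoid(self.conv(x))

model = TinyNet().eval()
example = torch.rand(1, 3, 64, 64)

# Trace the forward pass and serialize it as a TorchScript archive (.pt)
traced = torch.jit.trace(model, example)
traced.save("tiny_traced.pt")

# The archive reloads without needing the original Python class definition
reloaded = torch.jit.load("tiny_traced.pt")
```

The saved `.pt` file can then be loaded from C++ or from Python environments that do not have the model's source code.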

plemeri commented 1 year ago

Hello rakesh,

I never used ONNX, but I did use TorchScript before and implemented it for this repository. After some commits, though, I think it is no longer working properly. It was meant to work when you pass the --jit argument to run/Inference.py, but it seems broken for now.

I'll check this functionality and fix it as soon as possible.

Thank you

rakesh-reddy95 commented 1 year ago

Thanks, I'm able to export to ONNX now, and TorchScript is working after a few changes. But I'm facing some issues computing the TensorFlow computational graph: somewhere both the stride and the dilation are set to values greater than 1, which is not supported.

plemeri commented 1 year ago

Glad it's working now. The dilation larger than 1 seems to come from the PAA module borrowed from UACANet. If you need any help, please don't hesitate to contact me.
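One way to locate the offending layers before attempting the TensorFlow conversion is to scan the model for convolutions where both stride and dilation exceed 1. A small sketch (`find_conflicting_convs` is a hypothetical helper, not part of this repository):

```python
import torch.nn as nn

def find_conflicting_convs(model: nn.Module):
    """Return names of Conv2d layers with both stride > 1 and dilation > 1,
    a combination some TensorFlow conversion paths cannot represent."""
    hits = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            if max(module.stride) > 1 and max(module.dilation) > 1:
                hits.append(name)
    return hits

# Example: a dilated conv with stride 1 is fine; stride 2 with dilation 2 is flagged
probe = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=1, dilation=2, padding=2),
    nn.Conv2d(8, 8, 3, stride=2, dilation=2, padding=2),
)
```

Running the helper on the full InSPyReNet model should point at the module (e.g. PAA) that needs restructuring.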

tk4218 commented 1 year ago

> Thanks, I'm able to export to ONNX now, and TorchScript is working after a few changes. But I'm facing some issues computing the TensorFlow computational graph: somewhere both the stride and the dilation are set to values greater than 1, which is not supported.

@rakesh-reddy95 would you be willing to share your onnx conversion code? I am looking to do the same and am running into some issues as well.

rakesh-reddy95 commented 1 year ago

@tk4218 what are the issues? Were you able to trace the model?

tk4218 commented 1 year ago

> @tk4218 what are the issues? Were you able to trace the model?

I am able to run Inference.py with --jit as long as the input shape matches the base_size of the model. However, when I try to convert the model to ONNX I get exit code 0xC0000094 (the Windows STATUS_INTEGER_DIVIDE_BY_ZERO error). I've tried converting both the torch model and the TorchScript version and get the same error.

Here's the conversion code I am running, along with the output I am getting:

Conversion script:

```python
import torch.onnx
import onnx
from onnxsim import simplify

from lib.InSPyReNet import InSPyReNet_SwinB
from utils.misc import Simplify

# Build the model and load the trained checkpoint
model = InSPyReNet_SwinB(64, False, [384, 384], threshold=512)
model.load_state_dict(torch.load("./../snapshots/InSPyReNet_SwinB/latest.pth"))

model.cuda()
model.eval()

# Wrap for inference-only output and trace to TorchScript
model = Simplify(model)
model = torch.jit.trace(model, torch.rand(1, 3, 384, 384).cuda(), strict=False)

data = torch.rand(1, 3, 384, 384).cuda()
torch_out = model(data)

# Export the traced module to ONNX
output_file = "E:/Background/InSPyReNet/latest.onnx"
torch.onnx.export(model,
                  data,
                  output_file,
                  opset_version=11,
                  verbose=True)

# Simplify and validate the exported graph
onnx_model = onnx.load(output_file)
onnx_model, check = simplify(onnx_model)
assert check, "Simplified ONNX model could not be validated"
onnx.save(onnx_model, output_file)
```

Output:

```
C:\Users\tkoon\AppData\Roaming\Python\Python37\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\InSPyReNet.py:152: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  elif (H <= self.threshold or W <= self.threshold):
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:428: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if W % self.patch_size[1] != 0:
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:430: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if H % self.patch_size[0] != 0:
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:366: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  Hp = int(np.ceil(H / self.window_size)) * self.window_size
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:367: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  Wp = int(np.ceil(W / self.window_size)) * self.window_size
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:203: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert L == H * W, "input feature has wrong size"
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:62: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  B = int(windows.shape[0] / (H * W / window_size / window_size))
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:241: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_r > 0 or pad_b > 0:
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:274: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert L == H * W, "input feature has wrong size"
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:279: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  pad_input = (H % 2 == 1) or (W % 2 == 1)
C:\Users\tkoon\PycharmProjects\InSPyReNet\lib\backbones\SwinTransformer.py:280: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_input:
C:\Users\tkoon\AppData\Roaming\Python\Python37\site-packages\torch\onnx\utils.py:823: UserWarning: no signature found for <torch.ScriptMethod object at 0x000001A3CFBC0E28>, skipping _decide_input_format
  warnings.warn(f"{e}, skipping _decide_input_format")

Process finished with exit code -1073741676 (0xC0000094)
```

rakesh-reddy95 commented 1 year ago

Not sure of the reason, but I used the TorchScript Simplify-wrapped model and the export call below.

```python
torch.onnx.export(
    model,                              # PyTorch model
    torch.rand(1, 3, 224, 224).cuda(),  # input tensor
    'MiniSwinT.onnx',                   # output file (e.g. 'output_model.onnx')
    opset_version=16,                   # operator set version
    export_params=True,
    input_names=['input'],
    output_names=['output'],
)
```

tk4218 commented 1 year ago

@rakesh-reddy95 I've tried running this same code to no avail. Do you know which versions of ONNX and PyTorch you are running? I've tried cloning the repository on a different machine to see if it was my setup, and I still get the same error. Curious how you didn't run into this issue as well.
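Since export crashes like this are often version-specific, comparing environments is a sensible first step. A small sketch for collecting the relevant versions (`report_versions` is a hypothetical helper, not from this repository):

```python
import importlib

def report_versions(packages=("torch", "onnx", "onnxsim")):
    """Collect installed package versions so two machines can be compared
    when debugging an export crash."""
    versions = {}
    for pkg in packages:
        try:
            mod = importlib.import_module(pkg)
            versions[pkg] = getattr(mod, "__version__", "unknown")
        except ImportError:
            versions[pkg] = None  # not installed
    return versions
```

Running this on both machines and diffing the output narrows down whether the crash is environmental.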

oylz commented 1 year ago

@tk4218

culture4515 commented 7 months ago

@oylz How did you solve the issue of converting the model to ONNX? I am still getting the following error:

```
RuntimeError: minus_one_pos != -1 INTERNAL ASSERT FAILED at "../torch/csrc/jit/passes/onnx/shape_type_inference.cpp":534, please report a bug to PyTorch. There are no examples for shape_has_zero = true && minus_one_pos == -1.
```

Can you push a working version as a pull request?

pranayzomato commented 2 months ago

> @oylz How did you solve the issue of converting the model to ONNX? I am still getting the following error:
>
> RuntimeError: minus_one_pos != -1 INTERNAL ASSERT FAILED at "../torch/csrc/jit/passes/onnx/shape_type_inference.cpp":534, please report a bug to PyTorch. There are no examples for shape_has_zero = true && minus_one_pos == -1.
>
> Can you push a working version as a pull request?

Try changing the dummy input size from (1, 3, 384, 384) to something else, like (1, 3, 512, 512), and it should work.