second-state / WasmEdge-WASINN-examples

Apache License 2.0

Trying to use ResNet-50 model for PyTorch results in 'torch::jit::ErrorReport' issue #160

Status: Open. OrionKai opened this issue 1 month ago.

OrionKai commented 1 month ago

This error occurred while using the following versions:

I tried to adapt the pytorch-mobilenet-image example by copying gen_mobilenet_model.py to a new script, gen_resnet50_model.py, in which I replaced the references to the MobileNet model with NVIDIA's ResNet50 model, available on PyTorch Hub at https://pytorch.org/hub/nvidia_deeplearningexamples_resnet50/. The script is as follows:

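```python
import os
import torch

with torch.no_grad():
    # Dummy input with the shape the model expects: (batch, channels, height, width)
    fake_input = torch.rand(1, 3, 224, 224)
    model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)
    model.eval()
    out1 = model(fake_input).squeeze()

    # Convert the eager model to TorchScript by scripting it
    sm = torch.jit.script(model)
    # Note: an existing resnet50.pt is NOT overwritten, so a stale file may be reused
    if not os.path.exists("resnet50.pt"):
        sm.save("resnet50.pt")
    # Reload the saved model and compare its output with the eager model's
    load_sm = torch.jit.load("resnet50.pt")
    out2 = load_sm(fake_input).squeeze()

    print(out1[:5], out2[:5])
```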

I ran this script, and it successfully generated resnet50.pt. I then ran the following command:

```shell
wasmedge --dir .:. wasmedge-wasinn-example-mobilenet-image.wasm resnet50.pt input.jpg
```

However, doing so produces the following error message:

```
terminate called after throwing an instance of 'torch::jit::ErrorReport'
terminate called recursively
Aborted
```

Since none of the print statements in the Rust source code (main.rs) are executed, the error appears to occur in the call to build_from_files() on line 18 of the source code, i.e. while the model is being loaded.

I have also tried other models from PyTorch Hub, and all of them produce the same error. The models I have tried so far are:

hydai commented 1 month ago

This looks similar to the closed issue https://github.com/second-state/WasmEdge-WASINN-examples/issues/128. Please check that you are using the correct names for all your assets.
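One way to act on this suggestion is a pre-flight check that every file passed to wasmedge actually exists under the directory exposed with `--dir .:.` (here, the current directory). `check_assets` is a hypothetical helper written for illustration; it is not part of WasmEdge or the examples repository.

```python
from pathlib import Path


def check_assets(base_dir: str, *names: str) -> list[str]:
    """Return the asset names that are NOT regular files under base_dir.

    Hypothetical helper: run it from the same directory you pass to
    `wasmedge --dir .:.` so relative names resolve the same way.
    """
    base = Path(base_dir)
    return [name for name in names if not (base / name).is_file()]
```

For example, `check_assets(".", "wasmedge-wasinn-example-mobilenet-image.wasm", "resnet50.pt", "input.jpg")` returns an empty list when all three files are present, and otherwise lists the misnamed or missing ones.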

OrionKai commented 1 month ago

I'm using the precompiled wasmedge-wasinn-example-mobilenet-image.wasm file provided in the example, since, based on the Rust source code, I assumed it would also work for a ResNet model exported to TorchScript without further modification. Was that assumption mistaken?

hydai commented 1 month ago

I'm not sure, since I'm not familiar with ResNet. However, you can check the implementation in the example to see whether anything needs to be changed.

OrionKai commented 1 month ago

I've managed to fix the issue! First, I switched the libtorch version used by the WASI-NN plugin to 2.4.1, to match the PyTorch 2.4.1 installed for Python on my machine. This allowed me to use the SqueezeNet model found here: https://pytorch.org/hub/pytorch_vision_squeezenet/
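A TorchScript file saved by one PyTorch release may fail to load in a different (especially older) libtorch, so a quick comparison before exporting can catch the kind of mismatch described above. The sketch below is a hypothetical helper, not part of WasmEdge or PyTorch. It assumes you supply the plugin's libtorch version by hand, and it requires an exact major.minor.patch match, mirroring what was done here, as the conservative choice.

```python
import re


def major_minor_patch(version: str) -> tuple[int, int, int]:
    """Parse '2.4.1', '2.4.1+cpu' or '2.4.1+cu121' into (2, 4, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def versions_match(python_torch: str, plugin_libtorch: str) -> bool:
    """True when the exporting PyTorch and the plugin's libtorch are the same release.

    Local build suffixes such as '+cpu' or '+cu121' are ignored.
    """
    return major_minor_patch(python_torch) == major_minor_patch(plugin_libtorch)
```

For example, `versions_match(torch.__version__, "2.4.1")` returns True on a machine with a `2.4.1+cpu` PyTorch build.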

However, the same error still appeared for the other models. I fixed it for them by changing the model generation script to use torch.jit.trace instead of torch.jit.script. As I understand it, this works because these models have no dynamic control flow, so tracing is enough to convert them to TorchScript and scripting is not required.
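The fix described above can be sketched as a variant of the export script that traces the model instead of scripting it. `export_traced` is a hypothetical helper name. Unlike the original script, it always overwrites the output file, and it checks that the reloaded model reproduces the eager model's output. Tracing only records the operations executed for the example input, so this is appropriate only for models without data-dependent control flow, which is the assumption the comment above relies on.

```python
import torch
from torch import nn


def export_traced(model: nn.Module, example_input: torch.Tensor, path: str,
                  atol: float = 1e-5) -> torch.jit.ScriptModule:
    """Trace `model` with `example_input`, save it to `path`, reload it and verify it."""
    model.eval()
    with torch.no_grad():
        expected = model(example_input)
        # Tracing records the ops run for this particular input; any
        # data-dependent branch would be frozen to the path taken here.
        traced = torch.jit.trace(model, example_input)
        traced.save(path)  # always overwrite so a stale file is never reused
        reloaded = torch.jit.load(path)
        actual = reloaded(example_input)
    if not torch.allclose(expected, actual, atol=atol):
        raise RuntimeError("traced model output differs from the eager model")
    return reloaded
```

For the ResNet-50 model from this issue, the call would be `export_traced(torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True), torch.rand(1, 3, 224, 224), "resnet50.pt")`.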