renesas-rz / rzv_drp-ai_tvm

Extension package of Apache TVM (Machine Learning Compiler) for Renesas DRP-AI accelerators powered by Edgecortix MERA(TM). Based on Apache TVM version: v0.11.1
Apache License 2.0

compile_pytorch_model.py compile failures (model_path/constants.pkl not found) #22

Open ljkeller opened 3 months ago

ljkeller commented 3 months ago

Hello,

I'm getting compile failures with compile_pytorch_model.py. Here's the failure:

/drp-ai_tvm/tutorials# python3 compile_pytorch_model.py /home/models/spark_torch.pt -o spark_torch -s 1,3,28,28
[Check arguments]
  Input AI model         :  /home/models/spark_torch.pt
  SDK path               :  /opt/poky/3.1.21
  DRP-AI Translator path :  /opt/drp-ai_translator_release
  Output dir             :  spark_torch
  Input shape            :  (1, 3, 28, 28)
Traceback (most recent call last):
  File "compile_pytorch_model.py", line 69, in <module>
    model = torch.jit.load(model_file)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_serialization.py", line 161, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1.0.1_9_epochs_no_norm_97.27/constants.pkl

Interestingly, I trained this model and exported it to both torch and ONNX formats. The ONNX export compiles without problems using: python3 compile_onnx_model.py /home/models/spark.onnx -o spark -s 1,3,28,28 -i input

I'm guessing there is a version incompatibility between the torch version I trained/exported with and the torch version used here for the conversion? I don't see any documentation about which torch versions are expected for training. I don't have my model training PC with me right now, or I'd report its torch version.
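To narrow this down, here is a minimal diagnostic sketch using only the standard library (plus torch, if it is installed). It relies on two assumptions that are not confirmed anywhere in this thread: that the .pt file is a zip archive, and that a file written by torch.jit.save contains a constants.pkl entry while a file written by plain torch.save does not. The helper names (archive_entries, has_entry, installed_torch_version) are made up for illustration and are not part of torch or of this repository.

```python
import zipfile


def archive_entries(model_path):
    """Return the member names stored inside a .pt file.

    Assumption: the file is a zip archive. A file in an older, non-zip
    format makes zipfile raise zipfile.BadZipFile here.
    """
    with zipfile.ZipFile(model_path) as archive:
        return archive.namelist()


def has_entry(model_path, entry_name):
    """Return True if any archive member's base name equals entry_name."""
    return any(
        name.rsplit("/", 1)[-1] == entry_name
        for name in archive_entries(model_path)
    )


def installed_torch_version():
    """Return torch.__version__, or None when torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


# Example usage inside the container (path taken from the log above):
#   print(installed_torch_version())
#   print(archive_entries("/home/models/spark_torch.pt"))
#   print(has_entry("/home/models/spark_torch.pt", "constants.pkl"))
```

If has_entry reports False for constants.pkl, the file may have been saved in a format other than what torch.jit.load expects, which would be worth ruling out before looking at version mismatches. Running installed_torch_version() in both the training environment and the container would give the two versions to compare.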

Here are the models I've tried: spark.zip

Environment

I'm running in a Docker container that I built about 6 months ago. As far as I know, the build command was: docker build -t rzv2l_ai_sdk_image --build-arg SDK="/opt/poky/3.1.21" --build-arg PRODUCT="V2L"