inference.core.exceptions.ModelArtefactError: Unable to load ONNX session. Cause: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/cache\chinese-calligraphy-recognition-sl0eb/2\best.onnx failed:Protobuf parsing failed.

Sui-25 commented 3 months ago

Search before asking

[X] I have searched the Inference issues and found no similar feature requests.

Question

I apologize for any confusion my explanation may cause. I’m a beginner and I need to use roboflow inference to complete my project. There’s a problem that needs to be solved.

Here’s the situation: When using the get_model() function to get the model, an error occurred, as follows:

When getting the model (model_id: kitchenfire/1), everything went smoothly;
When getting the model (model_id: chinese-calligraphy-styles/1) and the model (model_id: chinese-calligraphy-recognition-sl0eb/2), the following error message was displayed in the python terminal:

C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:65: UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names.Available providers: 'TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider'
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\roboflow.py", line 713, in initialize_model
    self.onnx_session = onnxruntime.InferenceSession(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 383, in __init__   
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 424, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/cache\chinese-calligraphy-recognition-sl0eb/2\best.onnx failed:Protobuf parsing failed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\OTHERs\Windows Shortcut Folder\Documents\Code Projects\VS Code Projects\CCRS\test.py", line 4, in <module>
    model = inference.get_model("chinese-calligraphy-recognition-sl0eb/2", api_key=__ApiKey)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\models\utils.py", line 195, in get_model
    return ROBOFLOW_MODEL_TYPES[(task, model)](model_id, api_key=api_key, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\models\vit\vit_classification.py", line 26, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\classification_base.py", line 40, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\roboflow.py", line 612, in __init__
    self.initialize_model()
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\roboflow.py", line 720, in initialize_model
    raise ModelArtefactError(
inference.core.exceptions.ModelArtefactError: Unable to load ONNX session. Cause: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/cache\chinese-calligraphy-recognition-sl0eb/2\best.onnx failed:Protobuf parsing failed.

Additional

No response

Sui-25 commented 3 months ago

Have no idea it suddenly ran successfully.

My personal guesses for the reasons are:

The model is too large;
It's due to network issues;
I ran the command rm -r /tmp/cache in the terminal.

I personally found another solution, which is suitable for situations where only the inference results of the model are needed (Please forgive me, I’m a beginner.):

# import the inference-sdk
from inference_sdk import InferenceHTTPClient

# initialize the client
CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="your api_key"
)

# infer on a local image
result = CLIENT.infer("your image.jpg", model_id="your model_id")

Hope this can help other beginners.

Sorry again for occupying the public resources.

PawelPeczek-Roboflow commented 3 months ago

hi there. That error is truly strange. To debug that I would need to ask where you have your server running - is that cpu machine or GPU machine or Jetson?

Sui-25 commented 3 months ago

Thank you very much for your reply!

I am running on my laptop, and the specifications of my laptop are as follows:

Windows 11 Professional 23H2
CPU： Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz 2.59 GHz
GPU： GTX 1660Ti M
RAM： 16.0 GB
System Type： 64-bit operating system, x64-based cpu

My Python environment has installed inference and inference-gpu, but I don't know if inference-gpu is working. When the error occurred, I simply used the following Python statement in VS Code to deploy the model locally:

from inference.models.utils import get_model
model = get_model(model_id=__ModelId, api_key=__ApiKey)

If it helps, I can describe the situation at the time (because I personally guess it has something to do with the network):

When I ran the Python program, the program did not respond at all. Since I observed in the task manager that the Python program was using the network, I think it was stuck at the get_model() function. At that time, I tried the following different ways to run the Python program to deploy the model (model_id: chinese-calligraphy-styles/1) and model (model_id: chinese-calligraphy-recognition-sl0eb/2):

When the VPN was not turned on, after I ran the Python program, the task manager showed that the network speed of the Python program was about 0~500KB/s. After waiting for many minutes, the Python program still did not proceed to the next step, the Python terminal did not output anything, and the error message I mentioned above did not appear, so I closed the run.
When the VPN was turned on, the task manager showed that the network speed of the Python program was a bit faster, about 0~3MB/s, and the error message I mentioned above appeared in the Python terminal in less than a minute.

_It should be mentioned that when I deployed the model (modelid: kitchenfire/1), I did not use a VPN, and every time I ran Python, the deployment was successfully completed.

After many attempts as described above, it suddenly ran successfully.

PawelPeczek-Roboflow commented 2 months ago

Ok, so let me add few comments:

having GPU you should install only inference-gpu - but to be honest Windows may be problematic in some cases 🙈
we pull model weights at first - that's why your program may seem to freeze (good idea would be to have a progress bar, btw) not sure why it takes so much time
Error may (or may not - hard to say) be caused by corrupted model weights file - which assuming strangely slow download appears to me as possible.

From this statement: After many attempts as described above, it suddenly ran successfully. I assume you finally got the model running, right?

Sui-25 commented 2 months ago

Thank you so much for your supplement!

Yes, I have now successfully and stably run it many times.

As for the reason for the exceptionally slow download speed, I personally guess it may also be related to the network environment in my area.

However, as a beginner in deep learning, the information I can provide is only so much, and I greatly appreciate your patient responses once again.

To avoid taking up your valuable time, I think maybe it's time to close this issue.

Thank you once again for all your responses!

grzegorz-roboflow commented 2 months ago

Hi @Sui-25 , following your suggestion I will close this issue but if you do have further problems please do not hesitate to open another issue. Thank you!

roboflow / inference