microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Issue while loading big model like mt5 (may be due to limited size) #11848

Closed OriAlpha closed 2 years ago

OriAlpha commented 2 years ago

Describe the bug: I have an mt5 model converted from PyTorch to ONNX, and when I try to load it into onnxruntime I get the following error. It may be due to a size limit specified somewhere in the code, but I was not able to find it. Any ideas on what to change?

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_4622 failed.tensorprotoutils.cc:637 TensorProtoToTensor External initializer: onnx::MatMul_4622 offset: 0 size to read: 11534336 given file_length: 4194304 are out of bounds or can not be read in full.
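The error message is a failed bounds check on the external-data read: ORT was asked to read 11534336 bytes at offset 0 from a data file that is only 4194304 bytes long. A minimal sketch of that check (the numbers are taken from the error message above; the function name is hypothetical, not ORT's actual API):

```python
def external_read_in_bounds(offset: int, size_to_read: int, file_length: int) -> bool:
    """Mirror the check TensorProtoToTensor performs before reading an
    external initializer: the requested slice must fit inside the file."""
    return offset >= 0 and size_to_read >= 0 and offset + size_to_read <= file_length

# Values from the error above: reading onnx::MatMul_4622 fails because
# 0 + 11534336 > 4194304, i.e. the external data file is too small.
print(external_read_in_bounds(0, 11534336, 4194304))   # False -> the reported FAIL
print(external_read_in_bounds(0, 11534336, 11534336))  # True  -> would succeed
```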

System information

tianleiwu commented 2 years ago

I guess it is because of external data. Previously, the encoder ONNX and decoder ONNX models might have been in different folders. After merging them into one ONNX model, ORT starts looking for the external data file in the same folder. If the encoder and decoder use the same external data filename, there will be a conflict.

The solution is to make sure they use different files to store external data like:

mt5_encoder.onnx
mt5_encoder.onnx.data (This is the external data file for encoder)
mt5_decoder.onnx
mt5_decoder.onnx.data (This is the external data file for decoder)

To achieve that, we can use onnx API to save encoder/decoder ONNX model like

import onnx

model = onnx.load_model("mt5_encoder.onnx", load_external_data=True)
onnx.save_model(model, "mt5_encoder.onnx", save_as_external_data=True, all_tensors_to_one_file=True, location="mt5_encoder.onnx.data", size_threshold=1024, convert_attribute=False)

After that, you can run convert_beam_search.py if needed.

OriAlpha commented 2 years ago

I converted the model as you specified, but it still gives an error: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_4720 failed.tensorprotoutils.cc:145 GetExternalDataInfo TensorProto: onnx::MatMul_4720 external data size mismatch. Computed size: 4194304, external_data.length: 11534336 Additional info: decoder_init.onnx.data is around 3.5 GB.

OriAlpha commented 2 years ago

If you want to try the model, please use https://huggingface.co/google/mt5-large/tree/main

OriAlpha commented 2 years ago

I even tried testing on different systems, and it still gives the same issue: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_4498 failed.tensorprotoutils.cc:145 GetExternalDataInfo TensorProto external data size mismatch. Computed size: 11534336, external_data.length: 4194304 This happens only for the decoder module; the encoder and decoder_init sessions are created without any issues. @tianleiwu, do you suggest any way to fix these issues?

tianleiwu commented 2 years ago

@OriAlpha, you can try getting the source of my pull request 11958 and running something like the following for the ONNX conversion (you need to install a nightly package or build the wheel from source for model inference):

python convert_beam_search.py -m google/mt5-large --model_type mt5 --output mt5-large-beamsearch.onnx -e

I can run the mt5-large model. There is still a parity issue (max diff is close to e-2) that needs the PyTorch exporter team to take a look.

OriAlpha commented 2 years ago

Thanks for looking into it, it's working.