Not able to load onnx model multilingual-e5-large #21321

Open JirHr opened 2 months ago

JirHr commented 2 months ago

Describe the issue

I am trying to use Spring AI for onnx model multilingual-e5-large:

Model is here:

When I have dependencies in build.gradle like (not trying to force new onnxruntime version): implementation '' //implementation group: '', name: 'onnxruntime', version: '1.18.0'

I am getting: ORT_RUNTIME_EXCEPTION - message: Exception during initialization: C:\a_work\1\s\onnxruntime\core\optimizer\ onnxruntime::Initializer::Initializer !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.

When I have dependencies in build.gradle like (trying to force new onnxruntime version): implementation '' implementation group: '', name: 'onnxruntime', version: '1.18.0'

I am getting: Error code - ORT_FAIL - message: Deserialize tensor onnx::MatMul_3326 failed.GetFileLength for .\model.onnx_data failed:open file model.onnx_data fail, errcode = 2 - unknown error

The code works perfectly with the model all-MiniLM-L6-v2:

Unfortunately, at the moment I have no idea how to resolve this issue. Can you help me?

To reproduce

My configuration is:

public class TransformerConf {
    public EmbeddingModel embeddingClient() throws Exception {
        TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();
        return embeddingModel;
    }
}


No response



OS Version


ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID


ONNX Runtime API




Execution Provider

Default CPU

Execution Provider Library Version

No response

yufenglee commented 2 months ago

The error shows that model.onnx_data cannot be found. Did you download both model.onnx and model.onnx_data and save them in the same folder?
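A quick pre-flight check along these lines can catch the missing-file case before ONNX Runtime is involved. It is only a sketch: the class and method names are made up for illustration, and it assumes the external data file is referenced by a plain file name relative to the model's folder, as the `.\model.onnx_data` in the error message suggests.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class ExternalDataCheck {

    /**
     * Returns true if a regular file named dataFileName sits in the same
     * directory as modelFile (hypothetical helper, not an ONNX Runtime API).
     */
    static boolean hasSiblingData(Path modelFile, String dataFileName) {
        return Files.isRegularFile(modelFile.resolveSibling(dataFileName));
    }
}
```

Calling `ExternalDataCheck.hasSiblingData(modelPath, "model.onnx_data")` before creating the session gives a clearer failure than the native error above.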

JirHr commented 2 months ago

I apologize - you are right. There are model.onnx and model.onnx_data files and I copied both of them, expecting that model.onnx is the "main" file and the load will be redirected to model.onnx_data

After specifying:

TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();
embeddingModel.setModelResource("classpath:/onnx/multilingual-e5-large/model.onnx_data");

I was able to resolve the issue.

Unfortunately, Spring AI with ai.onnxruntime still does not work. Now I am getting this error:

Caused by: java.lang.OutOfMemoryError: Required array size too large
    at java.base/ ~[na:na]
    at java.base/ ~[na:na]
    at org.springframework.util.FileCopyUtils.copyToByteArray( ~[spring-core-6.1.10.jar:6.1.10]
    at ~[spring-core-6.1.10.jar:6.1.10]
    at ~[spring-ai-transformers-1.0.0-M1.jar:1.0.0-M1]

The model is 2.1 GB. I am using a 64-bit JVM 17, and setting the JVM options -Xms16g -Xmx16g did not help. Thank you very much for resolving the initial issue.

Craigacp commented 2 months ago

You can't load a multi-part model (where there is both model.onnx and model.onnx_data) from the classpath as a byte array, you need to extract it to a temporary location and load it using a file path. This is a limitation of both ORT and Java, ORT won't let you pass in the other model parts as byte arrays, and Java won't let you make a byte array that is bigger than 2^31. Open an issue on Spring AI as they'll need to add an alternative load mechanism.
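The extract-then-load approach described above could be sketched roughly as follows. This is only an illustration using the JDK: `ModelExtractor` and `extractToTempDir` are hypothetical names, and the resource directory and file names are whatever your project uses. The point is that both files are streamed into the same temporary folder, so no single byte array ever has to hold the model.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ModelExtractor {

    /**
     * Copies each listed classpath resource under resourceDir into one fresh
     * temporary directory, keeping the file names, and returns that directory.
     * Hypothetical helper for illustration only.
     */
    static Path extractToTempDir(ClassLoader loader, String resourceDir, String... fileNames)
            throws IOException {
        Path tempDir = Files.createTempDirectory("onnx-model-");
        for (String fileName : fileNames) {
            String resource = resourceDir + "/" + fileName;
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Classpath resource not found: " + resource);
                }
                // Streams the resource to disk instead of buffering it in a byte[].
                Files.copy(in, tempDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return tempDir;
    }
}
```

The `model.onnx` inside the returned directory can then be opened by file path, for example with a path-based call such as `env.createSession(modelPath, sessionOptions)`, so that ONNX Runtime finds `model.onnx_data` next to it.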

JirHr commented 2 months ago

I have been struggling with this issue for almost two weeks - unfortunately I am just an ONNX beginner.

In Java I need to create an OrtSession, and there are only two constructor options:

A) From the single file

  /**
   * Create a session loading the model from disk.
   * @param env The environment.
   * @param modelPath The path to the model.
   * @param allocator The allocator to use.
   * @param options Session configuration options.
   * @throws OrtException If the file could not be read, or the model was corrupted etc.
   */
  OrtSession(OrtEnvironment env, String modelPath, OrtAllocator allocator, SessionOptions options)
      throws OrtException {
            OnnxRuntime.ortApiHandle, env.getNativeHandle(), modelPath, options.getNativeHandle()),

B) From protobuf byte array

  /**
   * Creates a session reading the model from the supplied byte array.
   * @param env The environment.
   * @param modelArray The model protobuf as a byte array.
   * @param allocator The allocator to use.
   * @param options Session configuration options.
   * @throws OrtException If the model was corrupted or some other error occurred in native code.
   */
  OrtSession(OrtEnvironment env, byte[] modelArray, OrtAllocator allocator, SessionOptions options)

Either way, in the end you always need "one consolidated ONNX export", i.e. some way to "merge" the files.

I have spent a lot of time trying to understand the issue. The principle is probably this: each initializer in model.onnx currently points to model.onnx_data (example):

  initializer {
    dims: 1024
    data_type: 1
    name: "encoder.layer.0.attention.output.LayerNorm.bias"
    external_data {
      key: "location"
      value: "model.onnx_data"
    }
    external_data {
      key: "offset"
      value: "1026146304"
    }
    external_data {
      key: "length"
      value: "4096"
    }
    data_location: EXTERNAL
  }

The expected resulting structure has no external data and contains just raw_data (example):

  initializer {
    dims: 384
    data_type: 1
    name: "encoder.layer.0.attention.output.LayerNorm.bias"
    raw_data: "n\204v......."
  }

I tried to:

The final Python code is:

import onnx

def combine_onnx_files2(model_path, data_path, output_path):
    # Load the ONNX model structure
    model = onnx.load(model_path)

    # Load onnx_data
    with open(data_path, 'rb') as f:
        tensor_data = f.read()

    for initializer in model.graph.initializer:
        if initializer.data_location == onnx.TensorProto.EXTERNAL:
            offset = 0
            length = 0
            for data in initializer.external_data:
                if data.key == "offset":
                    offset = int(data.value)
                elif data.key == "length":
                    length = int(data.value)

            raw_data = tensor_data[offset:offset + length]

            del initializer.external_data[:]

            initializer.raw_data = raw_data

    onnx.save(model, output_path)

Unfortunately, protoc --decode=onnx.ModelProto onnx.proto < model_comb.onnx > output_comb.txt reports that the structure is not correct. What am I doing wrong?

Craigacp commented 2 months ago

You can't combine them into a single file, it won't load as it will be over the file size limit. You can load the onnx file and let it read the onnx_data file from disk in the location it is in, or load in the onnx_data file in python and write the initializers out in something you can easily read in Java then add them to the SessionOptions using addExternalInitializers. I haven't added support for reading onnx_data files directly in Java, though we could look at doing this.

JirHr commented 1 month ago

Thank you for recommending addExternalInitializers.

I have tried the following code:

import ai.onnx.proto.OnnxMl;
import ai.onnxruntime.*;

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

public class Main {
    public static void main(String[] args) {
        String modelPath = "/onnx/multilingual-e5-large/model.onnx";
        String dataPath = "/onnx/multilingual-e5-large/model.onnx_data";

        OnnxJavaType typeTest = OnnxJavaType.mapFromClass(Float.class);
        System.out.println("Type size: "+ String.valueOf(typeTest.size));
        System.out.println("Max value: "+ String.valueOf(Integer.MAX_VALUE - (8 * typeTest.size)));

        try {
            // Create the ONNX Runtime environment
            OrtEnvironment env = OrtEnvironment.getEnvironment();
            // Create SessionOptions
            OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions();
            // Load the ONNX model
            OnnxMl.ModelProto model = OnnxMl.ModelProto.parseFrom(new FileInputStream(modelPath));
            OnnxMl.GraphProto graph = model.getGraph();
            // Read _data file
            Map<String, OnnxTensorLike> initializers = new HashMap<>();
            for (OnnxMl.TensorProto initializer : graph.getInitializerList()) {
                if (initializer.getDataLocation() == OnnxMl.TensorProto.DataLocation.EXTERNAL) {
                    long offset = 0;
                    int length = 0;
                    String name = initializer.getName();
                    for (OnnxMl.StringStringEntryProto data : initializer.getExternalDataList()) {
                        if (data.getKey().equals("offset")) {
                            offset = Long.parseLong(data.getValue());
                        } else if (data.getKey().equals("length")) {
                            length = Integer.parseInt(data.getValue());
                    byte[] rawData = readExternalData(dataPath, offset, length);
                    ByteBuffer byteBuffer = ByteBuffer.wrap(rawData);
                    System.out.println("Raw data offset: " + String.valueOf(offset));
                    System.out.println("Raw data length: " + String.valueOf(length));
                    System.out.println("Raw data size: " + String.valueOf(rawData.length));
                    System.out.println("Buffer limit: " + String.valueOf(byteBuffer.limit()));

                    // Create OnnxTensor and use it as OnnxTensorLike
                    OnnxTensorLike onnxTensorLike = OnnxTensor.createTensor(

                    initializers.put(name, onnxTensorLike);

            // Add external initializers to SessionOptions

            // Load the model and create a session
            OrtSession session = env.createSession(modelPath, sessionOptions);

            // Close resources
        } catch (IOException | OrtException e) {

    private static byte[] readExternalData(String dataPath, long offset, int length) throws IOException {
        try (FileInputStream fis = new FileInputStream(dataPath);
             FileChannel fileChannel = fis.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            fileChannel.position(offset); // seek to the start of this initializer's data
            while (buffer.hasRemaining()) {
                if (fileChannel.read(buffer) < 0) {
                    throw new IOException("Unexpected end of file: " + dataPath);
                }
            }
            return buffer.array();
        }
    }

    public static long[] convertListToLongArray(List<Long> longList) {
        long[] longArray = new long[longList.size()];
        for (int i = 0; i < longList.size(); i++) {
            longArray[i] = longList.get(i);
        }
        return longArray;
    }
}

When I uncomment the "Create OnnxTensor and use it as OnnxTensorLike" part, I get an error:

Cannot allocate a direct buffer of the requested size and type, size 1024008192, type = FLOAT

I am using Java 17 and added the VM options -XX:MaxDirectMemorySize=4g -Xmx2g, but that did not help.

Going to line 492, I see the following condition:

  static BufferTuple prepareBuffer(Buffer data, OnnxJavaType type) {
    if (type == OnnxJavaType.STRING || type == OnnxJavaType.UNKNOWN) {
      throw new IllegalStateException("Cannot create a " + type + " tensor from a buffer");
    int bufferPos;
    long bufferSizeLong = data.remaining() * (long) type.size;
    if (bufferSizeLong > (Integer.MAX_VALUE - (8 * type.size))) {

Therefore I commented out the "Create OnnxTensor and use it as OnnxTensorLike" part. For the first (largest) initializer I get:

Type size: 4 // i.e. OnnxJavaType typeTest = OnnxJavaType.mapFromClass(Float.class); System.out.println("Type size: "+ String.valueOf(typeTest.size));
Max value: 2147483615 // i.e. System.out.println("Max value: "+ String.valueOf(Integer.MAX_VALUE - (8 * typeTest.size)));
Raw data offset: 0 // i.e. initializer offset
Raw data length: 1024008192 // i.e. initializer length
Raw data size: 1024008192 // i.e. check of raw data size
Buffer limit: 1024008192 // i.e. byteBuffer.limit()

Raw data offset: 1024008192
Raw data length: 2105344
Raw data size: 2105344
Buffer limit: 2105344
.....

It seems there is a 2 GB limit here as well (2147483615), but I do not understand why it fails when the initializer is only 1 GB. What am I doing wrong?
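One possible reading of the numbers above, assuming the failing call passed the 1024008192-byte ByteBuffer together with the FLOAT type (as the error message suggests): the quoted check multiplies `data.remaining()` by `type.size`, and for a ByteBuffer `remaining()` counts bytes, so the 1 GB of raw data is counted as roughly 4 GB. The sketch below only reproduces that arithmetic with plain JDK code; `BufferSizeCheck` and `passes` are illustrative names, not part of ONNX Runtime.

```java
final class BufferSizeCheck {

    /**
     * Mirrors the condition quoted above: remaining() * type.size must not
     * exceed Integer.MAX_VALUE - 8 * type.size.
     */
    static boolean passes(long remainingElements, int typeSize) {
        long bufferSizeLong = remainingElements * (long) typeSize;
        return bufferSizeLong <= Integer.MAX_VALUE - (8L * typeSize);
    }

    public static void main(String[] args) {
        int floatSize = 4;              // "Type size: 4" in the output above
        long rawBytes = 1_024_008_192L; // "Raw data length" of the first initializer

        // ByteBuffer case: remaining() is a byte count, yet it is scaled by the float size.
        System.out.println(rawBytes * floatSize);                    // 4096032768
        System.out.println(passes(rawBytes, floatSize));             // false

        // Same data counted in floats: 256002048 elements of 4 bytes each.
        System.out.println(passes(rawBytes / floatSize, floatSize)); // true
    }
}
```

If this reading is right, handing over the same data as a buffer whose `remaining()` counts floats rather than bytes keeps the product at 1024008192, which is below the 2147483615 limit.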

JirHr commented 1 month ago

I was recommended this workaround, which seems to be working (further tests required):

byte[] rawData = readExternalData(dataPath, offset, length);
ByteBuffer byteBuffer = ByteBuffer.wrap(rawData);
FloatBuffer floatBuffer = byteBuffer.asFloatBuffer();
OnnxTensorLike onnxTensorLike = OnnxTensor.createTensor(env,floatBuffer,convertListToLongArray(initializer.getDimsList()));
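One detail worth checking with this workaround is byte order. ONNX stores raw tensor data in little-endian order, while `ByteBuffer.wrap` produces a big-endian buffer by default, so `asFloatBuffer()` on an unmodified wrapped buffer would decode the bytes in the wrong order. A minimal sketch of the adjustment, assuming the initializer really is FLOAT data (`LittleEndianFloats` is a made-up helper name):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class LittleEndianFloats {

    /** Views raw little-endian tensor bytes as floats without copying them. */
    static FloatBuffer asLittleEndianFloats(byte[] rawData) {
        return ByteBuffer.wrap(rawData)
                .order(ByteOrder.LITTLE_ENDIAN) // must be set before creating the view
                .asFloatBuffer();
    }
}
```

For example, the four bytes `00 00 80 3F` decode to `1.0f` through this view. Comparing a few decoded values against the same tensor loaded in Python would confirm whether the adjustment is needed in your setup.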

leichangqing commented 1 month ago

I had this issue too. When can it be fixed?

Craigacp commented 1 month ago

You can load in the initializers manually by processing the byte stream from the onnx_data file if you have it in a classpath resource, or load it off disk. This is really a Spring AI issue as they assume that everything can be loaded from classpath resources, but that's not true for ONNX models which are larger than 2GB.
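Processing the byte stream from a classpath resource, as suggested above, might look roughly like this. It is a sketch with made-up names (`StreamedInitializers`, `Slice`, `readSlices`), and it assumes the offset and length of each initializer have already been taken from the `external_data` entries in model.onnx. Because an InputStream can only move forward, the slices are visited in offset order, and each one is read into its own array, so no array has to hold the whole file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StreamedInitializers {

    /** Hypothetical holder for one initializer's name, offset and length. */
    static final class Slice {
        final String name;
        final long offset;
        final int length;

        Slice(String name, long offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Reads every slice from the stream, visiting them in ascending offset order. */
    static Map<String, byte[]> readSlices(InputStream in, List<Slice> slices) throws IOException {
        List<Slice> ordered = new ArrayList<>(slices);
        ordered.sort(Comparator.comparingLong((Slice s) -> s.offset));
        Map<String, byte[]> result = new LinkedHashMap<>();
        long position = 0;
        for (Slice slice : ordered) {
            if (slice.offset < position) {
                throw new IOException("Slice overlaps an earlier one: " + slice.name);
            }
            skipFully(in, slice.offset - position);
            result.put(slice.name, readFully(in, slice.length));
            position = slice.offset + slice.length;
        }
        return result;
    }

    private static void skipFully(InputStream in, long count) throws IOException {
        while (count > 0) {
            long skipped = in.skip(count);
            if (skipped <= 0) {
                // skip() may return 0 without being at the end, so read one byte instead.
                if (in.read() < 0) {
                    throw new EOFException("Stream ended while skipping");
                }
                skipped = 1;
            }
            count -= skipped;
        }
    }

    private static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int filled = 0;
        while (filled < length) {
            int n = in.read(data, filled, length - filled);
            if (n < 0) {
                throw new EOFException("Stream ended inside a slice");
            }
            filled += n;
        }
        return data;
    }
}
```

Each returned array can then be wrapped in a buffer, turned into a tensor as in the workaround earlier in this thread, and collected into the map passed to `addExternalInitializers`.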