microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Not able to load onnx model multilingual-e5-large #21321

Open JirHr opened 2 months ago

JirHr commented 2 months ago

Describe the issue

I am trying to use Spring AI with the ONNX model multilingual-e5-large:

Model is here: https://huggingface.co/intfloat/multilingual-e5-large/tree/main/onnx

When I have the dependencies in build.gradle like this (not trying to force a newer onnxruntime version):

    implementation 'org.springframework.ai:spring-ai-transformers'
    //implementation group: 'com.microsoft.onnxruntime', name: 'onnxruntime', version: '1.18.0'

I am getting:

    ORT_RUNTIME_EXCEPTION - message: Exception during initialization: C:\a_work\1\s\onnxruntime\core\optimizer\initializer.cc:35 onnxruntime::Initializer::Initializer !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.

When I have the dependencies in build.gradle like this (trying to force the newer onnxruntime version):

    implementation 'org.springframework.ai:spring-ai-transformers'
    implementation group: 'com.microsoft.onnxruntime', name: 'onnxruntime', version: '1.18.0'

I am getting:

    Error code - ORT_FAIL - message: Deserialize tensor onnx::MatMul_3326 failed.GetFileLength for .\model.onnx_data failed:open file model.onnx_data fail, errcode = 2 - unknown error

The code works perfectly with the model all-MiniLM-L6-v2: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/tree/main/onnx

Unfortunately, at this moment I have no idea how to resolve this issue. Can you help me?

To reproduce

My configuration is:

@Configuration
public class TransformerConf {
    @Bean("transformersEmbeddingModel")
    public EmbeddingModel embeddingClient() throws Exception {
        TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();
        embeddingModel.setTokenizerResource("classpath:/onnx/multilingual-e5-large/tokenizer.json");
        embeddingModel.setModelResource("classpath:/onnx/multilingual-e5-large/model.onnx");
        embeddingModel.afterPropertiesSet();
        return embeddingModel;
    }
}

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Java

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

yufenglee commented 2 months ago

The error shows that model.onnx_data cannot be found. Did you download both model.onnx and model.onnx_data and save them in the same folder?

JirHr commented 2 months ago

I apologize - you are right. There are model.onnx and model.onnx_data files, and I copied both of them, expecting that model.onnx is the "main" file and that the load would be redirected to model.onnx_data.

Once I specified:

    TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();
    embeddingModel.setModelResource("classpath:/onnx/multilingual-e5-large/model.onnx_data");

I was able to resolve the issue.

Unfortunately Spring AI with ai.onnxruntime still does not work. Now I am getting this error:

    Caused by: java.lang.OutOfMemoryError: Required array size too large
        at java.base/java.io.InputStream.readNBytes(InputStream.java:420) ~[na:na]
        at java.base/java.io.InputStream.readAllBytes(InputStream.java:349) ~[na:na]
        at org.springframework.util.FileCopyUtils.copyToByteArray(FileCopyUtils.java:149) ~[spring-core-6.1.10.jar:6.1.10]
        at org.springframework.core.io.Resource.getContentAsByteArray(Resource.java:151) ~[spring-core-6.1.10.jar:6.1.10]
        at org.springframework.ai.transformers.TransformersEmbeddingModel.afterPropertiesSet(TransformersEmbeddingModel.java:193) ~[spring-ai-transformers-1.0.0-M1.jar:1.0.0-M1]

The model is 2.1 GB. I am using a 64-bit JVM 17, and setting -Xms16g -Xmx16g did not help..... Thank you very much for the resolution of the initial issue.

Craigacp commented 2 months ago

You can't load a multi-part model (where there is both model.onnx and model.onnx_data) from the classpath as a byte array; you need to extract it to a temporary location and load it using a file path. This is a limitation of both ORT and Java: ORT won't let you pass in the other model parts as byte arrays, and Java won't let you make a byte array that is bigger than 2^31. Open an issue on Spring AI as they'll need to add an alternative load mechanism.
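
A minimal sketch of that extract-and-load-by-path approach (the TempDirModelLoader helper is illustrative, not part of Spring AI; the resource path is taken from the configuration above):

import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempDirModelLoader {
    public static OrtSession load(OrtEnvironment env) throws IOException, OrtException {
        // Both parts must end up side by side: model.onnx references
        // model.onnx_data by its bare file name.
        Path tempDir = Files.createTempDirectory("multilingual-e5-large");
        for (String name : new String[] {"model.onnx", "model.onnx_data"}) {
            try (InputStream in = TempDirModelLoader.class
                    .getResourceAsStream("/onnx/multilingual-e5-large/" + name)) {
                if (in == null) {
                    throw new IOException("Classpath resource not found: " + name);
                }
                // Streams the resource to disk without ever materializing
                // a >2GB byte array on the JVM heap.
                Files.copy(in, tempDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // Loading by file path lets the native runtime resolve model.onnx_data itself.
        return env.createSession(tempDir.resolve("model.onnx").toString(),
                new OrtSession.SessionOptions());
    }
}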

JirHr commented 2 months ago

I have been struggling with this issue for almost two weeks - unfortunately I am just an ONNX beginner....

In Java I need to create an OrtSession, and there are only two constructor options:

A) From the single file

  /**
   * Create a session loading the model from disk.
   *
   * @param env The environment.
   * @param modelPath The path to the model.
   * @param allocator The allocator to use.
   * @param options Session configuration options.
   * @throws OrtException If the file could not be read, or the model was corrupted etc.
   */
  OrtSession(OrtEnvironment env, String modelPath, OrtAllocator allocator, SessionOptions options)
      throws OrtException {
    this(
        createSession(
            OnnxRuntime.ortApiHandle, env.getNativeHandle(), modelPath, options.getNativeHandle()),
        allocator);
  }

B) From protobuf byte array

  /**
   * Creates a session reading the model from the supplied byte array.
   *
   * @param env The environment.
   * @param modelArray The model protobuf as a byte array.
   * @param allocator The allocator to use.
   * @param options Session configuration options.
   * @throws OrtException If the model was corrupted or some other error occurred in native code.
   */
  OrtSession(OrtEnvironment env, byte[] modelArray, OrtAllocator allocator, SessionOptions options)

Anyway, in the end you always need a single consolidated ONNX export, i.e. some way to "merge" the files.

I have spent a lot of time trying to understand the issue, and the principle seems to be this: each initializer in model.onnx currently points to model.onnx_data (example):

  initializer {
    dims: 1024
    data_type: 1
    name: "encoder.layer.0.attention.output.LayerNorm.bias"
    external_data {
      key: "location"
      value: "model.onnx_data"
    }
    external_data {
      key: "offset"
      value: "1026146304"
    }
    external_data {
      key: "length"
      value: "4096"
    }
    data_location: EXTERNAL
  }

The expected resulting structure has no external data and contains just raw_data (example):

  initializer {
    dims: 384
    data_type: 1
    name: "encoder.layer.0.attention.output.LayerNorm.bias"
    raw_data: "n\204v......."

I tried to merge the external data back into a single self-contained model; the final Python code is:

import onnx

def combine_onnx_files2(model_path, data_path, output_path):
    # Load only the ONNX model structure; keep the external-data references
    # intact so they can be resolved manually below.
    model = onnx.load(model_path, load_external_data=False)

    # Load the raw external tensor data
    with open(data_path, 'rb') as f:
        tensor_data = f.read()

    for initializer in model.graph.initializer:
        if initializer.data_location == onnx.TensorProto.EXTERNAL:
            offset = 0
            length = 0
            for data in initializer.external_data:
                if data.key == "offset":
                    offset = int(data.value)
                elif data.key == "length":
                    length = int(data.value)

            # Slice this initializer's bytes out of model.onnx_data
            raw_data = tensor_data[offset:offset + length]

            # Replace the external reference with inline raw data
            del initializer.external_data[:]
            initializer.ClearField("data_location")
            initializer.raw_data = raw_data

    onnx.save(model, output_path)
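
A hypothetical invocation, matching the file names used above:

combine_onnx_files2("model.onnx", "model.onnx_data", "model_comb.onnx")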

Unfortunately, protoc --decode=onnx.ModelProto onnx.proto < model_comb.onnx > output_comb.txt says that the structure is not correct... what am I doing wrong?

Craigacp commented 2 months ago

You can't combine them into a single file; it won't load as it will be over the protobuf file size limit. You can load the onnx file and let it read the onnx_data file from disk in the location it is in, or load the onnx_data file in Python and write the initializers out in something you can easily read in Java, then add them to the SessionOptions using addExternalInitializers. I haven't added support for reading onnx_data files directly in Java, though we could look at doing this.
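
A minimal sketch of the addExternalInitializers route (the initializer name comes from the dump above; the shape and values are placeholders, and a real loader would supply every external initializer):

import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OnnxTensorLike;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

import java.nio.FloatBuffer;
import java.util.Map;

public class ExternalInitSketch {
    public static void main(String[] args) throws OrtException {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        try (OrtSession.SessionOptions opts = new OrtSession.SessionOptions()) {
            // The values would really come from model.onnx_data; zeros here.
            FloatBuffer data = FloatBuffer.wrap(new float[1024]);
            OnnxTensor bias = OnnxTensor.createTensor(env, data, new long[] {1024});
            // Registered initializers replace the external-data references, so
            // (once all are supplied) the session stops looking for model.onnx_data.
            Map<String, OnnxTensorLike> inits =
                    Map.of("encoder.layer.0.attention.output.LayerNorm.bias", bias);
            opts.addExternalInitializers(inits);
            try (OrtSession session = env.createSession("model.onnx", opts)) {
                System.out.println(session.getInputNames());
            }
        }
    }
}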

JirHr commented 1 month ago

Thank you for the recommendation of addExternalInitializers.

I have tried the following code:

import ai.onnx.proto.OnnxMl;
import ai.onnxruntime.*;

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        String modelPath = "/onnx/multilingual-e5-large/model.onnx";
        String dataPath = "/onnx/multilingual-e5-large/model.onnx_data";

        OnnxJavaType typeTest = OnnxJavaType.mapFromClass(Float.class);
        System.out.println("Type size: "+ String.valueOf(typeTest.size));
        System.out.println("Max value: "+ String.valueOf(Integer.MAX_VALUE - (8 * typeTest.size)));

        try {
            // Create the ONNX Runtime environment
            OrtEnvironment env = OrtEnvironment.getEnvironment();
            // Create SessionOptions
            OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions();
            sessionOptions.setOptimizationLevel(OrtSession.SessionOptions.OptLevel.BASIC_OPT);
            // Load the ONNX model
            OnnxMl.ModelProto model = OnnxMl.ModelProto.parseFrom(new FileInputStream(modelPath));
            OnnxMl.GraphProto graph = model.getGraph();
            // Read _data file
            Map<String, OnnxTensorLike> initializers = new HashMap<>();
            for (OnnxMl.TensorProto initializer : graph.getInitializerList()) {
                if (initializer.getDataLocation() == OnnxMl.TensorProto.DataLocation.EXTERNAL) {
                    long offset = 0;
                    int length = 0;
                    String name = initializer.getName();
                    for (OnnxMl.StringStringEntryProto data : initializer.getExternalDataList()) {
                        if (data.getKey().equals("offset")) {
                            offset = Long.parseLong(data.getValue());
                        } else if (data.getKey().equals("length")) {
                            length = Integer.parseInt(data.getValue());
                        }
                    }
                    byte[] rawData = readExternalData(dataPath, offset, length);
                    ByteBuffer byteBuffer = ByteBuffer.wrap(rawData);
                    System.out.println("Raw data offset: " + String.valueOf(offset));
                    System.out.println("Raw data length: " + String.valueOf(length));
                    System.out.println("Raw data size: " + String.valueOf(rawData.length));
                    System.out.println("Buffer limit: " + String.valueOf(byteBuffer.limit()));

                    // Create OnnxTensor and use it as OnnxTensorLike
                    /*
                    OnnxTensorLike onnxTensorLike = OnnxTensor.createTensor(
                            env,
                            byteBuffer,
                            convertListToLongArray(initializer.getDimsList()),
                            OnnxJavaType.mapFromInt(initializer.getDataType())
                    );

                    initializers.put(name, onnxTensorLike);
                     */
                }
            }

            // Add external initializers to SessionOptions
            sessionOptions.addExternalInitializers(initializers);

            // Load the model and create a session
            OrtSession session = env.createSession(modelPath, sessionOptions);

            // Close resources
            session.close();
            sessionOptions.close();
            env.close();
        } catch (IOException | OrtException e) {
            e.printStackTrace();
        }
    }

    private static byte[] readExternalData(String dataPath, long offset, int length) throws IOException {
        try (FileInputStream fis = new FileInputStream(dataPath);
             FileChannel fileChannel = fis.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            fileChannel.position(offset);
            // FileChannel.read is not guaranteed to fill the buffer in one
            // call, so loop until the requested slice is fully read
            while (buffer.hasRemaining()) {
                if (fileChannel.read(buffer) < 0) {
                    throw new IOException("Unexpected EOF at offset " + offset);
                }
            }
            return buffer.array();
        }
    }

    public static long[] convertListToLongArray(List<Long> longList) {
        long[] longArray = new long[longList.size()];
        for (int i = 0; i < longList.size(); i++) {
            longArray[i] = longList.get(i);
        }
        return longArray;
    }
}

When I uncomment the "Create OnnxTensor and use it as OnnxTensorLike" part, I am getting an error:

Cannot allocate a direct buffer of the requested size and type, size 1024008192, type = FLOAT

I am using Java 17 and added these VM options: -XX:MaxDirectMemorySize=4g -Xmx2g - it did not help.

Going to OrtUtil.java, line 492, I see the following condition:

  static BufferTuple prepareBuffer(Buffer data, OnnxJavaType type) {
    if (type == OnnxJavaType.STRING || type == OnnxJavaType.UNKNOWN) {
      throw new IllegalStateException("Cannot create a " + type + " tensor from a buffer");
    }
    int bufferPos;
    long bufferSizeLong = data.remaining() * (long) type.size;
    if (bufferSizeLong > (Integer.MAX_VALUE - (8 * type.size))) {

Therefore I have commented out the "Create OnnxTensor and use it as OnnxTensorLike" part, getting for the first (largest) initializer:

    Type size: 4                 // OnnxJavaType.mapFromClass(Float.class).size
    Max value: 2147483615        // Integer.MAX_VALUE - (8 * typeTest.size)
    Raw data offset: 0           // initializer offset
    Raw data length: 1024008192  // initializer length
    Raw data size: 1024008192    // check of the raw data size
    Buffer limit: 1024008192     // byteBuffer.limit()

and for the second initializer:

    Raw data offset: 1024008192
    Raw data length: 2105344
    Raw data size: 2105344
    Buffer limit: 2105344
    .....

It seems that even here there is a 2 GB limit (2147483615), and I do not understand why it fails when the initializer is only 1 GB... what am I doing wrong?

JirHr commented 1 month ago

I was recommended this workaround, which seems to be working (further tests required):

byte[] rawData = readExternalData(dataPath, offset, length);
ByteBuffer byteBuffer = ByteBuffer.wrap(rawData);
byteBuffer.order(ByteOrder.LITTLE_ENDIAN); // onnx_data is little-endian
// The FloatBuffer view appears to be what avoids the 2GB check: prepareBuffer
// computes remaining() * type.size, and a ByteBuffer's remaining() counts bytes
// (1024008192 * 4 overflows), while a FloatBuffer's counts floats (256002048 * 4 fits).
FloatBuffer floatBuffer = byteBuffer.asFloatBuffer();
OnnxTensorLike onnxTensorLike = OnnxTensor.createTensor(env, floatBuffer, convertListToLongArray(initializer.getDimsList()));

leichangqing commented 1 month ago

I had this issue too. When can it be fixed?

Craigacp commented 1 month ago

You can load the initializers manually by processing the byte stream from the onnx_data file if you have it in a classpath resource, or load it off disk. This is really a Spring AI issue, as they assume that everything can be loaded from classpath resources, but that's not true for ONNX models that are larger than 2GB.
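
A minimal sketch of reading a single float initializer straight from a classpath byte stream (the resource name, offset, length, and shape parameters are placeholders; the offsets would come from parsing the model protobuf as in the code above):

import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ClasspathInitReader {
    static OnnxTensor readFloatInitializer(OrtEnvironment env, String resource,
                                           long offset, int length, long[] shape)
            throws IOException, OrtException {
        try (InputStream in = ClasspathInitReader.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resource);
            }
            in.skipNBytes(offset);                // seek to this initializer's slice
            byte[] bytes = in.readNBytes(length); // one initializer, not the whole file
            if (bytes.length != length) {
                throw new IOException("Truncated read at offset " + offset);
            }
            FloatBuffer floats = ByteBuffer.wrap(bytes)
                    .order(ByteOrder.LITTLE_ENDIAN) // onnx_data is little-endian
                    .asFloatBuffer();               // remaining() counts floats, not bytes
            return OnnxTensor.createTensor(env, floats, shape);
        }
    }
}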