Dear ONNX community,

Recently, I have been trying to build my app for on-device training. I followed the procedure in the example, but made modifications to the CMakeLists.txt file. Here is my CMakeLists.txt code:

As you can see, I added `libonnxruntime4j_jni.so` as an additional library in my app. The `libonnxruntime4j_jni.so` file is saved under my `cpp/lib/` directory. However, when I build my app, the error shown below constantly pops up:

If possible, could anyone help me with solving this issue?
What is the reason behind using the `libonnxruntime4j_jni.so` library?
If you're intending to use the Java bindings in your application, you could directly leverage the ORT Java bindings using the AAR file here: https://central.sonatype.com/artifact/com.microsoft.onnxruntime/onnxruntime-training-android/1.15.1
I'll try to see if I can reproduce the error on my end.
I just tried it on my end and was able to successfully build the application. Not sure what might be going on with your env.
Hi Baiju,
I solved this issue just this morning and forgot to update this issue. My way of solving it was to create a folder under the `src/main/` directory, like what's shown below:

and then load the native `libonnxruntime4j_jni.so` library in `MainActivity.kt` like this:
companion object {
    // Load the native ONNX Runtime JNI library and the app's own library on startup.
    init {
        System.loadLibrary("onnxruntime4j_jni")
        System.loadLibrary("distributed_inference_demo")
    }
}
The reason behind using it is that we want to convert a Java `OnnxTensor` to a C++ `Ort::Value` tensor.
Currently, we are deploying quantized LLMs (around 1.2 GB per model) to Android devices using ONNX. However, if we create the session directly with ONNX Runtime in the Java environment, it results in an OutOfMemoryError due to the JVM memory limit. So, to avoid this error, we used the Android NDK with the ONNX Runtime C++ API to load the model and run inference.
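Concretely, the native side boils down to creating the `Ort::Session` behind a JNI call. Here is a minimal sketch of what we mean (the package, class, and method names are illustrative, not our exact code):

#include <jni.h>
#include <onnxruntime_cxx_api.h>

// Illustrative JNI entry point: opens the model with the C++ API so its
// weights live in native memory, outside the JVM heap limit.
extern "C" JNIEXPORT jlong JNICALL
Java_com_example_demo_MainActivity_createSession(JNIEnv* env, jobject /*thiz*/, jstring modelPath) {
    const char* path = env->GetStringUTFChars(modelPath, nullptr);
    // The Ort::Env must outlive every session created from it.
    static Ort::Env ortEnv(ORT_LOGGING_LEVEL_WARNING, "distributed_inference_demo");
    Ort::SessionOptions options;
    auto* session = new Ort::Session(ortEnv, path, options);
    env->ReleaseStringUTFChars(modelPath, path);
    return reinterpret_cast<jlong>(session);  // opaque handle passed back to Kotlin
}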
We tried conducting inference directly in the C++ backend by creating a random int64 `Ort::Value` tensor, and it worked. However, the problem is the conversion itself: we are wondering whether there is an API that allows us to do this automatically, or whether we have to write our own API for doing it.
Please let me know.
Thanks a lot!
Best, Junchen
Hi Junchen,
I am not aware of any API that does this directly, but you can refer to how we do it for our train step function in the Java bindings:
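As a rough illustration (not the actual bindings code), the core of such a conversion is getting the Java tensor's primitive data across JNI and wrapping it in an `Ort::Value` without copying. A minimal sketch, assuming the Java side passes a direct `FloatBuffer` plus an explicit shape array:

#include <jni.h>
#include <vector>
#include <onnxruntime_cxx_api.h>

// Illustrative only: wraps a Java-owned direct FloatBuffer in an Ort::Value.
extern "C" JNIEXPORT jlong JNICALL
Java_com_example_demo_TensorBridge_wrapFloatBuffer(
        JNIEnv* env, jobject /*thiz*/, jobject buffer, jlongArray jshape) {
    float* data = static_cast<float*>(env->GetDirectBufferAddress(buffer));
    jsize rank = env->GetArrayLength(jshape);
    std::vector<int64_t> shape(rank);
    env->GetLongArrayRegion(jshape, 0, rank, reinterpret_cast<jlong*>(shape.data()));

    size_t count = 1;
    for (int64_t d : shape) count *= static_cast<size_t>(d);

    // CreateTensor over user memory does not copy, so the Java buffer must
    // stay alive for as long as the Ort::Value is in use.
    Ort::MemoryInfo memInfo = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    auto* value = new Ort::Value(Ort::Value::CreateTensor<float>(
        memInfo, data, count, shape.data(), shape.size()));
    return reinterpret_cast<jlong>(value);
}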
Let me know if there are any further questions.
Thanks Baiju. We have fixed the issue. I will close this issue.
Hi Baiju,
I have a further question I'd like to ask you, if possible. I'm currently trying to serialize and deserialize the ONNX model output in C++, which is a `std::vector<Ort::Value>` object. So far, I have not found any related C++ API in ONNX Runtime. May I know whether there are approaches for doing so?
Hi Junchen,
You could get the underlying float buffer from each of the `Ort::Value`s inside the vector. Then you can choose to serialize it however you wish.
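For example, per value, something along these lines:

// Raw data pointer and element count for one Ort::Value (float tensor assumed).
const float* buf = value.GetTensorData<float>();
size_t n = value.GetTensorTypeAndShapeInfo().GetElementCount();
// Then write n and the n * sizeof(float) bytes at buf to whatever format you like.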
Hi Baiju,
Thanks a lot for your help!
I have successfully serialized the tensor vector using the float buffer. Here is my approach for serializing the `std::vector<Ort::Value>`:
std::vector<char> SerializeTensorVectorToBytes(const std::vector<Ort::Value>& tensors) {
    std::vector<char> bytes;
    size_t numTensors = tensors.size();
    const char* dataPtr = reinterpret_cast<const char*>(&numTensors);
    bytes.insert(bytes.end(), dataPtr, dataPtr + sizeof(size_t));
    for (const auto& tensor : tensors) {
        if (!tensor.IsTensor()) {
            std::cerr << "Skipping non-tensor Ort::Value." << std::endl;
            continue;
        }
        const float* floatArr = tensor.GetTensorData<float>();
        Ort::TensorTypeAndShapeInfo info = tensor.GetTensorTypeAndShapeInfo();
        size_t elementCount = info.GetElementCount();
        // Get the shape of the tensor
        std::vector<int64_t> shape = info.GetShape();
        size_t numDimensions = shape.size();
        const char* elementCountPtr = reinterpret_cast<const char*>(&elementCount);
        bytes.insert(bytes.end(), elementCountPtr, elementCountPtr + sizeof(size_t));
        const char* tensorDataPtr = reinterpret_cast<const char*>(floatArr);
        bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(float));
        // Write the number of dimensions to the bytes
        const char* numDimensionsPtr = reinterpret_cast<const char*>(&numDimensions);
        bytes.insert(bytes.end(), numDimensionsPtr, numDimensionsPtr + sizeof(size_t));
        // Write each dimension to the bytes
        for (int64_t dimension : shape) {
            const char* dimensionPtr = reinterpret_cast<const char*>(&dimension);
            bytes.insert(bytes.end(), dimensionPtr, dimensionPtr + sizeof(int64_t));
        }
    }
    return bytes;
}
I encountered an issue during the deserialization process of the model's output vector, which contains multiple `Ort::Value`s with different data types. To deserialize the data, I need to recreate the `Ort::Value`s using `Ort::Value::CreateTensor`, which is templated on the element type, but my serialized format does not record each tensor's type. At the moment, I'm not sure if there's an alternative way to handle this, or if I need to handle the data types manually during deserialization. I'm providing the code snippet of my current deserialization implementation where I'm facing this problem:
std::vector<Ort::Value> DeserializeTensorVectorFromBytes(const std::vector<char>& bytes) {
    std::vector<Ort::Value> tensors;
    const char* dataPtr = bytes.data();
    const char* endPtr = bytes.data() + bytes.size();
    if (endPtr - dataPtr < sizeof(size_t)) {
        std::cerr << "Not enough data to deserialize." << std::endl;
        return tensors;
    }
    size_t numTensors = *reinterpret_cast<const size_t*>(dataPtr);
    dataPtr += sizeof(size_t);
    for (size_t i = 0; i < numTensors; ++i) {
        if (endPtr - dataPtr < sizeof(size_t)) {
            std::cerr << "Not enough data to deserialize tensor." << std::endl;
            return tensors;
        }
        size_t elementCount = *reinterpret_cast<const size_t*>(dataPtr);
        dataPtr += sizeof(size_t);
        if (endPtr - dataPtr < elementCount * sizeof(float)) {
            std::cerr << "Not enough data to deserialize tensor data." << std::endl;
            return tensors;
        }
        const float* floatArr = reinterpret_cast<const float*>(dataPtr);
        dataPtr += elementCount * sizeof(float);
        if (endPtr - dataPtr < sizeof(size_t)) {
            std::cerr << "Not enough data to deserialize tensor shape." << std::endl;
            return tensors;
        }
        size_t numDimensions = *reinterpret_cast<const size_t*>(dataPtr);
        dataPtr += sizeof(size_t);
        if (endPtr - dataPtr < numDimensions * sizeof(int64_t)) {
            std::cerr << "Not enough data to deserialize tensor shape." << std::endl;
            return tensors;
        }
        std::vector<int64_t> shape(numDimensions);
        for (size_t j = 0; j < numDimensions; ++j) {
            shape[j] = *reinterpret_cast<const int64_t*>(dataPtr);
            dataPtr += sizeof(int64_t);
        }
        Ort::AllocatorWithDefaultOptions allocator;
        // ? marks where I'm currently stuck
        Ort::Value tensor = Ort::Value::CreateTensor< ? >(allocator, shape.data(), shape.size());
        float* tensorData = tensor.GetTensorMutableData< ? >();
        std::copy(floatArr, floatArr + elementCount, tensorData);
        tensors.push_back(std::move(tensor));
    }
    return tensors;
}
Any suggestions or insights on how to efficiently handle the data types during deserialization would be greatly appreciated. Thank you!
I think I have solved the issue:
Here is my updated code:
For serialization:
std::vector<char> SerializeTensorVectorToBytes(const std::vector<Ort::Value>& tensors) {
    std::vector<char> bytes;
    size_t numTensors = tensors.size();
    const char* dataPtr = reinterpret_cast<const char*>(&numTensors);
    bytes.insert(bytes.end(), dataPtr, dataPtr + sizeof(size_t));
    for (const auto& tensor : tensors) {
        if (!tensor.IsTensor()) {
            std::cerr << "Skipping non-tensor Ort::Value." << std::endl;
            continue;
        }
        Ort::TensorTypeAndShapeInfo info = tensor.GetTensorTypeAndShapeInfo();
        size_t elementCount = info.GetElementCount();
        // Record the current size of bytes to calculate the size of the added data
        size_t initialSize = bytes.size();
        // Write the tensor type to the bytes
        ONNXTensorElementDataType tensorType = info.GetElementType();
        const char* tensorTypePtr = reinterpret_cast<const char*>(&tensorType);
        bytes.insert(bytes.end(), tensorTypePtr, tensorTypePtr + sizeof(ONNXTensorElementDataType));
        // Get the shape of the tensor
        std::vector<int64_t> shape = info.GetShape();
        size_t numDimensions = shape.size();
        // Write the number of dimensions to the bytes
        const char* numDimensionsPtr = reinterpret_cast<const char*>(&numDimensions);
        bytes.insert(bytes.end(), numDimensionsPtr, numDimensionsPtr + sizeof(size_t));
        // Write each dimension to the bytes
        for (int64_t dimension : shape) {
            const char* dimensionPtr = reinterpret_cast<const char*>(&dimension);
            bytes.insert(bytes.end(), dimensionPtr, dimensionPtr + sizeof(int64_t));
        }
        // Initialized to 0 so the size check below stays well-defined for unsupported types
        size_t elementSize = 0;
        // Write the tensor data to the bytes
        switch (tensorType) {
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT: {
                const float* tensorData = tensor.GetTensorData<float>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(float));
                elementSize = sizeof(float);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8: {
                const int8_t* tensorData = tensor.GetTensorData<int8_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int8_t));
                elementSize = sizeof(int8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8: {
                const uint8_t* tensorData = tensor.GetTensorData<uint8_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint8_t));
                elementSize = sizeof(uint8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16: {
                const uint16_t* tensorData = tensor.GetTensorData<uint16_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint16_t));
                elementSize = sizeof(uint16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16: {
                const int16_t* tensorData = tensor.GetTensorData<int16_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int16_t));
                elementSize = sizeof(int16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32: {
                const int32_t* tensorData = tensor.GetTensorData<int32_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int32_t));
                elementSize = sizeof(int32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64: {
                const int64_t* tensorData = tensor.GetTensorData<int64_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int64_t));
                elementSize = sizeof(int64_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL: {
                const bool* tensorData = tensor.GetTensorData<bool>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(bool));
                elementSize = sizeof(bool);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE: {
                const double* tensorData = tensor.GetTensorData<double>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(double));
                elementSize = sizeof(double);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32: {
                const uint32_t* tensorData = tensor.GetTensorData<uint32_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint32_t));
                elementSize = sizeof(uint32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64: {
                const uint64_t* tensorData = tensor.GetTensorData<uint64_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint64_t));
                elementSize = sizeof(uint64_t);
                break;
            }
            default:
                std::cerr << "Unsupported tensor type for serialization: " << tensorType << std::endl;
                break;
        }
        // Calculate the expected size
        size_t expectedSize = sizeof(ONNXTensorElementDataType) // size of the tensor type
                            + sizeof(size_t)                    // size of the number of dimensions
                            + (sizeof(int64_t) * numDimensions) // size of the tensor shape
                            + (elementSize * elementCount);     // size of the tensor data
        // Verify the total size of the serialized data for each tensor
        size_t actualSize = bytes.size() - initialSize; // size of the added data for the current tensor
        if (actualSize != expectedSize) {
            std::cerr << "Error: Serialized tensor size (" << actualSize
                      << ") does not match expected size (" << expectedSize << ")." << std::endl;
        }
    }
    return bytes;
}
For deserialization:
std::vector<Ort::Value> DeserializeTensorVectorFromBytes(const std::vector<char>& bytes) {
    std::vector<Ort::Value> tensors;
    const char* dataPtr = bytes.data();
    size_t numTensors = *reinterpret_cast<const size_t*>(dataPtr);
    dataPtr += sizeof(size_t);
    for (size_t i = 0; i < numTensors; ++i) {
        ONNXTensorElementDataType tensorType = *reinterpret_cast<const ONNXTensorElementDataType*>(dataPtr);
        dataPtr += sizeof(ONNXTensorElementDataType);
        size_t numDimensions = *reinterpret_cast<const size_t*>(dataPtr);
        dataPtr += sizeof(size_t);
        std::vector<int64_t> shape(numDimensions);
        size_t elementCount = 1;
        for (size_t j = 0; j < numDimensions; ++j) {
            shape[j] = *reinterpret_cast<const int64_t*>(dataPtr);
            dataPtr += sizeof(int64_t);
            elementCount *= shape[j];
        }
        Ort::AllocatorWithDefaultOptions allocator;
        // Placeholder value; replaced by the correctly typed tensor in the switch below.
        Ort::Value tensor = Ort::Value::CreateTensor<float>(allocator, shape.data(), numDimensions);
        switch (tensorType) {
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT: {
                tensor = Ort::Value::CreateTensor<float>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<float>(), dataPtr, elementCount * sizeof(float));
                dataPtr += elementCount * sizeof(float);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8: {
                tensor = Ort::Value::CreateTensor<int8_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int8_t>(), dataPtr, elementCount * sizeof(int8_t));
                dataPtr += elementCount * sizeof(int8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8: {
                tensor = Ort::Value::CreateTensor<uint8_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint8_t>(), dataPtr, elementCount * sizeof(uint8_t));
                dataPtr += elementCount * sizeof(uint8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16: {
                tensor = Ort::Value::CreateTensor<uint16_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint16_t>(), dataPtr, elementCount * sizeof(uint16_t));
                dataPtr += elementCount * sizeof(uint16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16: {
                tensor = Ort::Value::CreateTensor<int16_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int16_t>(), dataPtr, elementCount * sizeof(int16_t));
                dataPtr += elementCount * sizeof(int16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32: {
                tensor = Ort::Value::CreateTensor<int32_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int32_t>(), dataPtr, elementCount * sizeof(int32_t));
                dataPtr += elementCount * sizeof(int32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64: {
                tensor = Ort::Value::CreateTensor<int64_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int64_t>(), dataPtr, elementCount * sizeof(int64_t));
                dataPtr += elementCount * sizeof(int64_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL: {
                tensor = Ort::Value::CreateTensor<bool>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<bool>(), dataPtr, elementCount * sizeof(bool));
                dataPtr += elementCount * sizeof(bool);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE: {
                tensor = Ort::Value::CreateTensor<double>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<double>(), dataPtr, elementCount * sizeof(double));
                dataPtr += elementCount * sizeof(double);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32: {
                tensor = Ort::Value::CreateTensor<uint32_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint32_t>(), dataPtr, elementCount * sizeof(uint32_t));
                dataPtr += elementCount * sizeof(uint32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64: {
                tensor = Ort::Value::CreateTensor<uint64_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint64_t>(), dataPtr, elementCount * sizeof(uint64_t));
                dataPtr += elementCount * sizeof(uint64_t);
                break;
            }
            default:
                std::cerr << "Unsupported tensor type for deserialization: " << tensorType << std::endl;
                break;
        }
        tensors.push_back(std::move(tensor));
    }
    return tensors;
}
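For completeness, a small round-trip check along these lines can sanity-test the pair (a sketch only; the function name is hypothetical):

#include <cassert>

// Hypothetical smoke test: serialize one float tensor and read it back,
// relying on the two functions defined above.
void RoundTripSmokeTest() {
    Ort::AllocatorWithDefaultOptions allocator;
    std::vector<int64_t> shape{2, 3};
    Ort::Value t = Ort::Value::CreateTensor<float>(allocator, shape.data(), shape.size());
    float* p = t.GetTensorMutableData<float>();
    for (int i = 0; i < 6; ++i) p[i] = static_cast<float>(i);

    std::vector<Ort::Value> in;
    in.push_back(std::move(t));

    std::vector<char> bytes = SerializeTensorVectorToBytes(in);
    std::vector<Ort::Value> out = DeserializeTensorVectorFromBytes(bytes);

    // Expect one float tensor of shape {2, 3} with the same contents.
    assert(out.size() == 1);
    assert(out[0].GetTensorTypeAndShapeInfo().GetElementCount() == 6);
    assert(out[0].GetTensorData<float>()[5] == 5.0f);
}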
May I send a PR to the ONNX Runtime codebase to add this functionality, since I think some other people may need it in the future?
Also, my current approach does not yet support `bfloat16`, `std::string`, `complex64`, `complex128`, or `float16`. I may add them later on.
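For instance, float16 could probably be handled by copying the raw 16-bit payloads; here is a sketch (untested) of a case that could be added to the deserialization switch, assuming the `Ort::Float16_t` storage type from the C++ API:

// Sketch only: candidate float16 case for the deserialization switch above.
// Ort::Float16_t is a 16-bit storage type, so a raw byte copy of the payload suffices.
case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16: {
    tensor = Ort::Value::CreateTensor<Ort::Float16_t>(allocator, shape.data(), numDimensions);
    std::memcpy(tensor.GetTensorMutableData<Ort::Float16_t>(), dataPtr,
                elementCount * sizeof(Ort::Float16_t));
    dataPtr += elementCount * sizeof(Ort::Float16_t);
    break;
}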
This issue has had no updates for 6+ months, so I will close it. Feel free to reopen.