Dear ONNX community,

Recently, I have been trying to build my app for on-device training. I followed the procedure in the example, but made modifications to the CMakeLists.txt file. Here is my CMakeLists.txt code:

As you can see, I added `libonnxruntime4j_jni.so` as an additional library in my app. The `libonnxruntime4j_jni.so` file is saved under my `cpp/lib/` directory. However, when I build my app, the error shown below constantly pops up:

If possible, could anyone help me with solving this issue?
What is the reason behind using the `libonnxruntime4j_jni.so` library?
If you're intending to use the Java bindings in your application, you could directly leverage the ORT Java bindings using the AAR file here: https://central.sonatype.com/artifact/com.microsoft.onnxruntime/onnxruntime-training-android/1.15.1
I'll try to see if I can reproduce the error on my end.
I just tried it on my end and was able to successfully build the application. Not sure what might be going on with your env.
Hi Baiju,
I solved this issue just this morning and forgot to update this issue. My way of solving it was to create a folder under the `src/main/` directory, like what's shown below:

and then load the native `libonnxruntime4j_jni.so` library in `MainActivity.kt` like this:
companion object {
    // Load the native ONNX Runtime JNI library and the app's own library on startup.
    init {
        System.loadLibrary("onnxruntime4j_jni")
        System.loadLibrary("distributed_inference_demo")
    }
}
The reason behind using it is that we want to convert a Java `OnnxTensor` to a C++ `Ort::Value` tensor.
Currently, we are deploying quantized LLMs (around 1.2 GB per model) to Android devices using ONNX. However, if we create the session directly with ONNX Runtime in the Java environment, it results in an OutOfMemoryError due to the JVM memory limit. So, to avoid this error, we used the Android NDK with the ONNX Runtime C++ API to load the model and run inference.
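Concretely, the native side boils down to creating the `Ort::Session` behind a JNI call. Here is a minimal sketch of what we mean (the package, class, and method names are illustrative, not our exact code):

#include <jni.h>
#include <onnxruntime_cxx_api.h>

// Illustrative JNI entry point: opens the model with the C++ API so its
// weights live in native memory, outside the JVM heap limit.
extern "C" JNIEXPORT jlong JNICALL
Java_com_example_demo_MainActivity_createSession(JNIEnv* env, jobject /*thiz*/, jstring modelPath) {
    const char* path = env->GetStringUTFChars(modelPath, nullptr);
    // The Ort::Env must outlive every session created from it.
    static Ort::Env ortEnv(ORT_LOGGING_LEVEL_WARNING, "distributed_inference_demo");
    Ort::SessionOptions options;
    auto* session = new Ort::Session(ortEnv, path, options);
    env->ReleaseStringUTFChars(modelPath, path);
    return reinterpret_cast<jlong>(session);  // opaque handle passed back to Kotlin
}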
We tried conducting inference directly in the C++ backend by creating a random int64 `Ort::Value` tensor, and it worked. However, the problem is the conversion itself: we are wondering whether there is an API that allows us to do this automatically, or whether we have to write our own API for doing it.
Please let me know.
Thanks a lot!
Best, Junchen
Hi Junchen,
I am not aware of any API that does this directly, but you can refer to how we do it for our train step function in the Java bindings:
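As a rough illustration (not the actual bindings code), the core of such a conversion is getting the Java tensor's primitive data across JNI and wrapping it in an `Ort::Value` without copying. A minimal sketch, assuming the Java side passes a direct `FloatBuffer` plus an explicit shape array:

#include <jni.h>
#include <vector>
#include <onnxruntime_cxx_api.h>

// Illustrative only: wraps a Java-owned direct FloatBuffer in an Ort::Value.
extern "C" JNIEXPORT jlong JNICALL
Java_com_example_demo_TensorBridge_wrapFloatBuffer(
        JNIEnv* env, jobject /*thiz*/, jobject buffer, jlongArray jshape) {
    float* data = static_cast<float*>(env->GetDirectBufferAddress(buffer));
    jsize rank = env->GetArrayLength(jshape);
    std::vector<int64_t> shape(rank);
    env->GetLongArrayRegion(jshape, 0, rank, reinterpret_cast<jlong*>(shape.data()));

    size_t count = 1;
    for (int64_t d : shape) count *= static_cast<size_t>(d);

    // CreateTensor over user memory does not copy, so the Java buffer must
    // stay alive for as long as the Ort::Value is in use.
    Ort::MemoryInfo memInfo = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    auto* value = new Ort::Value(Ort::Value::CreateTensor<float>(
        memInfo, data, count, shape.data(), shape.size()));
    return reinterpret_cast<jlong>(value);
}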
Let me know if there are any further questions.
Thanks Baiju. We have fixed the issue. I will close this issue.
Hi Baiju,
I have a further question I'd like to ask you, if possible. I'm currently trying to serialize and deserialize the ONNX model output in C++, which is a `std::vector<Ort::Value>` object. So far, I have not found any related C++ API in ONNX Runtime. May I know whether there are approaches for doing so?
Hi Junchen,
You could get the underlying float buffer from each of the `Ort::Value`s inside the vector. Then you can choose to serialize it however you wish.
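For example, per value, something along these lines:

// Raw data pointer and element count for one Ort::Value (float tensor assumed).
const float* buf = value.GetTensorData<float>();
size_t n = value.GetTensorTypeAndShapeInfo().GetElementCount();
// Then write n and the n * sizeof(float) bytes at buf to whatever format you like.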
Hi Baiju,
Thanks a lot for your help!
I have successfully serialized the tensor vector using the float buffer. Here is my approach for serializing the `std::vector<Ort::Value>`:
std::vector<char> SerializeTensorVectorToBytes(const std::vector<Ort::Value>& tensors) {
    std::vector<char> bytes;
    size_t numTensors = tensors.size();
    const char* dataPtr = reinterpret_cast<const char*>(&numTensors);
    bytes.insert(bytes.end(), dataPtr, dataPtr + sizeof(size_t));
    for (const auto& tensor : tensors) {
        if (!tensor.IsTensor()) {
            std::cerr << "Skipping non-tensor Ort::Value." << std::endl;
            continue;
        }
        const float* floatArr = tensor.GetTensorData<float>();
        Ort::TensorTypeAndShapeInfo info = tensor.GetTensorTypeAndShapeInfo();
        size_t elementCount = info.GetElementCount();
        // Get the shape of the tensor
        std::vector<int64_t> shape = info.GetShape();
        size_t numDimensions = shape.size();
        const char* elementCountPtr = reinterpret_cast<const char*>(&elementCount);
        bytes.insert(bytes.end(), elementCountPtr, elementCountPtr + sizeof(size_t));
        const char* tensorDataPtr = reinterpret_cast<const char*>(floatArr);
        bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(float));
        // Write the number of dimensions to the bytes
        const char* numDimensionsPtr = reinterpret_cast<const char*>(&numDimensions);
        bytes.insert(bytes.end(), numDimensionsPtr, numDimensionsPtr + sizeof(size_t));
        // Write each dimension to the bytes
        for (int64_t dimension : shape) {
            const char* dimensionPtr = reinterpret_cast<const char*>(&dimension);
            bytes.insert(bytes.end(), dimensionPtr, dimensionPtr + sizeof(int64_t));
        }
    }
    return bytes;
}
I encountered an issue during the deserialization process of the model's output vector, which contains multiple `Ort::Value`s with different data types. To deserialize the data, I need to recreate the `Ort::Value`s using `Ort::Value::CreateTensor`, which is templated on the element type, but my serialized format does not record each tensor's type. At the moment, I'm not sure if there's an alternative way to handle this, or if I need to handle the data types manually during deserialization. I'm providing the code snippet of my current deserialization implementation where I'm facing this problem:
std::vector<Ort::Value> DeserializeTensorVectorFromBytes(const std::vector<char>& bytes) {
    std::vector<Ort::Value> tensors;
    const char* dataPtr = bytes.data();
    const char* endPtr = bytes.data() + bytes.size();
    if (endPtr - dataPtr < sizeof(size_t)) {
        std::cerr << "Not enough data to deserialize." << std::endl;
        return tensors;
    }
    size_t numTensors = *reinterpret_cast<const size_t*>(dataPtr);
    dataPtr += sizeof(size_t);
    for (size_t i = 0; i < numTensors; ++i) {
        if (endPtr - dataPtr < sizeof(size_t)) {
            std::cerr << "Not enough data to deserialize tensor." << std::endl;
            return tensors;
        }
        size_t elementCount = *reinterpret_cast<const size_t*>(dataPtr);
        dataPtr += sizeof(size_t);
        if (endPtr - dataPtr < elementCount * sizeof(float)) {
            std::cerr << "Not enough data to deserialize tensor data." << std::endl;
            return tensors;
        }
        const float* floatArr = reinterpret_cast<const float*>(dataPtr);
        dataPtr += elementCount * sizeof(float);
        if (endPtr - dataPtr < sizeof(size_t)) {
            std::cerr << "Not enough data to deserialize tensor shape." << std::endl;
            return tensors;
        }
        size_t numDimensions = *reinterpret_cast<const size_t*>(dataPtr);
        dataPtr += sizeof(size_t);
        if (endPtr - dataPtr < numDimensions * sizeof(int64_t)) {
            std::cerr << "Not enough data to deserialize tensor shape." << std::endl;
            return tensors;
        }
        std::vector<int64_t> shape(numDimensions);
        for (size_t j = 0; j < numDimensions; ++j) {
            shape[j] = *reinterpret_cast<const int64_t*>(dataPtr);
            dataPtr += sizeof(int64_t);
        }
        Ort::AllocatorWithDefaultOptions allocator;
        // ? marks where I'm currently stuck
        Ort::Value tensor = Ort::Value::CreateTensor< ? >(allocator, shape.data(), shape.size());
        float* tensorData = tensor.GetTensorMutableData< ? >();
        std::copy(floatArr, floatArr + elementCount, tensorData);
        tensors.push_back(std::move(tensor));
    }
    return tensors;
}
Any suggestions or insights on how to efficiently handle the data types during deserialization would be greatly appreciated. Thank you!
I think I have solved the issue:
Here is my updated code:
For serialization:
std::vector<char> SerializeTensorVectorToBytes(const std::vector<Ort::Value>& tensors) {
    std::vector<char> bytes;
    size_t numTensors = tensors.size();
    const char* dataPtr = reinterpret_cast<const char*>(&numTensors);
    bytes.insert(bytes.end(), dataPtr, dataPtr + sizeof(size_t));
    for (const auto& tensor : tensors) {
        if (!tensor.IsTensor()) {
            std::cerr << "Skipping non-tensor Ort::Value." << std::endl;
            continue;
        }
        Ort::TensorTypeAndShapeInfo info = tensor.GetTensorTypeAndShapeInfo();
        size_t elementCount = info.GetElementCount();
        // Record the current size of bytes to calculate the size of the added data
        size_t initialSize = bytes.size();
        // Write the tensor type to the bytes
        ONNXTensorElementDataType tensorType = info.GetElementType();
        const char* tensorTypePtr = reinterpret_cast<const char*>(&tensorType);
        bytes.insert(bytes.end(), tensorTypePtr, tensorTypePtr + sizeof(ONNXTensorElementDataType));
        // Get the shape of the tensor
        std::vector<int64_t> shape = info.GetShape();
        size_t numDimensions = shape.size();
        // Write the number of dimensions to the bytes
        const char* numDimensionsPtr = reinterpret_cast<const char*>(&numDimensions);
        bytes.insert(bytes.end(), numDimensionsPtr, numDimensionsPtr + sizeof(size_t));
        // Write each dimension to the bytes
        for (int64_t dimension : shape) {
            const char* dimensionPtr = reinterpret_cast<const char*>(&dimension);
            bytes.insert(bytes.end(), dimensionPtr, dimensionPtr + sizeof(int64_t));
        }
        // Initialized to 0 so the size check below stays well-defined for unsupported types
        size_t elementSize = 0;
        // Write the tensor data to the bytes
        switch (tensorType) {
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT: {
                const float* tensorData = tensor.GetTensorData<float>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(float));
                elementSize = sizeof(float);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8: {
                const int8_t* tensorData = tensor.GetTensorData<int8_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int8_t));
                elementSize = sizeof(int8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8: {
                const uint8_t* tensorData = tensor.GetTensorData<uint8_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint8_t));
                elementSize = sizeof(uint8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16: {
                const uint16_t* tensorData = tensor.GetTensorData<uint16_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint16_t));
                elementSize = sizeof(uint16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16: {
                const int16_t* tensorData = tensor.GetTensorData<int16_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int16_t));
                elementSize = sizeof(int16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32: {
                const int32_t* tensorData = tensor.GetTensorData<int32_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int32_t));
                elementSize = sizeof(int32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64: {
                const int64_t* tensorData = tensor.GetTensorData<int64_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int64_t));
                elementSize = sizeof(int64_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL: {
                const bool* tensorData = tensor.GetTensorData<bool>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(bool));
                elementSize = sizeof(bool);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE: {
                const double* tensorData = tensor.GetTensorData<double>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(double));
                elementSize = sizeof(double);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32: {
                const uint32_t* tensorData = tensor.GetTensorData<uint32_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint32_t));
                elementSize = sizeof(uint32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64: {
                const uint64_t* tensorData = tensor.GetTensorData<uint64_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint64_t));
                elementSize = sizeof(uint64_t);
                break;
            }
            default:
                std::cerr << "Unsupported tensor type for serialization: " << tensorType << std::endl;
                break;
        }
        // Calculate the expected size
        size_t expectedSize = sizeof(ONNXTensorElementDataType) // size of the tensor type
                            + sizeof(size_t)                    // size of the number of dimensions
                            + (sizeof(int64_t) * numDimensions) // size of the tensor shape
                            + (elementSize * elementCount);     // size of the tensor data
        // Verify the total size of the serialized data for each tensor
        size_t actualSize = bytes.size() - initialSize; // size of the added data for the current tensor
        if (actualSize != expectedSize) {
            std::cerr << "Error: Serialized tensor size (" << actualSize
                      << ") does not match expected size (" << expectedSize << ")." << std::endl;
        }
    }
    return bytes;
}
For deserialization:
std::vector<Ort::Value> DeserializeTensorVectorFromBytes(const std::vector<char>& bytes) {
    std::vector<Ort::Value> tensors;
    const char* dataPtr = bytes.data();
    size_t numTensors = *reinterpret_cast<const size_t*>(dataPtr);
    dataPtr += sizeof(size_t);
    for (size_t i = 0; i < numTensors; ++i) {
        ONNXTensorElementDataType tensorType = *reinterpret_cast<const ONNXTensorElementDataType*>(dataPtr);
        dataPtr += sizeof(ONNXTensorElementDataType);
        size_t numDimensions = *reinterpret_cast<const size_t*>(dataPtr);
        dataPtr += sizeof(size_t);
        std::vector<int64_t> shape(numDimensions);
        size_t elementCount = 1;
        for (size_t j = 0; j < numDimensions; ++j) {
            shape[j] = *reinterpret_cast<const int64_t*>(dataPtr);
            dataPtr += sizeof(int64_t);
            elementCount *= shape[j];
        }
        Ort::AllocatorWithDefaultOptions allocator;
        // Placeholder value; replaced by the correctly typed tensor in the switch below.
        Ort::Value tensor = Ort::Value::CreateTensor<float>(allocator, shape.data(), numDimensions);
        switch (tensorType) {
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT: {
                tensor = Ort::Value::CreateTensor<float>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<float>(), dataPtr, elementCount * sizeof(float));
                dataPtr += elementCount * sizeof(float);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8: {
                tensor = Ort::Value::CreateTensor<int8_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int8_t>(), dataPtr, elementCount * sizeof(int8_t));
                dataPtr += elementCount * sizeof(int8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8: {
                tensor = Ort::Value::CreateTensor<uint8_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint8_t>(), dataPtr, elementCount * sizeof(uint8_t));
                dataPtr += elementCount * sizeof(uint8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16: {
                tensor = Ort::Value::CreateTensor<uint16_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint16_t>(), dataPtr, elementCount * sizeof(uint16_t));
                dataPtr += elementCount * sizeof(uint16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16: {
                tensor = Ort::Value::CreateTensor<int16_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int16_t>(), dataPtr, elementCount * sizeof(int16_t));
                dataPtr += elementCount * sizeof(int16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32: {
                tensor = Ort::Value::CreateTensor<int32_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int32_t>(), dataPtr, elementCount * sizeof(int32_t));
                dataPtr += elementCount * sizeof(int32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64: {
                tensor = Ort::Value::CreateTensor<int64_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int64_t>(), dataPtr, elementCount * sizeof(int64_t));
                dataPtr += elementCount * sizeof(int64_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL: {
                tensor = Ort::Value::CreateTensor<bool>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<bool>(), dataPtr, elementCount * sizeof(bool));
                dataPtr += elementCount * sizeof(bool);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE: {
                tensor = Ort::Value::CreateTensor<double>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<double>(), dataPtr, elementCount * sizeof(double));
                dataPtr += elementCount * sizeof(double);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32: {
                tensor = Ort::Value::CreateTensor<uint32_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint32_t>(), dataPtr, elementCount * sizeof(uint32_t));
                dataPtr += elementCount * sizeof(uint32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64: {
                tensor = Ort::Value::CreateTensor<uint64_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint64_t>(), dataPtr, elementCount * sizeof(uint64_t));
                dataPtr += elementCount * sizeof(uint64_t);
                break;
            }
            default:
                std::cerr << "Unsupported tensor type for deserialization: " << tensorType << std::endl;
                break;
        }
        tensors.push_back(std::move(tensor));
    }
    return tensors;
}
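For completeness, a small round-trip check along these lines can sanity-test the pair (a sketch only; the function name is hypothetical):

#include <cassert>

// Hypothetical smoke test: serialize one float tensor and read it back,
// relying on the two functions defined above.
void RoundTripSmokeTest() {
    Ort::AllocatorWithDefaultOptions allocator;
    std::vector<int64_t> shape{2, 3};
    Ort::Value t = Ort::Value::CreateTensor<float>(allocator, shape.data(), shape.size());
    float* p = t.GetTensorMutableData<float>();
    for (int i = 0; i < 6; ++i) p[i] = static_cast<float>(i);

    std::vector<Ort::Value> in;
    in.push_back(std::move(t));

    std::vector<char> bytes = SerializeTensorVectorToBytes(in);
    std::vector<Ort::Value> out = DeserializeTensorVectorFromBytes(bytes);

    // Expect one float tensor of shape {2, 3} with the same contents.
    assert(out.size() == 1);
    assert(out[0].GetTensorTypeAndShapeInfo().GetElementCount() == 6);
    assert(out[0].GetTensorData<float>()[5] == 5.0f);
}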
May I send a PR to the ONNX Runtime codebase to add this functionality, since I think some other people may need it in the future?
Also, my current approach does not yet support `bfloat16`, `std::string`, `complex64`, `complex128`, or `float16`. I may add them later on.
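For instance, float16 could probably be handled by copying the raw 16-bit payloads; here is a sketch (untested) of a case that could be added to the deserialization switch, assuming the `Ort::Float16_t` storage type from the C++ API:

// Sketch only: candidate float16 case for the deserialization switch above.
// Ort::Float16_t is a 16-bit storage type, so a raw byte copy of the payload suffices.
case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16: {
    tensor = Ort::Value::CreateTensor<Ort::Float16_t>(allocator, shape.data(), numDimensions);
    std::memcpy(tensor.GetTensorMutableData<Ort::Float16_t>(), dataPtr,
                elementCount * sizeof(Ort::Float16_t));
    dataPtr += elementCount * sizeof(Ort::Float16_t);
    break;
}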
This issue has had no updates for 6+ months, so I will close it. Feel free to reopen.