mlruns opened this issue 1 year ago
Please ensure correct usage of the ORT API: make sure that when you create tensors, you pass buffer lengths either in number of elements or in bytes, as documented (a common mistake).
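For reference, a minimal sketch of that distinction (the buffer and shape are illustrative, not the poster's): the typed C++ CreateTensor overload takes the length in elements, while the untyped overload takes it in bytes plus an explicit element type.

#include <onnxruntime_cxx_api.h>
#include <vector>

void CreateTensorLengthExamples() {
    std::vector<float> values(1 * 3 * 1024 * 1024);   // illustrative buffer
    std::vector<int64_t> shape{ 1, 3, 1024, 1024 };
    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    // Typed overload: length in ELEMENTS.
    auto byElements = Ort::Value::CreateTensor<float>(
        mem, values.data(), values.size(), shape.data(), shape.size());
    // Untyped overload: length in BYTES, with an explicit element type.
    auto byBytes = Ort::Value::CreateTensor(
        mem, values.data(), values.size() * sizeof(float),
        shape.data(), shape.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);
}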
It does not look like it is crashing; my understanding is that Run() returns an error. You are running out of memory. If you are running on CPU, you may want to disable the memory arena; you only need it for GPU.
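Assuming the C++ API, disabling the arena is a one-liner on the session options (a sketch, not the poster's exact code):

Ort::SessionOptions session_options;
session_options.DisableCpuMemArena();  // skip the arena; allocations go straight to the default CPU allocator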
Thanks for your response. As I am running on CPU, I tried disabling the memory arena in the session options, and got this error:
Message:
2023-06-11 23:53:14.1898754 [ onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running DynamicQuantizeMatMul node. Name:'/image_encoder/2/attn/MatMul_quant' Status Message: bad allocation unknown file: error: C++ exception with description "Non-zero status code returned while running DynamicQuantizeMatMul node. Name:'/image_encoder/2/attn/MatMul_quant' Status Message: bad allocation" thrown in the test body.
Stack Trace: sequential_executor.cc:368
std::vector<SEGMENT_RESULT> run_SAM_ONNX_model_on_image
(const SharedClasses::CLynxImage& image, const std::string& encoder_ONNX_filename, const std::string& decoder_ONNX_filename, int model_in_x, int model_in_y, RECT bbox, int cls)
{
//std::vector<SEGMENT_RESULT> output;
//cv::setNumThreads(0);
// We use ORT_API_MANUAL_INIT to allow for delay-loading the OnnxRuntime dll.
// It's unclear whether its safe to just blindly call InitApi() every time it might be required;
// for now, test the (private) global api_ pointer to make sure.
if (!Ort::Global<void>::api_)
{
Ort::InitApi();
}
SharedClasses::CLynxImage working_copy;
image.copyTo(working_copy);
cv::Mat cv_image;
link_lynx_to_CV_mat(working_copy, cv_image);
cv::cvtColor(cv_image, cv_image, cv::COLOR_GRAY2RGB);
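// SAM-style preprocessing: resize so the longest side equals EncoderInputSize
// (1024), then zero-pad the bottom/right edges up to a 1024x1024 square.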
int EncoderInputSize = 1024;
cv::Mat resized_image = ResizeLongestSide_apply_image(cv_image, EncoderInputSize);
int pad_h = EncoderInputSize - resized_image.rows;
int pad_w = EncoderInputSize - resized_image.cols;
cv::Mat padded_image;
cv::copyMakeBorder(resized_image, padded_image, 0, pad_h, 0, pad_w, cv::BorderTypes::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
std::vector<SEGMENT_RESULT> output;
// setting up onnxruntime env
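// Note: the Ort::Env must outlive every session created from it.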
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example-model-explorer");
//std::vector<int64_t> EncoderOutputShape, EncoderInputShape;
Ort::AllocatorWithDefaultOptions allocator;
Ort::MemoryInfo memory_info_handler = Ort::MemoryInfo::CreateCpu(
OrtArenaAllocator, OrtMemTypeDefault
);
#ifdef ORTCHAR_T
std::basic_string<ORTCHAR_T> encoder_model_file = std::basic_string<ORTCHAR_T>(encoder_ONNX_filename.begin(), encoder_ONNX_filename.end());
std::basic_string<ORTCHAR_T> decoder_model_file = std::basic_string<ORTCHAR_T>(decoder_ONNX_filename.begin(), decoder_ONNX_filename.end());
#else
const std::string& encoder_model_file = encoder_ONNX_filename;
const std::string& decoder_model_file = decoder_ONNX_filename;
#endif
Ort::SessionOptions session_options;
session_options.SetInterOpNumThreads(1).SetIntraOpNumThreads(1);
session_options.DisableCpuMemArena();
//session_options.SetInterOpNumThreads(std::thread::hardware_concurrency());
//session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
auto encoder_session = std::make_unique<Ort::Session>(env, encoder_model_file.c_str(), session_options);
if (encoder_session->GetInputCount() != 1 || encoder_session->GetOutputCount() != 1) {
    auto a = 1;  // placeholder: the SAM encoder is expected to have exactly one input and one output
}
auto EncoderOutputShape = encoder_session->GetOutputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
auto EncoderInputShape = encoder_session->GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
//EncoderInputShape = std::vector<int64_t>{1,3, 1024, 1024 };
//auto EncoderInputShape = encoder_session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
//resize before blob for python/c++ reproducibility
Ort::Session decoder_session = Ort::Session(env, decoder_model_file.c_str(), session_options);
//std::vector<uint8_t> inputTensorValues(EncoderInputShape[0] * EncoderInputShape[1] * EncoderInputShape[2] *
// EncoderInputShape[3]);
if (padded_image.size() != cv::Size(EncoderInputShape[3], EncoderInputShape[2])) {
//std::cerr << "Image size not match" << std::endl;
//std::cout << "Image width : " << Image.cols << " Image height : " << Image.rows << std::endl;
//return output;
}
if (padded_image.channels() != 3) {
//std::cerr << "Input image is not a 3-channel image" << std::endl;
//return output;
}
// Request CV_8U output so the blob's raw bytes match the uint8 input tensor
// created below (blobFromImage defaults to CV_32F, whose bytes would be
// misread when copied as uint8).
auto blob = cv::dnn::blobFromImage(padded_image, 1.0, cv::Size(EncoderInputShape[3], EncoderInputShape[2]), cv::Scalar(), false, false, CV_8U);
std::vector<uint8_t> inputTensorValues(blob.total());
inputTensorValues.assign((uint8_t*)blob.data, (uint8_t*)blob.data + blob.total());
std::vector<Ort::Value> inputTensor;
Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(
OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
inputTensor.push_back(Ort::Value::CreateTensor<uint8_t>(memoryInfo, inputTensorValues.data(), inputTensorValues.size(), EncoderInputShape.data(), EncoderInputShape.size()));
std::vector<float> image_embedding(EncoderOutputShape[0] * EncoderOutputShape[1] * EncoderOutputShape[2] * EncoderOutputShape[3]);
auto outputTensorPre = Ort::Value::CreateTensor<float>(
memory_info_handler, image_embedding.data(), image_embedding.size(),
EncoderOutputShape.data(), EncoderOutputShape.size());
assert(outputTensorPre.IsTensor() && outputTensorPre.HasValue());
//const char* inputNamesPre[] = { ""}, * outputNamesPre[] = {"output"};
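// Note: GetInputName/GetOutputName were deprecated in later ORT releases in
// favor of GetInputNameAllocated/GetOutputNameAllocated; the raw names
// returned here should eventually be freed through the allocator.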
auto* inputName = encoder_session->GetInputName(0, allocator);
auto* outputName = encoder_session->GetOutputName(0, allocator);
const char* inputNames[] = { inputName };
const char* outputNames[] = { outputName };
Ort::RunOptions run_options;
run_options.SetRunLogVerbosityLevel(1);
// Run into the preallocated output tensor so image_embedding is actually
// filled; the overload that returns fresh tensors would leave it zeroed
// before it is handed to the decoder below.
encoder_session->Run(run_options, inputNames, inputTensor.data(), inputTensor.size(), outputNames, &outputTensorPre, 1);
//running decoder session
const char* DecoderInputNames[6]{ "image_embeddings", "point_coords", "point_labels",
"mask_input", "has_mask_input", "orig_im_size" },
* DecoderOutputNames[3]{ "masks", "iou_predictions", "low_res_masks" };
// The point tensor has shape {1, 3, 2}, i.e. three (x, y) points, so the
// buffer must hold 6 floats; the original 4-element array overran the length
// passed to CreateTensor below. A box prompt is encoded as its top-left and
// bottom-right corners plus one padding point; SAM's exported decoder labels
// box corners 2 and 3 and padding points -1 (the original passed a single
// 'cls' value into a 3-element label tensor). The corner coordinates may also
// need the same longest-side rescaling that was applied to the image.
float inputPointsValues[] = { (float)bbox.left, (float)bbox.top,
    (float)bbox.right, (float)bbox.bottom,
    0.0f, 0.0f };
float inputLabelsValues[] = { 2.0f, 3.0f, -1.0f };
const size_t maskInputSize = 256 * 256;
float maskInputValues[maskInputSize], hasMaskValues[] = { 0 },
orig_im_size_values[] = { (float)cv_image.rows, (float)cv_image.cols };
memset(maskInputValues, 0, sizeof(maskInputValues));  // no prior mask: all zeros, has_mask_input = 0
std::vector<int64_t> inputPointShape = { 1, 3, 2 }, pointLabelsShape = { 1, 3 },
maskInputShape = { 1, 1, 256, 256 }, hasMaskInputShape = { 1 },
origImSizeShape = { 2 };
std::vector<Ort::Value> inputTensorsSam;
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
memory_info_handler, (float*)image_embedding.data(), image_embedding.size(),
EncoderOutputShape.data(), EncoderOutputShape.size()));
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
memory_info_handler, inputPointsValues, 2 * 3, inputPointShape.data(), inputPointShape.size()));
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
memory_info_handler, inputLabelsValues, 1 * 3, pointLabelsShape.data(), pointLabelsShape.size()));
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
memory_info_handler, maskInputValues, maskInputSize, maskInputShape.data(), maskInputShape.size()));
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
memory_info_handler, hasMaskValues, 1, hasMaskInputShape.data(), hasMaskInputShape.size()));
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
memory_info_handler, orig_im_size_values, 2, origImSizeShape.data(), origImSizeShape.size()));
Ort::RunOptions runOptionsSam;
auto DecoderOutputTensors = decoder_session.Run(runOptionsSam, DecoderInputNames, inputTensorsSam.data(),
inputTensorsSam.size(), DecoderOutputNames, 3);
auto masks = DecoderOutputTensors[0].GetTensorMutableData<float>();
auto iou_predictions = DecoderOutputTensors[1].GetTensorMutableData<float>();
auto low_res_masks = DecoderOutputTensors[2].GetTensorMutableData<float>();
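// These raw pointers remain valid only while DecoderOutputTensors is alive.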
Ort::Value& masks_ = DecoderOutputTensors[0];
Ort::Value& iou_predictions_ = DecoderOutputTensors[1];
Ort::Value& low_res_masks_ = DecoderOutputTensors[2];
auto mask_dims = masks_.GetTypeInfo().GetTensorTypeAndShapeInfo().GetShape();
auto iou_pred_dims = iou_predictions_.GetTypeInfo().GetTensorTypeAndShapeInfo().GetShape();
auto low_res_dims = low_res_masks_.GetTypeInfo().GetTensorTypeAndShapeInfo().GetShape();
const unsigned int Resizemasks_batch = mask_dims.at(0);
const unsigned int Resizemasks_nums = mask_dims.at(1);
const unsigned int Resizemasks_height = mask_dims.at(2);  // NCHW: dim 2 is height
const unsigned int Resizemasks_width = mask_dims.at(3);   // NCHW: dim 3 is width
//std::vector<SEGMENT_RESULT> output;
for (unsigned int index = 0; index < Resizemasks_nums; index++)
{
//cv::Mat mask(cv_image.rows, cv_image.cols, CV_8UC1);
// Size the mask up front; the original indexed into an empty vector.
std::vector<std::vector<unsigned char>> mask(
    cv_image.rows, std::vector<unsigned char>(cv_image.cols));
for (int i = 0; i < cv_image.rows; i++)
{
    for (int j = 0; j < cv_image.cols; j++)
    {
        // Threshold the mask logits at 0 to get a binary mask.
        mask[i][j] = masks[i * cv_image.cols + j + index * cv_image.rows * cv_image.cols] > 0 ? 255 : 0;
    }
}
SEGMENT_RESULT mat_info;
mat_info.mask = mask;
mat_info.iou_pred = *(iou_predictions++);
output.emplace_back(mat_info);
}
return output;
}
ORT had a size-calculation overflow bug in one scenario; you may want to try your code with 1.15.1, which was just released.
From the usage perspective, I am curious what makes you allocate the Ort::Session on the heap while everything else is on the stack?
May I ask which commit (with the fix for the bug you describe) you are referring to here, or at least which ONNX Runtime version the fix was in? I also have an issue that seems related (I have not submitted it just yet; the gist is that SafeInt overflows when computing the shape of a tensor, very rarely; I am on ONNX Runtime 1.13.1). Knowing the commit with the fix would let me know whether the issue I am looking at has already been resolved. Thanks a lot in advance.
@yuslepukhin One more bit of data related to the issue I am having, which might be the same issue the original reporter is having: once the SafeInt overflow has happened during one run of the model, it seems to poison subsequent runs. I am guessing this might be related to the enable_mem_reuse flag, which is on by default.
SafeInt throws an exception on overflow. It is possible that some code is not written to provide strong exception-safety guarantees.
Either way, I would need the actual model to look into it, if that is available. Also, generating some random input in the sample to avoid dependencies would be helpful.
Any fixes would be made in the next release, so I strongly recommend trying the most recently released version; 1.16 will be out soon.
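As an illustration of that suggestion, a minimal sketch of a dependency-free input (the helper name and fixed seed are assumptions; the encoder above takes a single uint8 tensor of shape {1, 3, 1024, 1024}):

#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fill a uint8 buffer with random values so the repro
// needs no OpenCV image loading.
std::vector<uint8_t> MakeRandomInput(const std::vector<int64_t>& shape) {
    size_t count = 1;
    for (int64_t d : shape) count *= static_cast<size_t>(d);
    std::vector<uint8_t> data(count);
    std::mt19937 rng(42);  // fixed seed keeps the repro deterministic
    std::uniform_int_distribution<int> dist(0, 255);
    for (auto& v : data) v = static_cast<uint8_t>(dist(rng));
    return data;
}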
I think this might be related: https://github.com/microsoft/onnxruntime/commit/c424e42594d92daba54f264c1c7409e53529d933
Describe the issue
[ onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running FusedMatMul node. Name:'MatMul_With_Transpose_token_14_FusedMatMulAndScale' Status Message: bad allocation unknown file: error: C++ exception with description "Non-zero status code returned while running FusedMatMul node. Name:'MatMul_With_Transpose_token_14_FusedMatMulAndScale' Status Message: bad allocation" thrown in the test body.
To reproduce
encoder_session->Run(run_options, inputNames, inputTensor.data(), 1, outputNames, &outputTensorPre, 1);
encoder_session->Run is crashing and giving that error.
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.10.0
ONNX Runtime API
Python
Architecture
Other / Unknown
Execution Provider
Default CPU
Execution Provider Library Version
CUDA 11.7