microsoft / onnxruntime


[Web] Memory access out of bounds / alignment fault #21355

Open cmario opened 2 months ago

cmario commented 2 months ago

Describe the issue

Hello,

I am exploring the use of ONNX, with a particular focus on the ORT model format for web applications. I developed a basic WASM module to perform inference using a UNET-like semantic segmentation model. However, the inference process throws an exception, which I have detailed below. Please note that the same code runs without issues outside of the WASM module.

I built the ONNX runtime for web with the following command:

./build.sh --config Release --build_wasm_static_lib --minimal_build --skip_tests --disable_wasm_exception_catching --disable_rtti

I built the WASM module with the following command:

emcc -g myModule.cpp -o myModule.js -I<opencv headers> -I<onnxruntime headers> -L<opencv lib> -L<onnxruntime lib> -lopencv_core -lopencv_imgproc -lonnxruntime_webassembly -s INITIAL_MEMORY=256MB -s EXPORTED_FUNCTIONS="['_processImage', '_malloc', '_free']" -s EXPORTED_RUNTIME_METHODS="['ccall', 'cwrap']" -s SAFE_HEAP=0 --bind

When running the inference I get the following error:

2024-07-15 18:48:24.453600 [I:onnxruntime:, inference_session.cc:514 TraceSessionOptions] Session Options {  execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath: enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:2 intra_op_param:OrtThreadPoolParams { thread_pool_size: 1 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: {  } }
2024-07-15 18:48:24.454500 [I:onnxruntime:, inference_session.cc:414 operator()] Flush-to-zero and denormal-as-zero are off
2024-07-15 18:48:24.454600 [I:onnxruntime:, inference_session.cc:422 ConstructorCommon] Creating and using per session threadpools since use_per_session_threads_ is true
2024-07-15 18:48:24.454800 [I:onnxruntime:, inference_session.cc:440 ConstructorCommon] Dynamic block base set to 0
2024-07-15 18:48:24.462000 [I:onnxruntime:, inference_session.cc:1583 Initialize] Initializing session.
2024-07-15 18:48:24.462100 [I:onnxruntime:, inference_session.cc:1620 Initialize] Adding default CPU execution provider.
2024-07-15 18:48:24.485000 [V:onnxruntime:, session_state.cc:126 CreateGraphInfo] SaveMLValueNameIndexMapping
2024-07-15 18:48:24.485500 [V:onnxruntime:, session_state.cc:172 CreateGraphInfo] Done saving OrtValue mappings.
2024-07-15 18:48:24.488600 [I:onnxruntime:, session_state_utils.cc:201 SaveInitializedTensors] Saving initialized tensors.
2024-07-15 18:48:24.489500 [I:onnxruntime:, session_state_utils.cc:345 SaveInitializedTensors] Done saving initialized tensors
2024-07-15 18:48:24.491300 [I:onnxruntime:, inference_session.cc:1969 Initialize] Session successfully initialized.

With SAFE_HEAP=0:

RuntimeError: memory access out of bounds
    at myModule.wasm.MlasSgemmOperation(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, unsigned long, unsigned long, unsigned long, float, float const*, unsigned long, float const*, unsigned long, float, float*, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[9520]:0x24bf4a)
    at myModule.wasm.MlasConvOperation(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, unsigned long, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[10013]:0x2921cc)
    at myModule.wasm.MlasConv(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, onnxruntime::concurrency::ThreadPool*) (http://localhost:8000/myModule.wasm:wasm-function[9978]:0x28c58a)
    at myModule.wasm.onnxruntime::Conv<float>::Compute(onnxruntime::OpKernelContext*) const (http://localhost:8000/myModule.wasm:wasm-function[9965]:0x288f4c)
    at myModule.wasm.onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (http://localhost:8000/myModule.wasm:wasm-function[7213]:0x1a3e82)
    at myModule.wasm.onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[7221]:0x1a6931)
    at myModule.wasm.onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<int const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, onnxruntime::logging::Logger const&, bool const&, bool, bool) (http://localhost:8000/myModule.wasm:wasm-function[6719]:0x152035)
    at myModule.wasm.onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool, onnxruntime::Stream*) (http://localhost:8000/myModule.wasm:wasm-function[6718]:0x14f436)
    at myModule.wasm.onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>*, std::__2::vector<OrtDevice, std::__2::allocator<OrtDevice>> const*) (http://localhost:8000/myModule.wasm:wasm-function[17805]:0x6f648b)
    at myModule.wasm.onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 4294967295ul>, gsl::span<OrtValue const* const, 4294967295ul>, gsl::span<char const* const, 4294967295ul>, gsl::span<OrtValue*, 4294967295ul>) (http://localhost:8000/myModule.wasm:wasm-function[5392]:0xf1758)

With SAFE_HEAP=1:

RuntimeError: Aborted(alignment fault)
    at abort (http://localhost:8000/myModule.js:625:41)
    at alignfault (http://localhost:8000/myModule.js:354:3)
    at myModule.wasm (http://localhost:8000/myModule.wasm:wasm-function[17477]:0x862651)
    at myModule.wasm.MlasConvIm2Col(MLAS_CONV_PARAMETERS const*, float const*, float*, unsigned long, unsigned long, unsigned long, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[9072]:0x2e0492)
    at myModule.wasm.MlasConvOperation(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, unsigned long, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[9074]:0x2e119c)
    at myModule.wasm.MlasConv(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, onnxruntime::concurrency::ThreadPool*) (http://localhost:8000/myModule.wasm:wasm-function[9039]:0x2da858)
    at myModule.wasm.onnxruntime::Conv<float>::Compute(onnxruntime::OpKernelContext*) const (http://localhost:8000/myModule.wasm:wasm-function[9026]:0x2d67b0)
    at myModule.wasm.onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (http://localhost:8000/myModule.wasm:wasm-function[6274]:0x1c14f5)
    at myModule.wasm.onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[6282]:0x1c49bf)
    at myModule.wasm.onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<int const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, onnxruntime::logging::Logger const&, bool const&, bool, bool) (http://localhost:8000/myModule.wasm:wasm-function[5780]:0x15fa3a)

Best regards, Mario

To reproduce

Here is the code I used to test the ORT model:

extern "C" {
EMSCRIPTEN_KEEPALIVE
void processImage(const uint8_t* inputImageData, size_t inputImageDataSize, uint8_t* outputImageData, int width, int height) {
    cv::Mat image(height, width, CV_8UC4, const_cast<uint8_t*>(inputImageData));
    cv::Mat rgbImage;
    cv::cvtColor(image, rgbImage, cv::COLOR_BGRA2RGB);
    cv::Mat resizedImage;
    cv::resize(rgbImage, resizedImage, cv::Size(256, 256), 0, 0, cv::INTER_AREA);
    cv::Mat f32Image;
    resizedImage.convertTo(f32Image, CV_32F, 1.0 / 255);
    //
    std::vector<float> inputData;
    inputData.assign((float *) f32Image.datastart, (float *) f32Image.dataend);
    //
    Ort::SessionOptions session_options;
    session_options.SetIntraOpNumThreads(1);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
    // Decode the base64 model
    std::vector<uint8_t> model_data = base64_decode(base64_model);
    // Load the model from memory
    Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "test");
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    Ort::AllocatorWithDefaultOptions allocator;
    Ort::Session session(env, model_data.data(), model_data.size(), session_options);
    // input tensor
    std::vector<int64_t> inputShape = {1, 256, 256, 3};
    Ort::Value inputTensor = Ort::Value::CreateTensor<float>(memory_info, inputData.data(), inputData.size(),
                                                             inputShape.data(), inputShape.size());
    // output tensor
    std::vector<float> outputData(256 * 256 * 4);
    std::vector<int64_t> outputShape = {1, 256, 256, 1};
    Ort::Value outputTensor = Ort::Value::CreateTensor<float>(memory_info,
                                                              outputData.data(), outputData.size(),
                                                              outputShape.data(), outputShape.size());

    auto input_name_alloc = session.GetInputNameAllocated(0, allocator);
    const char *input_name = input_name_alloc.get();
    auto output_name_alloc = session.GetOutputNameAllocated(0, allocator);
    const char *output_name = output_name_alloc.get();

    // Run inference
    session.Run(Ort::RunOptions{nullptr}, &input_name, &inputTensor, 1, &output_name, &outputTensor, 1);

    // Process output tensor
    auto *float_array = outputTensor.GetTensorMutableData<float>();

    // Convert the output tensor to cv::Mat
    cv::Mat outputImg(256, 256, CV_32FC1, float_array);
    outputImg.convertTo(outputImg, CV_8UC1, 255.0);
    cv::resize(outputImg, outputImg, cv::Size(width, height));
    // Copy the output image to the outputImageData buffer
    std::memcpy(outputImageData, outputImg.data, width * height);
}
}
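As a side note, the output buffer size above is hard-coded; a small sketch of deriving it from the model's declared output shape instead, via the C++ API (this assumes output 0 has fully static dimensions):

// Sketch: size the output buffer from the model metadata rather than by hand.
Ort::TypeInfo output_info = session.GetOutputTypeInfo(0);
auto tensor_info = output_info.GetTensorTypeAndShapeInfo();
std::vector<int64_t> outputShape = tensor_info.GetShape();     // e.g. {1, 256, 256, 1}
std::vector<float> outputData(tensor_info.GetElementCount());  // matches the shape exactly
Ort::Value outputTensor = Ort::Value::CreateTensor<float>(memory_info,
                                                          outputData.data(), outputData.size(),
                                                          outputShape.data(), outputShape.size());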

Urgency

No response

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

v1.17.1

Execution Provider

'wasm'/'cpu' (WebAssembly CPU)

YoniGBinahAi commented 1 month ago

I have the same issue with version 1.16.3 and emsdk 3.1.44. My build command is:

./build.sh --config Debug --enable_wasm_simd --emsdk_version=3.1.44 --build_wasm_static_lib --enable_wasm_exception_throwing_override --enable_wasm_threads --enable_wasm_api_exception_catching --skip_tests

Error: [screenshot attached]

YoniGBinahAi commented 1 month ago

@cmario - we have noticed that the function MlasSgemmOperation consumes a lot of stack memory. Since WASM by default allocates only 5MB for the stack, it fails there. You can try adding the following flag and see if it solves your issue; it helped us: -s TOTAL_STACK=10MB
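For reference, the emcc command from the original post with the stack flag appended would look like this (everything else unchanged; newer Emscripten versions also accept -s STACK_SIZE for the same setting):

emcc -g myModule.cpp -o myModule.js -I<opencv headers> -I<onnxruntime headers> -L<opencv lib> -L<onnxruntime lib> -lopencv_core -lopencv_imgproc -lonnxruntime_webassembly -s INITIAL_MEMORY=256MB -s TOTAL_STACK=10MB -s EXPORTED_FUNCTIONS="['_processImage', '_malloc', '_free']" -s EXPORTED_RUNTIME_METHODS="['ccall', 'cwrap']" -s SAFE_HEAP=0 --bind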

cmario commented 1 month ago

@YoniGBinahAi Thank you very much for your feedback; increasing the total stack to 10MB resolves the issue.

cmario commented 1 week ago

Hi @YoniGBinahAi, I noticed that your build command includes thread support. Are you actually enabling ORT's multi-threading mode? In my case, enabling multi-threading always results in the following error:

RuntimeError: table index is out of bounds
    at myModule.wasm.onnxruntime::concurrency::ThreadPool::DegreeOfParallelism(onnxruntime::concurrency::ThreadPool const*) (https://localhost:8000/myModule.wasm:wasm-function[7272]:0x2a4f94)
    at myModule.wasm.MlasConvPrepare(MLAS_CONV_PARAMETERS*, unsigned long, unsigned long, unsigned long, unsigned long, long long const*, long long const*, long long const*, long long const*, long long const*, long long const*, unsigned long, MLAS_ACTIVATION const*, unsigned long*, float, onnxruntime::concurrency::ThreadPool*) (https://localhost:8000/myModule.wasm:wasm-function[11360]:0x46ba65)
    at myModule.wasm.onnxruntime::Conv<float>::Compute(onnxruntime::OpKernelContext*) const (https://localhost:8000/myModule.wasm:wasm-function[11350]:0x4684de)
    at myModule.wasm.onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (https://localhost:8000/myModule.wasm:wasm-function[8572]:0x33e190)
    at myModule.wasm.onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (https://localhost:8000/myModule.wasm:wasm-function[8581]:0x3418af)
    at myModule.wasm.onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<int const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, onnxruntime::logging::Logger const&, bool const&, bool, bool) (https://localhost:8000/myModule.wasm:wasm-function[8141]:0x2d9b2a)
    at myModule.wasm.onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool, onnxruntime::Stream*) (https://localhost:8000/myModule.wasm:wasm-function[8140]:0x2d704f)
    at myModule.wasm.onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, ExecutionMode, OrtRunOptions const&, onnxruntime::logging::Logger const&) (https://localhost:8000/myModule.wasm:wasm-function[8145]:0x2db33c)
    at myModule.wasm.onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>*, std::__2::vector<OrtDevice, std::__2::allocator<OrtDevice>> const*) (https://localhost:8000/myModule.wasm:wasm-function[19658]:0xa5a58a)
    at myModule.wasm.onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 4294967295ul>, gsl::span<OrtValue const* const, 4294967295ul>, gsl::span<char const* const, 4294967295ul>, gsl::span<OrtValue*, 4294967295ul>) (https://localhost:8000/myModule.wasm:wasm-function[7098]:0x288eaa)

Here is how I create the session:

int num_threads = 2;
OrtThreadingOptions *tp_options;
Ort::GetApi().CreateThreadingOptions(&tp_options);
Ort::GetApi().SetGlobalIntraOpNumThreads(tp_options, num_threads);
Ort::GetApi().SetGlobalInterOpNumThreads(tp_options, 1);
Ort::GetApi().SetGlobalSpinControl(tp_options, 0);
OrtEnv *g_env;
Ort::GetApi().CreateEnvWithGlobalThreadPools(ORT_LOGGING_LEVEL_WARNING, "Default", tp_options, &g_env);

Ort::SessionOptions sessionOptions;
sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
sessionOptions.DisableCpuMemArena();
sessionOptions.DisableMemPattern();
sessionOptions.DisableProfiling();
sessionOptions.DisablePerSessionThreads();
sessionOptions.SetExecutionMode(ORT_SEQUENTIAL);
sessionOptions.SetIntraOpNumThreads(num_threads);
sessionOptions.SetInterOpNumThreads(1);
sessionOptions.AddConfigEntry("session.load_model_format", "ORT");
sessionOptions.AddConfigEntry("session.use_ort_model_bytes_directly", "1");
sessionOptions.AddConfigEntry("session.intra_op.allow_spinning", "0");
sessionOptions.AddConfigEntry("session.inter_op.allow_spinning", "0");

// Note: Ort::Env(g_env) constructs a temporary wrapper that takes ownership of
// g_env and releases the underlying OrtEnv (together with its global thread
// pools) as soon as this statement completes.
session_ = Ort::Session(Ort::Env(g_env), modelData.data(), modelData.size(), sessionOptions);

It works fine when I set num_threads to 1.
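For reference, a minimal sketch of the same global-thread-pool setup written through the C++ wrapper, so that the env (and the thread pools it owns) explicitly outlives the session; this assumes the standard onnxruntime_cxx_api.h header and is not verified against the WASM build:

// Sketch: keep the env in a named object; the wrapper releases the OrtEnv (and
// the global thread pools) only when 'env' itself is destroyed.
OrtThreadingOptions *tp_options = nullptr;
Ort::ThrowOnError(Ort::GetApi().CreateThreadingOptions(&tp_options));
Ort::ThrowOnError(Ort::GetApi().SetGlobalIntraOpNumThreads(tp_options, 2));
Ort::ThrowOnError(Ort::GetApi().SetGlobalInterOpNumThreads(tp_options, 1));
Ort::ThrowOnError(Ort::GetApi().SetGlobalSpinControl(tp_options, 0));
static Ort::Env env(tp_options, ORT_LOGGING_LEVEL_WARNING, "Default");
Ort::GetApi().ReleaseThreadingOptions(tp_options);
// ... configure sessionOptions as above, then:
session_ = Ort::Session(env, modelData.data(), modelData.size(), sessionOptions);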

I’d appreciate any feedback.

Thank you, Mario