microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.24k stars 2.87k forks

valgrind memcpy_chk overlap onnxruntime1.15.1 #17431

Open zhaowujie opened 1 year ago

zhaowujie commented 1 year ago

Describe the issue

After upgrading onnxruntime from 1.13.1 to 1.15.1, I ran valgrind to check for memory-related issues. With onnxruntime 1.15.1, valgrind reports a source/destination overlap in memcpy_chk, as shown below:

==18663== 2 errors in context 2 of 60:
==18663== Source and destination overlap in memcpy_chk(0x1ffeff93e0, 0x1ffeff93e0, 5)
==18663==    at 0x4C39660: __memcpy_chk (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18663==    by 0x784A625: ??? (in /home/xxx/bin/libonnxruntime.so.1.15.1)
==18663==    by 0x1FFEFF93EF: ???
==18663==    by 0x1FFEFF94DF: ???
==18663==    by 0x784668F: ??? (in /home/xxx/bin/libonnxruntime.so.1.15.1)
==18663==    by 0x1FFEFF94EF: ???
==18663==    by 0xA39372D2F: ???
==18663==
==18663==
==18663== 2 errors in context 3 of 60:
==18663== Source and destination overlap in memcpy_chk(0x1ffeff93f0, 0x1ffeff93f0, 5)
==18663==    at 0x4C39660: __memcpy_chk (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18663==    by 0x784A625: ??? (in /home/xxx/bin/libonnxruntime.so.1.15.1)
==18663==    by 0x43F19EE54400C0DC: ???
==18663==    by 0x1FFEFF94EF: ???
==18663==    by 0x784666F: ??? (in /home/xxx/bin/libonnxruntime.so.1.15.1)

To reproduce

valgrind --leak-check=full --show-leak-kinds=all --show-possibly-lost=no --show-reachable=no --track-origins=yes \
    --verbose --log-file=/path/to/log.txt ${EXE_PATH}

${EXE_PATH} is the path to the executable that invokes onnxruntime.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 18.04.4 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

skottmckay commented 1 year ago

We would need the model and exact input used to execute the model to try and reproduce the issue. Can you please provide these?

zhaowujie commented 1 year ago

Hi, skottmckay. It's so weird. I wrote a simple test program that invokes onnxruntime, but the valgrind problem cannot be reproduced with that test program, even though it uses the same ONNX model.

JonathanGirardeau commented 12 months ago

Hello, I have encountered the same issue, but my valgrind report has more details:

==561223== Source and destination overlap in memcpy_chk(0x1ffeffe6a0, 0x1ffeffe6a0, 5)
==561223==    at 0x4843BF0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==561223==    by 0xF5B2DB: cpuinfo_linux_parse_cpulist (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0xF5747A: cpuinfo_linux_get_max_possible_processor (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0xF557A6: cpuinfo_x86_linux_init (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0x48644DE: __pthread_once_slow (pthread_once.c:116)
==561223==    by 0xF54B9A: cpuinfo_initialize (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0xCC444B: onnxruntime::(anonymous namespace)::PosixEnv::PosixEnv() [clone .constprop.0] (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0xCC4B94: onnxruntime::Env::Default() (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0x3D2457: onnxruntime::Environment::Initialize(std::unique_ptr<onnxruntime::logging::LoggingManager, std::default_delete<onnxruntime::logging::LoggingManager> >, OrtThreadingOptions const*, bool) (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0x3D42DD: onnxruntime::Environment::Create(std::unique_ptr<onnxruntime::logging::LoggingManager, std::default_delete<onnxruntime::logging::LoggingManager> >, std::unique_ptr<onnxruntime::Environment, std::default_delete<onnxruntime::Environment> >&, OrtThreadingOptions const*, bool) (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0x38D913: OrtEnv::GetInstance(OrtEnv::LoggingManagerConstructionInfo const&, onnxruntime::common::Status&, OrtThreadingOptions const*) (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)
==561223==    by 0x3747FD: OrtApis::CreateEnvWithCustomLogger(void (*)(void*, OrtLoggingLevel, char const*, char const*, char const*, char const*), void*, OrtLoggingLevel, char const*, OrtEnv**) (in /mnt/development/dialogue-controller-cpp/cmake-build-release-linux-x86_64/silero_wav_vad_test)

I am using the onnxruntime Conan package 1.15.1, and I only get this error in Release builds, not in Debug.
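For readers unfamiliar with this class of report: in the trace above, memcpy_chk is called with the same address for source and destination (0x1ffeffe6a0, length 5). The C standard leaves memcpy undefined when the two ranges overlap, which is why Memcheck flags the call even though copying a buffer onto itself rarely changes anything in practice. The following is a minimal sketch of how an overlap check and an overlap-tolerant copy could look. It is not the code from cpuinfo or onnxruntime; the helper names `ranges_overlap` and `copy_allow_overlap` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of cpuinfo or onnxruntime):
// returns true when [dst, dst + n) and [src, src + n) share at least one byte.
inline bool ranges_overlap(const void* dst, const void* src, std::size_t n) {
    if (n == 0) return false;  // empty ranges never overlap
    const auto d = reinterpret_cast<std::uintptr_t>(dst);
    const auto s = reinterpret_cast<std::uintptr_t>(src);
    // The ranges overlap when the distance between their starts is less than n.
    return d < s ? (s - d) < n : (d - s) < n;
}

// Hypothetical overlap-tolerant copy: does nothing for a self-copy,
// uses memmove when the ranges overlap, and memcpy otherwise.
inline void* copy_allow_overlap(void* dst, const void* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    if (n == 0 || dst == src) return dst;  // self-copy: nothing to do
    return ranges_overlap(dst, src, n) ? std::memmove(dst, src, n)
                                       : std::memcpy(dst, src, n);
}
```

With identical pointers and a length of 5, as in the report, `ranges_overlap` returns true and `copy_allow_overlap` returns without copying. Whether the call flagged inside cpuinfo_linux_parse_cpulist is harmful cannot be determined from this thread alone.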

HumamHelfawi commented 6 months ago

Did you manage to solve this? I have been facing this problem for days, along with random segfaults.

JonathanGirardeau commented 5 months ago

No, but I don't get any segfaults.