halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Possible memory corruption running apps/blur for HVX #6440

Closed: steven-johnson closed this 2 years ago

steven-johnson commented 2 years ago

Running the apps/blur test on my Linux box (x86-64) crashes reliably if I build for HL_TARGET=host-hvx-*san (any of msan, asan, tsan) and run under the simulator. Without sanitizers, the test reports success, but the process then aborts with a glibc heap-corruption error in free().

This is with Halide built against HVX SDK 4.3.0 at Halide commit 57d1e0578018a8da0611b0318a1d2c081af2b756 (today's top-of-tree for Halide) and a fresh build of LLVM 14 (i.e., today's top-of-tree).

The specific command lines and resulting failures I see are listed below. In all cases, assume the following has already been run:

$ export LD_LIBRARY_PATH=$HOME/GitHub/Halide/src/runtime/hexagon_remote/bin/host:$HEXAGON_TOOLS_ROOT/Tools/lib/iss:$LD_LIBRARY_PATH

No sanitizers:

$ make clean && HL_TARGET=host-hvx make -j72 test

times: 0.002853 0.000159 0.110713
Success!
free(): invalid next size (fast)
make: *** [Makefile:33: test] Aborted

ASAN:

$ make clean && HL_TARGET=host-hvx-asan make -j72 test

=================================================================
==3737711==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000048 at pc 0x7f0fcdc0d8a6 bp 0x7fff85f7ed00 sp 0x7fff85f7e4b0
WRITE of size 73 at 0x602000000048 thread T0
    #0 0x7f0fcdc0d8a5 in __interceptor_readlink ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:7200
    #1 0x7f0fa5ef2b79 in HexagonWrapper::Init() (/usr/local/google/home/srj/Qualcomm/Hexagon_SDK/3.5.2/tools/HEXAGON_Tools/8.3.07/Tools/lib/iss/libwrapper.so+0xeb79)
    #2 0x7f0fa6103f34 in init_sim() (/usr/local/google/home/srj/GitHub/Halide/src/runtime/hexagon_remote/bin/host/libhalide_hexagon_host.so+0x9f34)
    #3 0x7f0fa6104b97 in halide_hexagon_remote_load_library (/usr/local/google/home/srj/GitHub/Halide/src/runtime/hexagon_remote/bin/host/libhalide_hexagon_host.so+0xab97)
    #4 0x556b60043399 in halide_hexagon_initialize_kernels (/usr/local/google/home/srj/GitHub/Halide/apps/blur/bin/host-hvx-asan/test+0x30399)

0x602000000048 is located 16 bytes to the right of 8-byte region [0x602000000030,0x602000000038)
allocated by thread T0 here:
    #0 0x7f0fcdc6df37 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7f0fa6103f1a in init_sim() (/usr/local/google/home/srj/GitHub/Halide/src/runtime/hexagon_remote/bin/host/libhalide_hexagon_host.so+0x9f1a)
    #2 0x7f0fa6104b97 in halide_hexagon_remote_load_library (/usr/local/google/home/srj/GitHub/Halide/src/runtime/hexagon_remote/bin/host/libhalide_hexagon_host.so+0xab97)
    #3 0x556b60043399 in halide_hexagon_initialize_kernels (/usr/local/google/home/srj/GitHub/Halide/apps/blur/bin/host-hvx-asan/test+0x30399)
    #4 0x556b60044cc9 in halide_blur (/usr/local/google/home/srj/GitHub/Halide/apps/blur/bin/host-hvx-asan/test+0x31cc9)
    #5 0x556b60020909 in blur_halide(Halide::Runtime::Buffer<unsigned short, 4>) (/usr/local/google/home/srj/GitHub/Halide/apps/blur/bin/host-hvx-asan/test+0xd909)

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:7200 in __interceptor_readlink

TSAN:

$ make clean && HL_TARGET=host-hvx-tsan make -j72 test

(Various irrelevant, benign data races are also reported; I omit them here.)

==================
ThreadSanitizer:DEADLYSIGNAL
==3737938==ERROR: ThreadSanitizer: SEGV on unknown address (pc 0x7f5e6cd60f05 bp 0x7f5e41d9356b sp 0x7ffed983dd40 T3737938)
==3737938==The signal is caused by a READ memory access.
==3737938==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
    #0 _dl_lookup_symbol_x elf/dl-lookup.c:850 (ld-linux-x86-64.so.2+0xaf05)
    #1 do_sym elf/dl-sym.c:153 (libc.so.6+0x137444)
    #2 dlsym_doit dlfcn/dlsym.c:50 (libdl.so.2+0x13b3)
    #3 __GI__dl_catch_exception elf/dl-error-skeleton.c:208 (libc.so.6+0x137b0f)
    #4 __GI__dl_catch_error elf/dl-error-skeleton.c:227 (libc.so.6+0x137bce)
    #5 _dlerror_run dlfcn/dlerror.c:170 (libdl.so.2+0x1a64)
    #6 __dlsym dlfcn/dlsym.c:70 (libdl.so.2+0x141b)
    #7 <null> <null> (libhexagonissv65.so+0x37f48c)
    #8 <null> <null> (libhexagonissv65.so+0x37fa3d)
    #9 <null> <null> (libhalide_hexagon_host.so+0xa3b4)
    #10 <null> <null> (libhalide_hexagon_host.so+0xab97)
    #11 halide_hexagon_initialize_kernels <null> (test+0x1a3e2)

ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV elf/dl-lookup.c:850 in _dl_lookup_symbol_x

MSAN:

# My version of GCC doesn't understand -fsanitize=memory, so I must use Clang for this test
$ make clean && CC=clang CXX=clang++ SANITIZER_FLAGS="-fsanitize=memory" HL_TARGET=host-hvx-msan make -j72 test

MemorySanitizer:DEADLYSIGNAL
==3738191==ERROR: MemorySanitizer: SEGV on unknown address (pc 0x7f7d94c6af05 bp 0x7f7d92acb56b sp 0x7ffda52ba510 T3738191)
==3738191==The signal is caused by a READ memory access.
==3738191==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
    #0 0x7f7d94c6af05 in _dl_lookup_symbol_x elf/dl-lookup.c:850:13
    #1 0x7f7d947ec444 in do_sym elf/dl-sym.c:153:16
    #2 0x7f7d94c313b3 in dlsym_doit dlfcn/dlsym.c:50:15
    #3 0x7f7d947ecb0f in _dl_catch_exception elf/dl-error-skeleton.c:208:8
    #4 0x7f7d947ecbce in _dl_catch_error elf/dl-error-skeleton.c:227:19
    #5 0x7f7d94c31a64 in _dlerror_run dlfcn/dlerror.c:170:21
    #6 0x7f7d94c3141b in dlsym dlfcn/dlsym.c:70:19
    #7 0x7f7d9287a48c in HexagonWrapper::PluginCosims() (/usr/local/google/home/srj/Qualcomm/Hexagon_SDK/3.5.2/tools/HEXAGON_Tools/8.3.07/Tools/lib/iss/libhexagonissv65.so+0x37f48c)
    #8 0x7f7d9287aa3d in HexagonWrapper::EndOfConfiguration() (/usr/local/google/home/srj/Qualcomm/Hexagon_SDK/3.5.2/tools/HEXAGON_Tools/8.3.07/Tools/lib/iss/libhexagonissv65.so+0x37fa3d)
    #9 0x7f7d9312e3b4 in init_sim() (/usr/local/google/home/srj/GitHub/Halide/src/runtime/hexagon_remote/bin/host/libhalide_hexagon_host.so+0xa3b4)
    #10 0x7f7d9312eb97 in halide_hexagon_remote_load_library (/usr/local/google/home/srj/GitHub/Halide/src/runtime/hexagon_remote/bin/host/libhalide_hexagon_host.so+0xab97)
    #11 0x4b0a3c in halide_hexagon_initialize_kernels (/usr/local/google/home/srj/GitHub/Halide/apps/blur/bin/host-hvx-msan/test+0x4b0a3c)

MemorySanitizer can not provide additional info.
SUMMARY: MemorySanitizer: SEGV elf/dl-lookup.c:850:13 in _dl_lookup_symbol_x

steven-johnson commented 2 years ago

Ugh, this looks like a complete false alarm on my part: a collision between different HVX SDK versions in different parts of my test (the stack traces above load libraries from Hexagon_SDK/3.5.2, even though I built with SDK 4.3.0). Closing.
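One way to catch this kind of mix-up before running the simulator is to check that every Hexagon-related path in the environment refers to the same SDK version. The sketch below is hypothetical (it is not part of Halide or the Hexagon SDK). It assumes that each SDK install lives under a directory named Hexagon_SDK/<version>/, as in the paths in the stack traces above, and that the SDK root is exported as HEXAGON_SDK_ROOT (an assumed variable name; only HEXAGON_TOOLS_ROOT appears in this issue).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Halide or the Hexagon SDK.
# Assumption: each SDK install lives under a directory named Hexagon_SDK/<version>/.

# Print the distinct Hexagon SDK versions referenced in a colon-separated
# list of paths, sorted and joined by single spaces (empty if none).
hexagon_sdk_versions() {
  printf '%s\n' "$1" \
    | tr ':' '\n' \
    | sed -n 's|.*Hexagon_SDK/\([^/]*\).*|\1|p' \
    | sort -u \
    | paste -sd ' ' -
}

# Warn on stderr and return 1 if LD_LIBRARY_PATH and the SDK/tools roots
# mention more than one SDK version; return 0 otherwise.
# HEXAGON_SDK_ROOT is an assumed variable name; unset variables contribute nothing.
check_hexagon_sdk() {
  local versions
  versions="$(hexagon_sdk_versions "${LD_LIBRARY_PATH:-}:${HEXAGON_SDK_ROOT:-}:${HEXAGON_TOOLS_ROOT:-}")"
  case "$versions" in
    *' '*)
      echo "warning: multiple Hexagon SDK versions in use: $versions" >&2
      return 1
      ;;
  esac
  return 0
}
```

Running check_hexagon_sdk right after the LD_LIBRARY_PATH export above would, under these assumptions, flag a setup where the SDK root points at 4.3.0 while HEXAGON_TOOLS_ROOT points into the 3.5.2 install. It only inspects path strings, so it cannot detect a mismatch hidden behind symlinks or paths that don't follow the Hexagon_SDK/<version>/ layout.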