Closed andrewstanfordjason closed 3 years ago
I think I know what's going on, but I don't know whether we should do anything about it.
So, I tried to reproduce this but ultimately failed because apparently address sanitization is not supported on Windows.
From the information provided above, though, the culprit seems to be at line 400 here:
The context of this call is in test_nn_conv2d_hstrip_deep_padded_case0()
. This test case checks whether the nn_conv2d_hstrip_deep_padded()
function works correctly on a 1x1
input image with 36
channels. As it happens, the input image X
is the first object declared in the test case's stack frame (line 85):
Here's what I believe is going on.
Line 400 in nn_conv2d_hstrip_deep_padded()
is loading vC
with data from the input image. Because the vector registers are 32 bytes long, but the input image X
in this case is 36 bytes long, the last 4 bytes of X
are what I call the input channel tail. Line 400 is supposed to be loading the input channel tail, and masking off the garbage at the end. So that's what it's doing -- load the last 4 bytes plus the 28 bytes following X
.
That is the intended behavior. The problem appears to be that, because X
happens to be the first object declared in the stack frame, reading an extra 28 bytes beyond it actually reads beyond the end of the stack frame (and likely beyond the end of the entire stack). It seems to not like that a whole lot.
What to do?
The whole reading-beyond-the-end-of-the-input-image thing is entirely intentional, because we can't load partial vectors into the VPU registers. If the object we're reading from happens to exist at a location in memory that would cause an exception (or similar) if we attempt to read beyond it, we're going to have a problem, whether it's on a Macbook or on xcore. I don't actually expect this to be a problem in a program more complicated than a test case. What I'm uncertain of (which is why I wanted to reproduce it) is whether the crash was just because it overran the stack, or if it was because it overran that individual object. I'm assuming it was only because of overrunning the stack.
It is possible to change the logic of the operator (all the operators) to avoid ever actually reading beyond the end of the objects, but I think this would slow things down for no benefit.
As for the test case (others will likely also have this problem), I think our best bet may be to try to prevent objects that may be over-read from being at the top of their stack frames.
I've pushed a change to the branch bugfix/sept2020
. Could you try running against that to see if you still have that problem?
If you still crash, it should hopefully happen after the test_nn_conv2d_hstrip*
cases.
Ping @andrewstanfordjason
Ping @andrewstanfordjason
@andrewstanfordjason Can I close this issue? Or, are you still observing it?
I'll run it again tomorrow and let you know
I just ran it again ==4903==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffee27d9714 at pc 0x00010d620ba3 bp 0x7ffee27d8f80 sp 0x7ffee27d8748 READ of size 32 at 0x7ffee27d9714 thread T0
#1 0x10d55b5f1 in VLDC vpu_sim.c:113
#2 0x10d54f64e in nn_conv2d_hstrip_deep_padded nn_conv2d_hstrip_deep.c:400
#3 0x10d42eaa2 in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:153
#4 0x10d56c29c in UnityDefaultTestRun unity.c:1339
#5 0x10d43c5c9 in test_nn_conv2d_hstrip_deep_padded test_nn_conv2d_hstrip_deep_padded.c:1027
#6 0x10d47e5bd in main main.c:15
#7 0x7fff6e79bcc8 in start (libdyld.dylib:x86_64+0x1acc8)
sorry I didn't run it on your branch, I'll try again
@andrewstanfordjason Any update on this?
if you just add
PLATFORM_FLAGS := $(PLATFORM_FLAGS_DEFAULT) -DTF_LITE_DISABLE_X86_NEON -fsanitize=address
to x86.mk then it'll do the check for you when you run it. I was going to PR this change alone but wanted to have it work first.
I ran it ==71801==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffeed455eb4 at pc 0x0001029a6ba3 bp 0x7ffeed455720 sp 0x7ffeed454ee8 READ of size 32 at 0x7ffeed455eb4 thread T0
#1 0x1028df5f1 in VLDC vpu_sim.c:113
#2 0x1028d364e in nn_conv2d_hstrip_deep_padded nn_conv2d_hstrip_deep.c:400
#3 0x1027b2aa2 in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:153
#4 0x1028f029c in UnityDefaultTestRun unity.c:1339
#5 0x1027c05c9 in test_nn_conv2d_hstrip_deep_padded test_nn_conv2d_hstrip_deep_padded.c:1027
#6 0x1028025bd in main main.c:15
#7 0x7fff6e79bcc8 in start (libdyld.dylib:x86_64+0x1acc8)
Address 0x7ffeed455eb4 is located in stack of thread T0 at offset 660 in frame
This frame has 7 object(s): [32, 232) 'str_buff.i' (line 38) [304, 528) 'BSO' (line 85) [592, 608) 'Y' (line 94) [624, 660) 'X' (line 95) <== Memory access at offset 660 overflows this variable [704, 1280) 'K' (line 96) [1408, 1632) 'bso' (line 97) [1696, 1728) 'zero_point_vec' (line 100) HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions are supported) SUMMARY: AddressSanitizer: stack-buffer-overflow (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2) in __asan_memcpy Shadow bytes around the buggy address: 0x1fffdda8ab80: 00 00 00 00 f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 0x1fffdda8ab90: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x1fffdda8aba0: f8 f2 f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 0x1fffdda8abb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8abc0: 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 00 00 =>0x1fffdda8abd0: f2 f2 00 00 00 00[04]f2 f2 f2 f2 f2 00 00 00 00 0x1fffdda8abe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8abf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8ac00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8ac10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8ac20: 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==71801==ABORTING
still broken
@andrewstanfordjason I figured out how to run the address sanitizer so that the unit test executable will not halt on the first error. Instructions are in the comments near the end of x86.mk
. Hopefully this is a sufficient workaround for you.
One other workaround exists. That is to put the following attribute on of all the offending test_
functions (currently around a dozen).
__attribute__((no_sanitize("address")))
This attribute silences the address sanitizer for a function.
I discussed both workaround options with @astewart-xmos and we feel the "no halting" option was the best option to start with. We are open to the other but worry silencing the sanitizer is not the best first step.
Andrews-MacBook:unit_test andrew$ ./bin/x86/unit_test src/test_vpu_memcpy.c:98:test_vpu_memcpy_case0:PASS src/test_vpu_memcpy.c:99:test_vpu_memcpy_case1:PASS
==45939==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffee3e4a654 at pc 0x00010bf62ba3 bp 0x7ffee3e4a000 sp 0x7ffee3e497c8 READ of size 32 at 0x7ffee3e4a654 thread T0
0 0x10bf62ba2 in __asan_memcpy (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2)
Address 0x7ffee3e4a654 is located in stack of thread T0 at offset 340 in frame
0 0x10bdbc53f in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:82
This frame has 7 object(s): [32, 232) 'str_buff.i' (line 38) [304, 340) 'X' (line 85) <== Memory access at offset 340 overflows this variable [384, 960) 'K' (line 87) [1088, 1312) 'BSO' (line 89) [1376, 1600) 'bso' (line 98) [1664, 1680) 'Y' (line 100) [1696, 1728) 'zero_point_vec' (line 103) HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions are supported) SUMMARY: AddressSanitizer: stack-buffer-overflow (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2) in __asan_memcpy Shadow bytes around the buggy address: 0x1fffdc7c9470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c94a0: f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x1fffdc7c94b0: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 =>0x1fffdc7c94c0: f2 f2 f2 f2 f2 f2 00 00 00 00[04]f2 f2 f2 f2 f2 0x1fffdc7c94d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c94e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c94f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9510: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==45939==ABORTING Abort trap: 6