xmos / ai_tools

AI applications and tools
Other
25 stars 10 forks source link

lib_nn crashed with address san turned on #194

Closed andrewstanfordjason closed 3 years ago

andrewstanfordjason commented 4 years ago

Andrews-MacBook:unit_test andrew$ ./bin/x86/unit_test src/test_vpu_memcpy.c:98:test_vpu_memcpy_case0:PASS src/test_vpu_memcpy.c:99:test_vpu_memcpy_case1:PASS

==45939==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffee3e4a654 at pc 0x00010bf62ba3 bp 0x7ffee3e4a000 sp 0x7ffee3e497c8 READ of size 32 at 0x7ffee3e4a654 thread T0

0 0x10bf62ba2 in __asan_memcpy (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2)

#1 0x10beb0251 in VLDC vpu_sim.c:113
#2 0x10bea42ae in nn_conv2d_hstrip_deep_padded nn_conv2d_hstrip_deep.c:400
#3 0x10bdbda09 in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:156
#4 0x10bec0efc in UnityDefaultTestRun unity.c:1339
#5 0x10bdcb739 in test_nn_conv2d_hstrip_deep_padded test_nn_conv2d_hstrip_deep_padded.c:1046
#6 0x10be08c7d in main main.c:15
#7 0x7fff6d753cc8 in start (libdyld.dylib:x86_64+0x1acc8)

Address 0x7ffee3e4a654 is located in stack of thread T0 at offset 340 in frame

0 0x10bdbc53f in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:82

This frame has 7 object(s): [32, 232) 'str_buff.i' (line 38) [304, 340) 'X' (line 85) <== Memory access at offset 340 overflows this variable [384, 960) 'K' (line 87) [1088, 1312) 'BSO' (line 89) [1376, 1600) 'bso' (line 98) [1664, 1680) 'Y' (line 100) [1696, 1728) 'zero_point_vec' (line 103) HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions are supported) SUMMARY: AddressSanitizer: stack-buffer-overflow (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2) in __asan_memcpy Shadow bytes around the buggy address: 0x1fffdc7c9470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c94a0: f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x1fffdc7c94b0: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 =>0x1fffdc7c94c0: f2 f2 f2 f2 f2 f2 00 00 00 00[04]f2 f2 f2 f2 f2 0x1fffdc7c94d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c94e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c94f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdc7c9510: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==45939==ABORTING Abort trap: 6

astewart-xmos commented 4 years ago

I think I know what's going on, but I don't know whether we should do anything about it.

So, I tried to reproduce this but ultimately failed because apparently address sanitization is not supported on Windows.

From the information provided above, though, the culprit seems to be at line 400 here:

https://github.com/xmos/ai_tools/blob/b282f678db9a955bf6ba4c043b2fac26240b8e0e/lib_nn/lib_nn/src/c/util/deep/nn_conv2d_hstrip_deep.c#L395-L403

The context of this call is in test_nn_conv2d_hstrip_deep_padded_case0(). This test case checks whether the nn_conv2d_hstrip_deep_padded() function works correctly on a 1x1 input image with 36 channels. As it happens, the input image X is the first object declared in the test case's stack frame (line 85):

https://github.com/xmos/ai_tools/blob/b282f678db9a955bf6ba4c043b2fac26240b8e0e/lib_nn/test/unit_test/src/adv/deep/test_nn_conv2d_hstrip_deep_padded.c#L81-L87


Here's what I believe is going on.

Line 400 in nn_conv2d_hstrip_deep_padded() is loading vC with data from the input image. Because the vector registers are 32 bytes long, but the input image X in this case is 36 bytes long, the last 4 bytes of X are what I call the input channel tail. Line 400 is supposed to be loading the input channel tail, and masking off the garbage at the end. So that's what it's doing -- load the last 4 bytes plus the 28 bytes following X.

That is the intended behavior. The problem appears to be that, because X happens to be the first object declared in the stack frame, reading an extra 28 bytes beyond it actually reads beyond the end of the stack frame (and likely beyond the end of the entire stack). It seems to not like that a whole lot.


What to do?

The whole reading-beyond-the-end-of-the-input-image thing is entirely intentional, because we can't load partial vectors into the VPU registers. If the object we're reading from happens to exist at a location in memory that would cause an exception (or similar) if we attempt to read beyond it, we're going to have a problem, whether it's on a Macbook or on xcore. I don't actually expect this to be a problem in a program more complicated than a test case. What I'm uncertain of (which is why I wanted to reproduce it) is whether the crash was just because it overran the stack, or if it was because it overran that individual object. I'm assuming it was only because of overrunning the stack.

It is possible to change the logic of the operator (all the operators) to avoid ever actually reading beyond the end of the objects, but I think this would slow things down for no benefit.

As for the test case (others will likely also have this problem), I think our best bet may be to try to prevent objects that may be over-read from being at the top of their stack frames.

astewart-xmos commented 4 years ago

I've pushed a change to the branch bugfix/sept2020. Could you try running against that to see if you still have that problem?

If you still crash, it should hopefully happen after the test_nn_conv2d_hstrip* cases.

astewart-xmos commented 4 years ago

Ping @andrewstanfordjason

lkindrat-xmos commented 4 years ago

Ping @andrewstanfordjason

keithm-xmos commented 3 years ago

@andrewstanfordjason Can I close this issue? Or, are you still observing it?

andrewstanfordjason commented 3 years ago

I'll run it again tomorrow and let you know

andrewstanfordjason commented 3 years ago

I just ran it again ==4903==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffee27d9714 at pc 0x00010d620ba3 bp 0x7ffee27d8f80 sp 0x7ffee27d8748 READ of size 32 at 0x7ffee27d9714 thread T0

0 0x10d620ba2 in __asan_memcpy (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2)

#1 0x10d55b5f1 in VLDC vpu_sim.c:113
#2 0x10d54f64e in nn_conv2d_hstrip_deep_padded nn_conv2d_hstrip_deep.c:400
#3 0x10d42eaa2 in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:153
#4 0x10d56c29c in UnityDefaultTestRun unity.c:1339
#5 0x10d43c5c9 in test_nn_conv2d_hstrip_deep_padded test_nn_conv2d_hstrip_deep_padded.c:1027
#6 0x10d47e5bd in main main.c:15
#7 0x7fff6e79bcc8 in start (libdyld.dylib:x86_64+0x1acc8)
andrewstanfordjason commented 3 years ago

sorry I didn't run it on your branch, I'll try again

keithm-xmos commented 3 years ago

@andrewstanfordjason Any update on this?

andrewstanfordjason commented 3 years ago

if you just add

PLATFORM_FLAGS := $(PLATFORM_FLAGS_DEFAULT) -DTF_LITE_DISABLE_X86_NEON -fsanitize=address

to x86.mk then it'll do the check for you when you run it. I was going to PR this change alone but wanted to have it work first.

andrewstanfordjason commented 3 years ago

I ran it ==71801==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffeed455eb4 at pc 0x0001029a6ba3 bp 0x7ffeed455720 sp 0x7ffeed454ee8 READ of size 32 at 0x7ffeed455eb4 thread T0

0 0x1029a6ba2 in __asan_memcpy (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2)

#1 0x1028df5f1 in VLDC vpu_sim.c:113
#2 0x1028d364e in nn_conv2d_hstrip_deep_padded nn_conv2d_hstrip_deep.c:400
#3 0x1027b2aa2 in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:153
#4 0x1028f029c in UnityDefaultTestRun unity.c:1339
#5 0x1027c05c9 in test_nn_conv2d_hstrip_deep_padded test_nn_conv2d_hstrip_deep_padded.c:1027
#6 0x1028025bd in main main.c:15
#7 0x7fff6e79bcc8 in start (libdyld.dylib:x86_64+0x1acc8)

Address 0x7ffeed455eb4 is located in stack of thread T0 at offset 660 in frame

0 0x1027b161f in test_nn_conv2d_hstrip_deep_padded_case0 test_nn_conv2d_hstrip_deep_padded.c:82

This frame has 7 object(s): [32, 232) 'str_buff.i' (line 38) [304, 528) 'BSO' (line 85) [592, 608) 'Y' (line 94) [624, 660) 'X' (line 95) <== Memory access at offset 660 overflows this variable [704, 1280) 'K' (line 96) [1408, 1632) 'bso' (line 97) [1696, 1728) 'zero_point_vec' (line 100) HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions are supported) SUMMARY: AddressSanitizer: stack-buffer-overflow (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x43ba2) in __asan_memcpy Shadow bytes around the buggy address: 0x1fffdda8ab80: 00 00 00 00 f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 0x1fffdda8ab90: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x1fffdda8aba0: f8 f2 f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 0x1fffdda8abb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8abc0: 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 00 00 =>0x1fffdda8abd0: f2 f2 00 00 00 00[04]f2 f2 f2 f2 f2 00 00 00 00 0x1fffdda8abe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8abf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8ac00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8ac10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1fffdda8ac20: 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==71801==ABORTING

still broken

keithm-xmos commented 3 years ago

@andrewstanfordjason I figured out how to run the address sanitizer so that the unit test executable will not halt on the first error. Instructions are in the comments near the end of x86.mk. Hopefully this is a sufficient workaround for you.

One other workaround exists. That is to put the following attribute on of all the offending test_ functions (currently around a dozen).

__attribute__((no_sanitize("address")))

This attribute silences the address sanitizer for a function.

I discussed both workaround options with @astewart-xmos and we feel the "no halting" option was the best option to start with. We are open to the other but worry silencing the sanitizer is not the best first step.