iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0
2.6k stars 583 forks source link

Fix unaligned memory accesses #9922

Open kuhar opened 2 years ago

kuhar commented 2 years ago

What happened?

When e2e tests are executed on Android, the Undefined Behavior Sanitizer complains about multiple unaligned memory accesses.
Setting the IREE_MEMORY_ACCESS_ALIGNMENT_REQUIRED=1 cmake flag does not fix the ubsan alarms, as I'd expect. We need to either wrap such access with iree_unalidnged_* or fix the underlying issue that causes pointer to be misaligned.

Accessing unaligned memory is prohibited by C/C++, regardless of whether it is known to work in practice or not. It'd be better to not violate any alignment assumptions so that we can run CI with ubsan checks enabled at all times.

Steps to reproduce your issue

Install Android NDK r24 and set ANDROID_NDK.

CMake script:

#! /bin/sh

set -eou pipefail
readonly IREE_DIR="$(realpath "$1")"
readonly BUILD_DIR="$(realpath "$2")"

mkdir -p "$BUILD_DIR"
cd "$BUILD_DIR"

cmake "$IREE_DIR" -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo \
                        -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK?}/build/cmake/android.toolchain.cmake" \
            -DIREE_HOST_BINARY_ROOT="$(realpath "$BUILD_DIR/../relass/run")" \
            -DANDROID_ABI="arm64-v8a" \
                        -DANDROID_PLATFORM="android-29" \
            -DIREE_BUILD_COMPILER=0 \
            -DIREE_ENABLE_ASSERTIONS=1 \
            -DIREE_ENABLE_ASAN=1 \
            -DIREE_ENABLE_UBSAN=1 \
            -DIREE_MEMORY_ACCESS_ALIGNMENT_REQUIRED=1 \
            -DIREE_BUILD_TESTS=1 \
                        -DIREE_BUILD_SAMPLES=0 \
                -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
            -DCMAKE_INSTALL_PREFIX=run

Build with ninja all iree-test-deps

Test command:

ctest --output-on-failure -R '.*e2e_matmul_mmt4d_i8_small_llvm-cpu_local-task_on_android_device.*' -V | ../iree/build_tools/scripts/android_symbolize.sh

(I'll open a PR adding the IREE_ENABLE_UBSAN option shortly)

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

Sample ubsan report attached: ubsan.log.

kuhar commented 2 years ago

cc: @benvanik

kuhar commented 2 years ago

@benvanik I discovered that I made a mistake in my android repro. Adding IREE_MEMORY_ACCESS_ALIGNMENT_REQUIRED with add_compile_definitions in iree/runtime/src/iree/base/CMakeLists.txt was insufficient and the option did not propagate to bytecode_dispatch.c. After I moved it to the top-level cmake file, bytecode_dispatch unaligned accesses are gone but some other ones remain and are reproducible on linux.

Linux repro:

Cmake configuration script:

#! /usr/bin/env bash

set -euo pipefail
readonly IREE_DIR="$(realpath "$1")"
readonly BUILD_DIR="$(realpath "$2")"

mkdir -p "$BUILD_DIR"
cd "$BUILD_DIR"

cmake "$IREE_DIR" -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo \
            -DIREE_ENABLE_ASSERTIONS=1 \
            -DIREE_ENABLE_UBSAN=1 \
                        -DCMAKE_C_FLAGS="-DIREE_MEMORY_ACCESS_ALIGNMENT_REQUIRED=1" \
                        -DCMAKE_CXX_FLAGS="-DIREE_MEMORY_ACCESS_ALIGNMENT_REQUIRED=1" \
            -DCMAKE_CXX_COMPILER=clang++-13 -DCMAKE_C_COMPILER=clang-13

Build and test:

./config_ubsan.sh iree ubsan
cd ubsan
ninja all iree-test-deps
ctest --output-on-failure -R '.*e2e_matmul_mmt4d_f32_intrinsics_small.*' -V

UBSan errors:

24: Test timeout computed to be: 0
24: /iree/iree/runtime/src/iree/base/string_view.c:275:18: runtime error: null pointer passed as argument 2, which is declared to never be null
24: /usr/include/string.h:44:28: note: nonnull attribute specified here
24: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /iree/iree/runtime/src/iree/base/string_view.c:275:18 in 
24: /iree/iree/runtime/src/iree/vm/ref.c:236:23: runtime error: member access within misaligned address 0x7fff5afa9844 for type 'iree_vm_ref_t' (aka 'struct iree_vm_ref_t'), which requires 8 byte alignment
24: 0x7fff5afa9844: note: pointer points here
24:   2d 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
24:               ^ 
24: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /iree/iree/runtime/src/iree/vm/ref.c:236:23 in 
24: /iree/iree/runtime/src/iree/vm/ref.c:236:23: runtime error: load of misaligned address 0x7fff5afa9844 for type 'void *', which requires 8 byte alignment
24: 0x7fff5afa9844: note: pointer points here
24:   2d 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
24:               ^ 
24: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /iree/iree/runtime/src/iree/vm/ref.c:236:23 in 
24: --- CALL[module.matmul_DYNxDYNxf32_times_DYNxDYNxf32_into_DYNxDYNxf32] ---
24: Matmul shape (MxKxN): 1x1x1
24: /iree/iree/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c:941:5: runtime error: store to misaligned address 0x000002072f44 for type 'void *', which requires 8 byte alignment
24: 0x000002072f44: note: pointer points here
24:   02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
24:               ^ 
24: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /iree/iree/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c:941:5 in 
24: /iree/iree/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c:942:5: runtime error: store to misaligned address 0x000002072f54 for type 'size_t' (aka 'unsigned long'), which requires 8 byte alignment
24: 0x000002072f54: note: pointer points here
24:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
24:               ^ 
24: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /iree/iree/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c:942:5 in 
24: /iree/iree/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c:943:10: runtime error: load of misaligned address 0x000002072f44 for type 'void *', which requires 8 byte alignment
24: 0x000002072f44: note: pointer points here
24:   02 00 00 00 00 90 06 02  00 00 00 00 00 00 00 00  00 00 00 00 04 00 00 00  00 00 00 00 00 00 00 00
24:               ^ 
24: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /iree/iree/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c:943:10 in 
24: 
allieculp commented 2 years ago

@benvanik Can you take a look at this? Any idea of priority?

allieculp commented 2 years ago

Hey @benvanik any update on this one?

allieculp commented 2 years ago

Hey @benvanik any update on this one?