arm64 build failed with ndk r21b

zchrissirhcz commented 2 years ago

Error message

/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:65:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tcurr, tnext[0], nc);
                          ^                          ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:327:13: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRow<ppl::cv::aarch64::DilateVecOp, float, 4, 3>' requested here
            MorphRow<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
            ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:150:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tnext[0], t_last, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:330:9: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRowLast<ppl::cv::aarch64::DilateVecOp, float, 4, 3>' requested here
        MorphRowLast<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
        ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:162:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(t_last, v_border, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:65:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tcurr, tnext[0], nc);
                          ^                          ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:327:13: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRow<ppl::cv::aarch64::DilateVecOp, float, 4, 5>' requested here
            MorphRow<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
            ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:150:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tnext[0], t_last, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:330:9: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRowLast<ppl::cv::aarch64::DilateVecOp, float, 4, 5>' requested here
        MorphRowLast<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
        ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:162:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(t_last, v_border, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:65:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tcurr, tnext[0], nc);
                          ^                          ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:327:13: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRow<ppl::cv::aarch64::ErodeVecOp, float, 4, 3>' requested here
            MorphRow<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
            ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:150:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tnext[0], t_last, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:330:9: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRowLast<ppl::cv::aarch64::ErodeVecOp, float, 4, 3>' requested here
        MorphRowLast<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
        ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:162:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(t_last, v_border, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:65:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tcurr, tnext[0], nc);
                          ^                          ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:327:13: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRow<ppl::cv::aarch64::ErodeVecOp, float, 4, 5>' requested here
            MorphRow<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
            ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:150:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(tnext[0], t_last, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:330:9: note: in instantiation of function template specialization 'ppl::cv::aarch64::MorphRowLast<ppl::cv::aarch64::ErodeVecOp, float, 4, 5>' requested here
        MorphRowLast<morphOp, float, nc, kernel_len>(tprev, tcurr, tnext, srow + x, srcStride, drow, y, height - 1 - y, x - v_elem, width * nc - 1 - (x - v_elem), borderValue);
        ^
/home/zz/work/github/ppl.cv/src/ppl/cv/aarch64/morph_f32.cpp:162:27: error: argument value 4 is outside the valid range [0, 3]
                t_right = vextq_f32(t_last, v_border, nc);
                          ^                           ~~
/home/zz/soft/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:5846:25: note: expanded from macro 'vextq_f32'
  __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
                        ^                                                        ~~~~
12 errors generated.
make[2]: *** [CMakeFiles/pplcv_static.dir/build.make:272: CMakeFiles/pplcv_static.dir/src/ppl/cv/aarch64/morph_f32.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2269: CMakeFiles/pplcv_static.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

Reproduce

Create a build/android-arm64-build.sh script with the following contents:

#!/bin/bash

ANDROID_NDK=~/soft/android-ndk-r21b
TOOLCHAIN=$ANDROID_NDK/build/cmake/android.toolchain.cmake

BUILD_DIR=android-arm64
mkdir -p $BUILD_DIR
cd $BUILD_DIR

# -G Ninja  # building with the Ninja generator fails
cmake \
    -DCMAKE_TOOLCHAIN_FILE=$TOOLCHAIN \
    -DANDROID_LD=lld \
    -DANDROID_ABI="arm64-v8a" \
    -DANDROID_PLATFORM=android-24 \
    -DCMAKE_BUILD_TYPE=Release \
    -DHPCC_USE_AARCH64=ON \
    ../..

#ninja
#cmake --build . --verbose
cmake --build .

cd ..

zchrissirhcz commented 2 years ago

I guess this is caused by an invalid test.

zchrissirhcz commented 2 years ago

I commented out line 106 and line 110. They use nc=4 for testing, but at compile time the compiler complains that the argument is outside the range [0, 3].

openppl-public commented 2 years ago

The NDK is not officially supported at the moment.

zchrissirhcz commented 2 years ago

The NDK is not officially supported at the moment.

OK, this issue should be described as "aarch64/arm compilation failed due to some invalid test case arguments".

I just searched the official Arm documentation for vextq_f32. Without any specific ISA selected, the documentation says the third parameter must be in the range [0, 3]:

https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vextq_f32
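A portable way to see where the [0, 3] limit comes from: vextq_f32(a, b, n) returns four consecutive lanes of the concatenation of a and b, starting at lane n, and a 128-bit vector holds only 128 / 32 = 4 floats. The sketch below is a scalar model of that behaviour, not the real intrinsic; F32x4 and vext_ref are hypothetical names, and the static_assert stands in for Clang's argument check.

```cpp
#include <array>

// Hypothetical scalar stand-in for float32x4_t: 128 bits = 4 x 32-bit floats.
using F32x4 = std::array<float, 4>;

constexpr int kLanes = 128 / 32;  // 4 lanes, so valid start indices are 0..3

// Scalar model of vextq_f32(a, b, n): take lanes n..n+3 of the 8-lane
// concatenation {a[0..3], b[0..3]}.
template <int n>
F32x4 vext_ref(const F32x4& a, const F32x4& b) {
    // Stands in for the compiler's "argument value is outside the valid range" check.
    static_assert(n >= 0 && n < kLanes, "argument value is outside the valid range [0, 3]");
    F32x4 r{};
    for (int i = 0; i < kLanes; ++i) {
        const int src = i + n;  // position in the concatenated 8-lane sequence
        r[i] = (src < kLanes) ? a[src] : b[src - kLanes];
    }
    return r;
}
```

With a = {0, 1, 2, 3} and b = {4, 5, 6, 7}, vext_ref<1>(a, b) yields {1, 2, 3, 4}, while vext_ref<4>(a, b) is rejected at compile time, much like the NDK error above.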

zchrissirhcz commented 2 years ago

This seems like an NDK compiler bug: https://github.com/android/ndk/issues/1657

zchrissirhcz commented 2 years ago

Your template specialization with <4> causes this code to be emitted:

t_right = vextq_f32(tcurr, tnext, 4);

which is illegal (the last f32 lane index in a 128-bit vector is 3) and leads to the above-mentioned error. The index range verification happens before dead code elimination kicks in (later on, in the optimization stage).

Based on the above, looks like intended behaviour to me.

Originally posted by @Over17 in https://github.com/android/ndk/issues/1657#issuecomment-1027948784
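The point about the range check running before dead code elimination can be reproduced without NEON. In this sketch (assuming C++17; checked_ext, shift_one_pixel and F32x4 are hypothetical names), a static_assert plays the role of the intrinsic's range check. A runtime if does not stop checked_ext<4> from being instantiated, but if constexpr discards the untaken branch inside a template, so the invalid shift is never checked.

```cpp
#include <array>

using F32x4 = std::array<float, 4>;  // hypothetical stand-in for float32x4_t

// Models the intrinsic: the lane argument is validated as soon as the call is
// instantiated, regardless of whether the call is reachable at run time.
template <int n>
F32x4 checked_ext(const F32x4& a, const F32x4& b) {
    static_assert(n >= 0 && n <= 3, "argument value is outside the valid range [0, 3]");
    F32x4 r{};
    for (int i = 0; i < 4; ++i) {
        r[i] = (i + n < 4) ? a[i + n] : b[i + n - 4];
    }
    return r;
}

// Shift right by one pixel of `nc` float channels (hypothetical helper).
template <int nc>
F32x4 shift_one_pixel(const F32x4& cur, const F32x4& next) {
    // A plain `if (nc < 4)` would still instantiate checked_ext<4> for nc == 4
    // and trip the static_assert, just as the NDK rejects vextq_f32(..., 4)
    // before dead code elimination runs. `if constexpr` discards the branch.
    if constexpr (nc < 4) {
        return checked_ext<nc>(cur, next);
    } else {
        // A 4-float pixel fills the whole vector, so the shifted result is `next`.
        return next;
    }
}
```

For nc == 4 the sketch returns next directly, assuming that a one-pixel shift across a 4-float pixel is meant to select the following vector.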

aboluock commented 2 years ago

Hi, I fixed this bug in the newest PR; you can try that commit on your machine.

zchrissirhcz commented 2 years ago

Hi, I fixed this bug in the newest PR; you can try that commit on your machine.

I think nc == 2 is missing. Although it is a rare case, people who first look at the function head (i.e. the declaration) may assume that nc == 2 is also supported.

I would suggest writing a small template function as a thin wrapper around vextq_f32:

template<int nc>
vextq_f32(...)
{
    ...
}
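One possible shape for such a wrapper, sketched under a few assumptions: the names vextq_f32_nc, ExtF32x4 and vec_f32x4 are hypothetical; the wrapper cannot itself be called vextq_f32, because that is a function-like macro in Clang's arm_neon.h (as the error log shows); and a shift of 4 lanes is taken to mean "return the second vector". Explicit specializations (C++11-compatible) handle 0 and 4, so vextq_f32 is only ever instantiated with 1 to 3. A scalar fallback lets the sketch build on non-ARM hosts.

```cpp
#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
using vec_f32x4 = float32x4_t;
#else
#include <array>
#define SKETCH_HAVE_NEON 0
// Hypothetical scalar stand-in so the sketch also builds on non-ARM hosts.
using vec_f32x4 = std::array<float, 4>;
#endif

// Primary template: 1..3 are real lane shifts handled by vextq_f32.
template <int n>
struct ExtF32x4 {
    static_assert(n >= 1 && n <= 3, "lane shift must be in [0, 4]");
    static vec_f32x4 apply(vec_f32x4 a, vec_f32x4 b) {
#if SKETCH_HAVE_NEON
        return vextq_f32(a, b, n);  // n is a valid constant lane index here
#else
        vec_f32x4 r{};
        for (int i = 0; i < 4; ++i) r[i] = (i + n < 4) ? a[i + n] : b[i + n - 4];
        return r;
#endif
    }
};

// Shifting by 0 lanes returns the first vector unchanged.
template <>
struct ExtF32x4<0> {
    static vec_f32x4 apply(vec_f32x4 a, vec_f32x4) { return a; }
};

// Shifting by a full vector (4 lanes) returns the second vector; this is the
// nc == 4 case that vextq_f32 itself rejects.
template <>
struct ExtF32x4<4> {
    static vec_f32x4 apply(vec_f32x4, vec_f32x4 b) { return b; }
};

// Thin wrapper with the shift as a template parameter.
template <int nc>
inline vec_f32x4 vextq_f32_nc(vec_f32x4 a, vec_f32x4 b) {
    return ExtF32x4<nc>::apply(a, b);
}
```

With this, a call such as vextq_f32_nc<nc>(tcurr, tnext[0]) would compile for nc = 1, 3 and 4, and also for nc = 2, while a value such as 5 still fails with a readable static_assert message.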

aboluock commented 2 years ago

In the header file, we clearly state that only 1, 3, and 4 channels are currently supported. As for nc=2, PRs are welcome!
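If the restriction to 1, 3 and 4 channels should be visible to callers, a compile-time guard is one option. This is a hypothetical sketch: SupportedChannels and row_elements are illustrative names, not ppl.cv's API. An unsupported count such as nc == 2 then fails with an explicit message at the point of instantiation instead of being silently accepted by the declaration.

```cpp
// Hypothetical compile-time guard for the channel counts listed in the header.
template <int nc>
struct SupportedChannels {
    static constexpr bool value = (nc == 1 || nc == 3 || nc == 4);
};

// Hypothetical stand-in for a morphology entry point; only the guard matters here.
template <int nc>
int row_elements(int width) {
    static_assert(SupportedChannels<nc>::value, "only nc = 1, 3 and 4 are supported");
    return width * nc;  // number of float elements in one interleaved row
}
```

Calling row_elements<2>(width) would stop the build with the message above, which documents the limitation where the template is used.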

openppl-public / ppl.cv

arm64 build failed with ndk r21b #45
