microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Build] compilation error: invalid instruction mnemonic 'vcvtneeph2ps' #22519

Open saiden89 opened 1 week ago

saiden89 commented 1 week ago

Describe the issue

I am attempting to compile ONNX Runtime on the LUMI supercomputer, a Cray system.

The configuration step completes without any issues. However, during the compile phase, I ran into problems with the default CC and cc Cray compiler wrappers, which apply Cray-specific optimizations. To bypass this, I manually specified the AMD compilers (amdclang and amdclang++) instead of the wrappers.

System Details:

Now, I’m encountering a compile-time error possibly related to the AVX512 instruction set: error: invalid instruction mnemonic 'vcvtneeph2ps', but I’m not familiar enough with all this to diagnose the issue. I would appreciate any guidance on how to address this.

Urgency

Not urgent, but would be nice to have since I have a big inference job on a project.

Target platform

AMD MI250X

Build script

The build script relies on specific modules being loaded to target the correct architecture and to set up the correct programming environment. Full reproducibility might be limited because of the exotic nature of the system, but I am more than happy to try any suggestions myself.

module purge

module load PrgEnv-amd
module load rocm/6.0.3
module load craype-accel-amd-gfx90a craype-x86-trento

cd /tmp || exit
git clone --single-branch --branch main --recursive https://github.com/Microsoft/onnxruntime onnxruntime
cd onnxruntime || exit

mamba install rust -y
pip install cmake

./build.sh --config Release \
    --build_wheel \
    --update \
    --build \
    --parallel \
    --use_rocm \
    --rocm_home "$ROCM_PATH" \
    --cmake_extra_defines CMAKE_HIP_ARCHITECTURES=gfx90a \
    --cmake_extra_defines CMAKE_C_COMPILER=amdclang \
    --cmake_extra_defines CMAKE_CXX_COMPILER=amdclang++

pip install build/Linux/Release/dist/*

Error / output

log.txt

Visual Studio Version

No response

GCC / Compiler Version

AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.0.3 24012 af27734ed982b52a9f1be0f035ac91726fc697e4)

edgchen1 commented 1 week ago

Here's the error from the log for convenience:

Building ASM object CMakeFiles/onnxruntime_mlas.dir/tmp/onnxruntime/onnxruntime/core/mlas/lib/x86_64/cvtfp16Avx.S.o
/tmp/onnxruntime/onnxruntime/core/mlas/lib/x86_64/cvtfp16Avx.S:60:9: error: invalid instruction mnemonic 'vcvtneeph2ps'
        vcvtneeph2ps ymm0, ymmword PTR [rdi]
        ^~~~~~~~~~~~

I think this code was added in this PR: https://github.com/microsoft/onnxruntime/pull/21183

@eralmual do you have any pointers on how to fix this?

eralmual commented 1 week ago

Hi! Thank you for reaching out!

It seems the vcvtneeph2ps instruction is not recognized by the compiler. I did a quick search, and the instruction has been supported in Clang since v16.0 as part of the AVX-NE-CONVERT ISA. You appear to be using v17.0, so it should work fine.

If it's not working for Clang in general, I can do a quick patch to prevent the compiler error while we find a solution; just let me know. In the meantime, I think you should be able to safely delete the if block and everything inside it at https://github.com/microsoft/onnxruntime/blob/c7138a2630b01e30340a52959c232305394fd86f/cmake/onnxruntime_mlas.cmake#L574, and that should fix the compiler issue.

Let me know if it works!

snnn commented 1 week ago

It is more a question of whether your assembler (such as gas) can recognize this instruction. Instead of detecting the compiler name or version, we should write a test program to check it: https://cmake.org/cmake/help/latest/module/CheckSourceCompiles.html

Contributions are welcome.
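
Before wiring such a check into CMake, the same question can be asked by hand from the shell. The following is a minimal sketch, not part of the ONNX Runtime build: `assembler_accepts` is a hypothetical helper name, and it assumes the compiler driver accepts `-x assembler` and reads the source from stdin (as clang and gcc drivers do). The probed instruction is the one from the failing line of cvtfp16Avx.S.

```shell
#!/bin/sh
# Hypothetical helper (not from the ONNX Runtime repository).
# Returns 0 if the given compiler driver can assemble the given x86-64
# instruction written in Intel syntax, non-zero otherwise.
# Note: a missing compiler is also reported as a failure.
assembler_accepts() {
    compiler=$1
    instruction=$2
    # Feed a two-line assembly file on stdin and discard the object file.
    printf '.intel_syntax noprefix\n%s\n' "$instruction" |
        "$compiler" -x assembler -c -o /dev/null - 2>/dev/null
}

# Probe the compiler used in the report with the instruction from the log.
if assembler_accepts amdclang 'vcvtneeph2ps ymm0, ymmword PTR [rdi]'; then
    echo "assembler accepts vcvtneeph2ps"
else
    echo "assembler rejects vcvtneeph2ps (or amdclang is not on PATH)"
fi
```

A probe like this tests the assembler's actual capability, which is the design point of the comment above: a compiler name or version number does not guarantee that a vendor build recognizes the mnemonic.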

saiden89 commented 1 week ago

Thank you @eralmual for the suggestion; your proposed solution solved the problem. However, as the compilation continues, I hit many more errors.

/pfs/lustrep2/projappl/project_465000941/compartments/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/cast_op.cc:295:1: error: explicit instantiation of 'ComputeInternal' that occurs after an explicit specialization has no effect [-Werror,-Winstantiation-after-specialization]
SPECIALIZE_IMPL(MLFloat16)

/pfs/lustrep2/projappl/project_465000941/compartments/onnxruntime/onnxruntime/core/providers/rocm/nn/conv_impl.cu:24:21: error: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
  fast_divmod fdm_c(bias_size);
              ~~~~~ ^~~~~~~~~

Any further insights are deeply appreciated, thanks!

snnn commented 1 week ago

Please add "--compile_no_warning_as_error" to your build command.
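
Applied to the build script from the original report, this would look roughly as follows. This is a sketch only: every flag except `--compile_no_warning_as_error` is copied from the report, and the guard around `./build.sh` is an addition so that the snippet fails gracefully when run outside an onnxruntime checkout.

```shell
#!/bin/sh
# Build arguments from the original report, plus the suggested flag
# that stops warnings from being treated as errors.
set -- --config Release \
    --build_wheel \
    --update \
    --build \
    --parallel \
    --use_rocm \
    --rocm_home "$ROCM_PATH" \
    --cmake_extra_defines CMAKE_HIP_ARCHITECTURES=gfx90a \
    --cmake_extra_defines CMAKE_C_COMPILER=amdclang \
    --cmake_extra_defines CMAKE_CXX_COMPILER=amdclang++ \
    --compile_no_warning_as_error

# Guard added for this sketch: only invoke the build from a checkout.
if [ -x ./build.sh ]; then
    ./build.sh "$@"
else
    echo "build.sh not found; run this from the onnxruntime checkout" >&2
fi
```

With this flag the two diagnostics quoted above (`-Winstantiation-after-specialization` and `-Wshorten-64-to-32`) should be reported as warnings rather than stopping the build.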

snnn commented 1 week ago

We don't use clang to build our CUDA code. Therefore we didn't see such warnings. You can help us fix them or suppress them if you'd like. Contributions are welcome. Thanks.