saiden89 opened 1 week ago
Here's the error from the log for convenience:
Building ASM object CMakeFiles/onnxruntime_mlas.dir/tmp/onnxruntime/onnxruntime/core/mlas/lib/x86_64/cvtfp16Avx.S.o
/tmp/onnxruntime/onnxruntime/core/mlas/lib/x86_64/cvtfp16Avx.S:60:9: error: invalid instruction mnemonic 'vcvtneeph2ps'
vcvtneeph2ps ymm0, ymmword PTR [rdi]
^~~~~~~~~~~~
I think this code was added in this PR: https://github.com/microsoft/onnxruntime/pull/21183
@eralmual do you have any pointers on how to fix this?
Hi! Thank you for reaching out!
It seems the vcvtneeph2ps instruction is not recognized by the compiler. I did a quick search, and the instruction has been supported in Clang since v16.0 as part of the AVX-NE-CONVERT ISA; since you are using v17.0, it should work fine.
If it's not working for Clang in general, I can do a quick patch to prevent the compiler error while we find a solution; just let me know. In the meantime, I think you should be able to safely delete the if and everything inside it at https://github.com/microsoft/onnxruntime/blob/c7138a2630b01e30340a52959c232305394fd86f/cmake/onnxruntime_mlas.cmake#L574, and that should fix the compiler issue.
Let me know if it works!
It is more about whether your assembler (like GAS) can recognize this instruction. We should write a test program to check it with CheckSourceCompiles (https://cmake.org/cmake/help/latest/module/CheckSourceCompiles.html) instead of detecting the compiler name/version.
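A minimal sketch of such a probe, not the actual onnxruntime_mlas.cmake wiring: it assumes CMake >= 3.19 for the CheckSourceCompiles module, and the result variable name and the message are illustrative only. The idea is to feed the failing mnemonic through the same toolchain that will assemble the .S file and branch on the outcome.

include(CheckSourceCompiles)

# Probe whether the toolchain can assemble an AVX-NE-CONVERT instruction,
# instead of inferring support from the compiler name/version. The inline
# asm goes through the same assembler used for the MLAS assembly sources.
check_source_compiles(C [[
  int main(void) {
    __asm__(".intel_syntax noprefix\n"
            "vcvtneeph2ps ymm0, ymmword ptr [rdi]\n"
            ".att_syntax prefix\n");
    return 0;
  }
]] ASSEMBLER_SUPPORTS_AVX_NE_CONVERT)

if(NOT ASSEMBLER_SUPPORTS_AVX_NE_CONVERT)
  # Hypothetical fallback: keep the generic fp16 conversion path and skip
  # the AVX-NE-CONVERT kernel when the instruction cannot be assembled.
  message(STATUS "Assembler lacks AVX-NE-CONVERT; excluding cvtfp16Avx.S")
endif()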
Contributions are welcome.
Thank you @eralmual for the suggestion; your proposed solution solves the problem. However, as the compilation continues, I am greeted by many more errors.
/pfs/lustrep2/projappl/project_465000941/compartments/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/cast_op.cc:295:1: error: explicit instantiation of 'ComputeInternal' that occurs after an explicit specialization has no effect [-Werror,-Winstantiation-after-specialization]
SPECIALIZE_IMPL(MLFloat16)
/pfs/lustrep2/projappl/project_465000941/compartments/onnxruntime/onnxruntime/core/providers/rocm/nn/conv_impl.cu:24:21: error: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
fast_divmod fdm_c(bias_size);
~~~~~ ^~~~~~~~~
Any further insights are deeply appreciated, thanks!
Please add "--compile_no_warning_as_error" to your build command.
We don't use clang to build our CUDA code. Therefore we didn't see such warnings. You can help us fix them or suppress them if you'd like. Contributions are welcome. Thanks.
Describe the issue
I am attempting to compile ONNX Runtime on the LUMI supercomputer, a Cray system.
The configuration step completes without any issues. However, during the compile phase, I encountered problems when using the default CC and cc Cray compiler wrappers, which apply Cray-specific optimizations. To bypass this, I manually specified the AMD compilers (amdclang and amdclang++) instead of the wrappers.
System Details:
GPU: AMD MI250X (gfx90a)
Now, I'm encountering a compile-time error possibly related to the AVX512 instruction set: error: invalid instruction mnemonic 'vcvtneeph2ps', but I'm not familiar enough with all this to diagnose the issue. I would appreciate any guidance on how to address this.
Urgency
Not urgent, but it would be nice to have, since I have a big inference job to run for a project.
Target platform
AMD MI250X
Build script
The build script relies on some specific modules being loaded to target the correct architecture, as well as on loading the correct programming environment. Full reproducibility might be limited because of the exotic nature of the system, but I am more than happy to try any suggestions myself.
Error / output
log.txt
Visual Studio Version
No response
GCC / Compiler Version
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.0.3 24012 af27734ed982b52a9f1be0f035ac91726fc697e4)