lamikr / rocm_sdk_builder

Build failed in onnxruntime on ubuntu 22.04 #46

Closed meso-uca closed 3 weeks ago

meso-uca commented 3 weeks ago

Build environment: Ubuntu 22.04 + cmake 3.29.3 (the distro's cmake version was too old for the build system)

Process:

git clone https://github.com/lamikr/rocm_sdk_builder.git
cd rocm_sdk_builder
git checkout releases/rocm_sdk_builder_611
./babs.sh -i
# selected gfx906;gfx90a;gfx940;gfx1102
./babs.sh -co
./babs.sh -ap
./babs.sh -b

Error when building onnxruntime:

/home/ubuntu/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training
Building onnxruntime
[0] onnxruntime, build command:
cd /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime
[1] onnxruntime, build command:
./build_onnxruntime_rocm_training.sh /opt/rocm_sdk_611 "gfx906;gfx90a;gfx940;gfx1102"
using rocm_root_directory specified: /opt/rocm_sdk_611
Using specified amd rocm gpu: "gfx906;gfx90a;gfx940;gfx1102"
2024-06-01 21:21:14,937 tools_python_utils [INFO] - flatbuffers module is not installed. parse_config will not be available
2024-06-01 21:21:16,663 build [DEBUG] - Command line arguments:
  --build_dir /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux --config Release --enable_training --build_wheel --parallel --skip_tests --build_shared_lib --use_rocm --rocm_home /opt/rocm_sdk_611 --use_migraphx --migraphx_home /opt/rocm_sdk_611 --cmake_extra_defines CMAKE_HIP_COMPILER=/opt/rocm_sdk_611/bin/clang++ CMAKE_INSTALL_PREFIX=/opt/rocm_sdk_611 'CMAKE_HIP_ARCHITECTURES="gfx906;gfx90a;gfx940;gfx1102"'
Namespace(build_dir='/home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux', config=['Release'], update=False, build=False, clean=False, parallel=0, nvcc_threads=-1, test=False, skip_tests=True, compile_no_warning_as_error=False, enable_nvtx_profile=False, enable_memory_profile=False, enable_training=True, enable_training_apis=False, enable_training_ops=False, enable_nccl=False, mpi_home=None, nccl_home=None, use_mpi=False, enable_onnx_tests=False, path_to_protoc_exe=None, fuzz_testing=False, enable_symbolic_shape_infer_tests=False, gen_doc=None, gen_api_doc=False, use_cuda=False, cuda_version=None, cuda_home=None, cudnn_home=None, enable_cuda_line_info=False, enable_cuda_nhwc_ops=False, enable_pybind=False, build_wheel=True, wheel_name_suffix=None, numpy_version=None, skip_keras_test=False,build_csharp=False, build_nuget=False, msbuild_extra_options=None, build_java=False, build_nodejs=False, build_objc=False, build_shared_lib=True, build_apple_framework=False, cmake_extra_defines=[['CMAKE_HIP_COMPILER=/opt/rocm_sdk_611/bin/clang++', 'CMAKE_INSTALL_PREFIX=/opt/rocm_sdk_611', 'CMAKE_HIP_ARCHITECTURES="gfx906;gfx90a;gfx940;gfx1102"']], target=None, x86=False, arm=False, arm64=False, arm64ec=False, buildasx=False, msvc_toolset=None, windows_sdk_version=None, android=False, android_abi='arm64-v8a', android_api=27, android_sdk_path='', android_ndk_path='', android_cpp_shared=False, android_run_emulator=False, use_gdk=False, gdk_edition='.', gdk_platform='Scarlett', ios=False, apple_sysroot='', ios_toolchain_file='', xcode_code_signing_team_id='', xcode_code_signing_identity='', cmake_generator=None, osx_arch='x86_64', apple_deploy_target=None, enable_address_sanitizer=False, enable_qspectre=False, disable_memleak_checker=False, build_wasm=False, build_wasm_static_lib=False, emsdk_version='3.1.51', enable_wasm_simd=False, enable_wasm_threads=False, disable_wasm_exception_catching=False, enable_wasm_api_exception_catching=False, 
enable_wasm_exception_throwing_override=True, wasm_run_tests_in_browser=False, enable_wasm_profiling=False, enable_wasm_debug_info=False, wasm_malloc=None, emscripten_settings=None, use_extensions=False, extensions_overridden_path=None, cmake_path='cmake', ctest_path='ctest', skip_submodule_sync=False, use_mimalloc=False, use_dnnl=False, dnnl_gpu_runtime='', dnnl_opencl_root='', use_openvino=None, dnnl_aarch64_runtime='', dnnl_acl_root='', use_coreml=False, use_webnn=False, use_snpe=False, snpe_root=None, use_nnapi=False, nnapi_min_api=None, use_jsep=False, use_qnn=False, qnn_home=None, use_rknpu=False, use_preinstalled_eigen=False, eigen_path=None, enable_msinternal=False, llvm_path=None, use_vitisai=False, use_tvm=False, tvm_cuda_runtime=False, use_tvm_hash=False, use_tensorrt=False, use_tensorrt_builtin_parser=True, use_tensorrt_oss_parser=False, tensorrt_home=None, test_all_timeout='10800', use_migraphx=True, migraphx_home='/opt/rocm_sdk_611', use_full_protobuf=False, llvm_config='', skip_onnx_tests=False, skip_winml_tests=False, skip_nodejs_tests=False, enable_msvc_static_runtime=False, enable_language_interop_ops=False, use_dml=False, dml_path='', use_winml=False, winml_root_namespace_override=None, dml_external_project=False, use_telemetry=False, enable_wcos=False,enable_lto=False, enable_transformers_tool_test=False, use_acl=None, acl_home=None, acl_libs=None, use_armnn=False, armnn_relu=False, armnn_bn=False, armnn_home=None, armnn_libs=None, build_micro_benchmarks=False, minimal_build=None, include_ops_by_config=None, enable_reduced_operator_type_support=False, disable_contrib_ops=False, disable_ml_ops=False, disable_rtti=False, disable_types=[], disable_exceptions=False, rocm_version=None, use_rocm=True, rocm_home='/opt/rocm_sdk_611', code_coverage=False, enable_lazy_tensor=False, ms_experimental=False, enable_external_custom_op_schemas=False, external_graph_transformer_path=None, enable_cuda_profiling=False, use_cann=False, cann_home=None, 
enable_rocm_profiling=False, use_xnnpack=False, use_azure=False, use_cache=False, use_triton_kernel=False, use_lock_free_queue=False, allow_running_as_root=False)
2024-06-01 21:21:16,670 build [DEBUG] - Defaulting to running update, build [and test for native builds].
migraphx_home = /opt/rocm_sdk_611
rocm_home = /opt/rocm_sdk_611
2024-06-01 21:21:16,670 build [INFO] - Build started

[...]

-- Found pybind11: /opt/rocm_sdk_611/include (found version "")
-- Configuring done (6.2s)
-- Generating done (1.3s)
-- Build files have been written to: /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release
2024-06-01 21:21:25,634 build [INFO] - Building targets for Release configuration
2024-06-01 21:21:25,636 build [INFO] - /usr/bin/cmake --build /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release --config Release -- -j16
[  2%] Building HIP object _deps/composable_kernel-build/library/src/tensor_operation_instance/gpu/gemm_streamk/CMakeFiles/device_gemm_streamk_instance.dir/device_gemm_xdl_streamk_f16_f16_f16_mk_kn_mn_instance.cpp.o

clang++: error: invalid target ID 'gfx906 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx1102'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')

gmake[1]: *** [CMakeFiles/Makefile2:17271: _deps/composable_kernel-build/library/src/tensor_operation_instance/gpu/gemm_streamk/CMakeFiles/device_gemm_streamk_instance.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[...]
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release', '--config', 'Release', '--', '-j16']' returned non-zero exit status 2.
build failed: onnxruntime
  error in build cmd: ./build_onnxruntime_rocm_training.sh /opt/rocm_sdk_611 "gfx906;gfx90a;gfx940;gfx1102"

build failed

The generated clang options look wrong. The whole string below is passed to clang++ as a single target ID:

gfx906 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx1102

Any idea?
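
For comparison, the error message suggests that the semicolon-separated list reached clang++ as one argument instead of one `--offload-arch=` flag per GPU target. The following is a minimal sketch of the expected expansion, not the fix applied in the repository. The helper name `offload_arch_flags` is hypothetical and is not part of rocm_sdk_builder, onnxruntime, or composable_kernel.

```shell
#!/bin/sh
# Hypothetical helper: expand "gfx906;gfx90a" into one --offload-arch flag
# per target, printed one per line so each can become a separate argument.
offload_arch_flags() {
    # Turn the ';' separators into newlines and read the targets one by one.
    printf '%s\n' "$1" | tr ';' '\n' | while IFS= read -r arch; do
        # Skip empty items, e.g. from a trailing ';'.
        if [ -n "$arch" ]; then
            printf '%s\n' "--offload-arch=${arch}"
        fi
    done
}

# Example with the GPU list selected in this report.
offload_arch_flags "gfx906;gfx90a;gfx940;gfx1102"
```

Each printed line is a complete flag on its own, which is the form clang++ accepts. In the failing build all four targets ended up inside a single flag value.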

Stefan-Olt commented 3 weeks ago

I was able to build onnxruntime (with an updated cmake) on 22.04, so it's probably related to the device selection and not to the OS.

lamikr commented 3 weeks ago

I'm just testing a build with Linux Mint 21.3, which is based on Ubuntu 22.04, and at least there the build system's cmake has been new enough. But we have done a couple of dependency updates... I will post more info in a couple of hours.

Stefan-Olt commented 3 weeks ago

I'm running Mint 21.3. It gets cmake from Ubuntu and its version is 3.25.1. I'm not sure which version onnxruntime requires; I think it was 3.26 or 3.28. What version of cmake do you have (/usr/bin/cmake --version for the distro version, cmake --version for the active one)? Maybe some package installed cmake as a dependency through pip.

lamikr commented 3 weeks ago

I pushed a small change to the install_debs.sh script. If it detects that the distro is Ubuntu 22.04, then in addition to the packages installed for newer versions it will also run:

sudo apt install libstdc++-12-dev libgfortran-12-dev gfortran-12
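
How install_debs.sh detects the distro is not shown in this thread. As a rough sketch, one common approach is to read the `ID` and `VERSION_ID` fields from `/etc/os-release`. The function name `is_ubuntu_2204` below is hypothetical, and the message is only a placeholder for the install step.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given os-release file describes
# Ubuntu 22.04. Defaults to /etc/os-release when no file is given.
is_ubuntu_2204() {
    os_release_file="${1:-/etc/os-release}"
    [ -r "$os_release_file" ] || return 1
    # Source the file in a subshell so ID/VERSION_ID do not leak
    # into the calling script.
    (
        . "$os_release_file"
        [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "22.04" ]
    )
}

if is_ubuntu_2204; then
    echo "Ubuntu 22.04 detected: the extra packages would be installed here"
fi
```

Derivatives such as Linux Mint report a different `ID`, so a real script would likely need additional checks to cover them.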

lamikr commented 3 weeks ago

The onnxruntime build problem is still there; I also got the error about cmake being too old on Linux Mint.

CMake Error at CMakeLists.txt:5 (cmake_minimum_required):
  CMake 3.26 or higher is required.  You are running version 3.22.1

Does Ubuntu offer an official backport build of cmake that could be installed?

lamikr commented 3 weeks ago

I believe I have a fix coming for this; I'm just testing it.

lamikr commented 3 weeks ago

This should now be fixed. I tested it on Linux Mint, but can you verify it by running the commands below? They pull in the changes without needing to rebuild the other packages.

rm -rf src_projects/onnxruntime/build
git pull
./babs.sh -i
./babs.sh -co
./babs.sh -ap
./babs.sh -b

Basically, it now builds cmake 3.26.6 under the /opt/rocm_sdk_611/cmake directory and uses it when building onnxruntime if it detects that the version provided by the Linux distribution is too old.
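
The version check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic actually used by babs.sh. The helper names `version_ge` and `select_cmake` are hypothetical, the `bin/cmake` subpath under /opt/rocm_sdk_611/cmake is assumed, and `sort -V` requires GNU coreutils (or a compatible `sort`).

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings, so the smaller one comes first.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the cmake binary to use.
#   $1 = minimum required version
#   $2 = version of the distro cmake
#   $3 = path of the distro cmake
#   $4 = path of the locally built fallback cmake (assumed location)
select_cmake() {
    if version_ge "$2" "$1"; then
        printf '%s\n' "$3"
    else
        printf '%s\n' "$4"
    fi
}

# The distro version would normally be read with something like:
#   cmake --version | head -n 1 | awk '{print $3}'
# Here the values from the error message in this thread are used directly.
select_cmake 3.26 3.22.1 /usr/bin/cmake /opt/rocm_sdk_611/cmake/bin/cmake
```

With the versions from the error message (3.22.1 installed, 3.26 required) the sketch selects the fallback path. With the reporter's cmake 3.29.3 it would keep the distro binary.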

lamikr commented 3 weeks ago

Is it OK to close this now?

meso-uca commented 3 weeks ago

Now, with the current version (git pull && checkout && apply patches), I get an error when building GraphBLAS:

[ 89%] Building C object GraphBLAS/CMakeFiles/GraphBLAS.dir/Source/GrB_IndexUnaryOp_free.c.o
cd /home/ubuntu/rocm_sdk_builder/builddir/023_04_SuiteSparse/GraphBLAS && /opt/rocm_sdk_611/bin/clang -DGraphBLAS_EXPORTS -DHAVE_DLFCN_H -DHAVE_STRONG_GETAUXVAL -I/home/ubuntu/rocm_sdk_builder/builddir/023_04_SuiteSparse/GraphBLAS -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/cpu_features/include -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/cpu_features -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/cpu_features/src -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/cpu_features/include/internal -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/Template -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Include -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/Shared -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Config -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/xxHash -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/lz4 -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/zstd -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/JITpackage -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/FactoryKernels -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/Factories -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Demo/Include -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/rmm_wrap -I/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/JitKernels -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wno-pointer-sign  -O3 -DNDEBUG 
-Wno-extra-semi-stmt -Wno-extra-semi-stmt -O3 -DNDEBUG -Wno-extra-semi-stmt -Wno-extra-semi-stmt -std=gnu11 -fPIC -fopenmp=libomp -MD -MT GraphBLAS/CMakeFiles/GraphBLAS.dir/Source/GrB_IndexUnaryOp_free.c.o -MF CMakeFiles/GraphBLAS.dir/Source/GrB_IndexUnaryOp_free.c.o.d -o CMakeFiles/GraphBLAS.dir/Source/GrB_IndexUnaryOp_free.c.o -c /home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/GrB_IndexUnaryOp_free.c
In file included from /home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:573:10: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
  573 |     case ZSTD_c_targetCBlockSize:
      |          ^
/home/ubuntu/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/common/../common/../zstd.h:1952:33: note: expanded from macro 'ZSTD_c_targetCBlockSize'
 1952 | #define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
      |                                 ^

I'm using the Ubuntu 22.04 LTS cloud image: https://cloud-images.ubuntu.com/jammy/current/

lamikr commented 3 weeks ago

I applied a lot of fixes in pull request https://github.com/lamikr/rocm_sdk_builder/pull/59, and at least on Linux Mint 21 (based on Ubuntu 22.04) a clean build worked without issues. Please re-open if needed.

flip111 commented 1 week ago

I'm also getting these errors with gfx1100.

commit c3a13864f4b762e1d7dfb202cb38630228f80166

In file included from /home/flip111/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/home/flip111/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:943:24: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
  943 |             BOUNDCHECK(ZSTD_c_targetCBlockSize, value);
      |                        ^
In file included from /home/flip111/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/home/flip111/rocm_sdk_builder/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:1114:10: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
 1114 |     case ZSTD_c_targetCBlockSize :
      |          ^

flip111 commented 6 days ago

@lamikr any chance we can reopen this? Or do you prefer a new issue?