microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
13.42k stars 2.75k forks source link

Issues building with both --use_rocm and --build_shared_lib specified #10466

Open turmoni opened 2 years ago

turmoni commented 2 years ago

Describe the bug Attempting to build onnxruntime on Linux with both --use_rocm and --build_shared_lib leads to several errors that block compilation. This is both with the v1.10.0 and master branches, but I'm only reporting issues that are still present in master.

Before I go any further, I would like to make it clear that I'm not a direct user of onnxruntime, nor do I understand anything about CMake or the build process - the changes I've made seem to work for me, but I'm definitely not saying they're correct and free from downstream issues, or that some of the later build issues aren't caused by changes I've made earlier on, and I'm not 100% sure I've ended up with a functional ROCm version. For all I know, building this way might not be a sensible thing to do. I also see that there's a MIGraphX option and I don't know what the distinction is between that and just using ROCm, so if one supersedes the other this may not be worth anyone caring about beyond making that clear.

The command line I'm using is:

./build.sh --use_rocm --rocm_home=/opt/rocm --update --build --config=Release --build_shared_lib

(As an aside, the first issue I ran into was not realising that --rocm_home is required even if ROCm is installed in the default location, this comment is mainly for anyone else who might be attempting to reproduce my steps)

cmake/onnxruntime.cmake

The first issue is:

CMake Error at onnxruntime.cmake:195 (target_link_libraries):
  Target "onnxruntime_providers_rocm" of type MODULE_LIBRARY may not be
  linked into another target.  One may link only to INTERFACE, OBJECT, STATIC
  or SHARED libraries, or to executables with the ENABLE_EXPORTS property
  set.
Call Stack (most recent call first):
  CMakeLists.txt:1923 (include)

From looking at potentially related changes, I removed ${PROVIDERS_ROCM} from set(onnxruntime_INTERNAL_LIBRARIES in cmake/onnxruntime.cmake (line 171).

C++ included in generated_source.c

The file generated by tools/ci_build/gen_def.py includes onnxruntime/core/providers/rocm/rocm_provider_factory.h, which in turn includes include/onnxruntime/core/framework/provider_options.h, which is a C++ header. I see that several headers are already skipped so I did the same for ROCm:

        if c != "winml" and c != "cuda" and c != "migraphx" and c != "rocm":

I also made it skip rocm when reading all the symbols.txt files, but I imagine this is meant to be changed somewhere else, if it is in fact meant to be omitted.

make install

The final issue is with the make install step. In the generated Linux/Release/cmake_install.cmake, the following block attempts to install something that doesn't exist:

if("x${CMAKE_INSTALL_COMPONENT}x" STREQUAL "xUnspecifiedx" OR NOT CMAKE_INSTALL_COMPONENT)
  file(INSTALL DESTINATION "${CMAKE_INSTALL_PREFIX}/include/onnxruntime/core/providers" TYPE DIRECTORY FILES "<absolute path to onnxruntime git repo>/cmake/../include/onnxruntime/core/providers/rocm")
endif()

For my fix, I just removed that section from the generated file, I didn't investigate its source.

Urgency None.

System information

To Reproduce

Expected behavior The project builds

Screenshots N/A

Additional context N/A

snnn commented 2 years ago

ORT has two different kinds of execution providers;

  1. Older ones that exports global functions like OrtSessionOptionsAppendExecutionProvider_CUDA. Such execution providers depend on gen_def.py.
  2. Newer ones with entry points defined in onnxruntime_c_api.h as function pointers. like SessionOptionsAppendExecutionProvider_CUDA. These execution providers do not need gen_def.py, and they can be loaded dynamically.

So, here the questions is: which way rocm wants to go? It would determine if rocm should be skipped in gen_def.py or not.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.