justxi / rocm

Ebuilds to install ROCM on Gentoo Linux

sys-devel/hip-2.5.0 is missing the sys-devel/clang dependency #43

Closed hagar-dunor closed 5 years ago

hagar-dunor commented 5 years ago

Hi,

sys-devel/hip-2.5.0 requires clang to build, at least when I try to build sys-devel/amd-rocm-meta.

Adding clang manually to DEPEND in the hip ebuild solves the problem.

Cheers

justxi commented 5 years ago

Which version of clang did you install?

hagar-dunor commented 5 years ago

The one currently stable, 7.1.0. Won't bet it's the most appropriate, but HIP builds fine.

davidrohr commented 5 years ago

Actually, hcc comes with clang, so I don't see why a separate clang is needed (with 2.5). This will change in the future, however, since they want to drop hcc and go with clang directly.

justxi commented 5 years ago

I thought that too. @hagar-dunor Can you provide your build log when clang is not installed?

davidrohr commented 5 years ago

Could you try again with the hip 2.6 ebuild? It will anyway install llvm-roc for the code object library, and it should work without system clang.

barolo commented 5 years ago

I'm not sure if it's related, but hip fails for me while building the 2.6.0 meta package with this:

-- The C compiler identification is GNU 9.1.0
-- The CXX compiler identification is GNU 9.1.0
-- Check for working C compiler: /usr/bin/x86_64-pc-linux-gnu-gcc
-- Check for working C compiler: /usr/bin/x86_64-pc-linux-gnu-gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/x86_64-pc-linux-gnu-g++
-- Check for working CXX compiler: /usr/bin/x86_64-pc-linux-gnu-g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
fatal: not a git repository (or any parent up to mount point /var/tmp)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-- HIP Platform: hcc
-- HIP Compiler: hcc
/usr/lib/hcc/2.6//bin/hcc: error while loading shared libraries: libLLVMAMDGPUCodeGen.so.9svn: cannot open shared object file: No such file or directory
CMake Error at CMakeLists.txt:104 (string):
  string sub-command REGEX, mode REPLACE needs at least 6 arguments total to
  command.

CMake Error at CMakeLists.txt:105 (string):
  string sub-command REGEX, mode REPLACE needs at least 6 arguments total to
  command.

-- Looking for HCC in: /usr/lib/hcc/2.6/. Found version: 
CMake Error at CMakeLists.txt:111 (string):
  string sub-command REPLACE requires at least four arguments.

CMake Error at CMakeLists.txt:115 (list):
  list index: 1 out of range (-1, 0)

davidrohr commented 5 years ago

@hagar-dunor : Actually, I think your problem with clang comes from the cmake option -DBUILD_HIPIFY_CLANG in the ebuild, which requires clang. This is used for the CUDA->HIP conversion tool. And it seems this doesn't require a "matching" version of clang (which would be clang 9), but instead any system clang will do it. Setting this to off in the ebuild should also do the trick. I'll add a use flag for clang in the next ebuild version, and then build HIPIFY_CLANG only when this flag is set, and in that case also require clang as dependency.
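The USE-flag approach described above could look roughly like the following ebuild fragment. This is only a sketch and not the actual hip ebuild: the flag name `hipify` is hypothetical (the comment only speaks of "a use flag for clang"), EAPI 7 is assumed, and the ebuild is assumed to inherit the cmake-utils eclass.

```bash
# Sketch of an ebuild fragment (EAPI 7 assumed); not the actual hip ebuild.
# The USE flag name "hipify" is hypothetical.
IUSE="hipify"

# Pull in system clang only when the HIPIFY tool is requested.
DEPEND="hipify? ( sys-devel/clang )"

src_configure() {
    local mycmakeargs=(
        # usex prints ON when the flag is enabled, OFF otherwise.
        -DBUILD_HIPIFY_CLANG=$(usex hipify ON OFF)
    )
    # Assumes the ebuild inherits the cmake-utils eclass.
    cmake-utils_src_configure
}
```

With the flag disabled, clang would no longer be needed at build time; with it enabled, Portage would pull in sys-devel/clang before configuring hip.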

davidrohr commented 5 years ago

@barolo : I don't think it is related. In your case, the hcc installation seems broken (which is not related to HIP). It just breaks HIP since HCC is a requirement. I don't know why I don't see this issue. Indeed, I don't have a libLLVMAMDGPUCodeGen.so file at all. I'll try to rebuild from scratch and reproduce your problem, but this will take some time. Anyway, I guess ROCm 2.7 will be released in a few days; perhaps I'll double-check this then.

barolo commented 5 years ago

@davidrohr I'm on clang-8.0.1; I'm going to try rebuilding the related packages. Besides all the 2.6.0 packages from the meta package, I have cmake-9999 installed for some reason, and rocm-comgr-2.6.0 [which is masked?].

davidrohr commented 5 years ago

System clang shouldn't play any role; comgr is indeed needed.

barolo commented 5 years ago

@davidrohr I depcleaned and did a rebuild, and the same error popped up, so I'm giving up for now. I'll report back after the 2.7 release.

Edit: actually, rocr-debug-agent fails with the same shared library too.

justxi commented 5 years ago

I have the problem with a missing "libLLVMAMDGPUCodeGen.so.9svn" while compiling "rocr-debug-agent", but I think it is not for my Radeon Card (560). So I can not test it, even if it compiles.

justxi commented 5 years ago

I also have the same problem with HIP-2.6:

    ..
    -- HIP Platform: hcc
    -- HIP Compiler: hcc
    /usr/lib/hcc/2.6//bin/hcc: error while loading shared libraries: libLLVMAMDGPUCodeGen.so.9svn: cannot open shared object file: No such file or directory

davidrohr commented 5 years ago

Weird, perhaps it has worked for me as I had a clang 9 system installation in parallel. Are you investigating this? I don't have much time this week but would rather give it a new try with 2.7, which should be released soon.

justxi commented 5 years ago

Interestingly, the file "libLLVMAMDGPUCodeGen.so.9svn" is in "/usr/lib/llvm/roc-2.6.0/lib". If I make progress I will report it here.

And... yes I have seen that AMD is updating libraries for 2.7.

justxi commented 5 years ago

@davidrohr What was your intention when building hip: to use HCC or CLANG?

When I modify mycmakeargs as follows, hip compiles:

    local mycmakeargs=(
        -DCMAKE_INSTALL_PREFIX="${EPREFIX}/usr/lib/hip/$(ver_cut 1-2)"
        ${S}
        -DCMAKE_PREFIX_PATH=${LLVM_BUILD}
        -DHIP_PLATFORM=clang
        -DBUILD_HIPIFY_CLANG=ON
        -DHIP_COMPILER=clang
    )

I think there is also an env file for llvm-roc missing, /etc/env.d/99llvm-roc:

    PATH="/usr/lib/llvm/roc-2.6.0/bin"
    ROOTPATH="/usr/lib/llvm/roc-2.6.0/bin"
    MANPATH="/usr/lib/llvm/roc-2.6.0/share/man"
    LDPATH="/usr/lib/llvm/roc-2.6.0/lib"

Can someone test the above and a hip example? (All tested with HIP-2.6.0)

My HCC installation seems to have a problem (?)... I can not build rocBLAS.

davidrohr commented 5 years ago

I tried with the clang platform. In that way, I could build HIP itself, but hip failed to build my application. Therefore I stuck to hcc. I think eventually, we should definitely switch to the clang platform with llvm-roc.

justxi commented 5 years ago

Hmm, when I build the following configuration:

    local mycmakeargs=(
        -DCMAKE_INSTALL_PREFIX="${EPREFIX}/usr/lib/hip/$(ver_cut 1-2)"
        ${S}
        -DHIP_PLATFORM=hcc
        -DHIP_COMPILER=hcc
        -DBUILD_HIPIFY_CLANG=ON
        -DHCC_HOME=/usr/lib/hcc/$(ver_cut 1-2)/
        -DHSA_PATH=/opt/rocm
    )

hcc does not recognize the "-hc" parameter. @davidrohr Does your "hcc" work?

justxi commented 5 years ago

On my system it seems that hcc is not installing all needed libraries. When I run "ldd /usr/lib/hcc/2.6/bin/hcc", the first missing library is "libLLVMAMDGPUCodeGen.so.9svn".

Edit: After building hcc again, interrupting the ebuild and copying manually a lot of libs... it works. Tested with "hcc -hc saxpy.cpp" (Source).
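The `ldd` check above can be turned into a small helper that lists only the unresolved libraries. This is a minimal sketch: the function names are hypothetical, and it assumes the glibc `ldd` output format, in which unresolved entries read `name => not found`.

```sh
# Hypothetical helper: reduce ldd output (read from stdin) to the names
# of libraries the dynamic loader could not resolve.
filter_missing() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical wrapper: list the unresolved shared libraries of a binary.
missing_libs() {
    ldd "$1" | filter_missing
}
```

Running `missing_libs /usr/lib/hcc/2.6/bin/hcc` on an affected system should list the libraries that still need to be installed, and print nothing once hcc is fixed.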

barolo commented 5 years ago

@davidrohr For some reason this library is not being built: libLLVMAMDGPUCodeGen.so.9svn. I have only:

    file:///usr/lib/hcc/2.6/lib/libLLVMAMDGPUDesc.so.9svn
    file:///usr/lib/hcc/2.6/lib/libLLVMAMDGPUUtils.so.9svn

Edit: I missed some of the above comments.

justxi commented 5 years ago

@barolo On my system the libraries are built but not installed.

After fixing hcc, hip also compiles and installs.

barolo commented 5 years ago

@justxi Yup, just checked, we're having the same issue, how did you fix it?

justxi commented 5 years ago

@barolo As a quick test I inserted a "die" at the end of "src_install" in hip-2.6.0 and then I copied all libraries from the build directory to the destination "/usr/lib/hcc/2.6/lib" (by hand).

But I will try to add this copy command to the ebuild as a quick solution. For the future the build scripts should be changed. But due to the fact that hcc is marked deprecated, we should focus on building hip against the new clang based compiler.
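The manual copy step described above might be scripted along these lines. This is a sketch only: `copy_shared_libs` is a hypothetical helper, the build directory is a placeholder to be filled in, and a proper ebuild fix would install into the image directory rather than the live filesystem. It assumes GNU `cp` and a `find` that supports `-maxdepth`.

```sh
# Hypothetical helper: copy every shared library (and versioned symlink)
# from the top level of a build directory into a destination directory.
copy_shared_libs() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # cp -a preserves symlinks and permissions (GNU cp).
    find "$src_dir" -maxdepth 1 -name '*.so*' \( -type f -o -type l \) \
        -exec cp -a {} "$dest_dir"/ \;
}
```

For example, `copy_shared_libs "$BUILD_DIR/lib" /usr/lib/hcc/2.6/lib`, where `BUILD_DIR` stands for wherever the interrupted build left its libraries.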

justxi commented 5 years ago

In the CMakeLists.txt file of HCC I found a command that installs a few libraries: https://github.com/RadeonOpenCompute/hcc/blob/6cf476c29792593f52cd66bd0bd96468b8dad7ea/CMakeLists.txt#L359

The question is: why should only these libraries be installed?

Edit: Answer: Looking at the Debian package, the hcc command links to clang-9, which includes all libraries (llvm, clang, lld) except some dynamically linked system libraries. So there are two options: install all the dynamic libraries (*.so), or find the right option to build the libs into the executable. LLVM has some options regarding dynamic/static libs and linking.

I think this is the reason why it works if sys-devel/clang-9 is installed.

hagar-dunor commented 5 years ago

Hi all,

I tried amd-rocm-meta as I write this, and there are at least 3 issues as far as I can see:

1) Verifying ebuild manifests
   !!! A file is not listed in the Manifest: '/usr/local/overlays/rocm/dev-libs/rocm-opencl-runtime/files/add-rpath.patch'

2) Failed to emerge sys-devel/llvm-roc-2.6.0, Log file: '/var/tmp/portage/sys-devel/llvm-roc-2.6.0/temp/build.log'

see attached build.log llvm-roc-build.log.gz

3) Failed to emerge dev-libs/rocr-debug-agent-2.6.0, Log file: '/var/tmp/portage/dev-libs/rocr-debug-agent-2.6.0/temp/build.log'

see attached build.log rocr-debug-agent-build.log

I've attached as well my emerge --info, my world file (clean KDE Plasma install), and the std output of emerge amd-rocm-meta

emerge_amd-rocm-meta.txt emerge_info.txt world.txt

davidrohr commented 5 years ago

  1. Shouldn't be a real problem, since the rpath patch is not used. It is needed for the experimental ebuild that builds the opencl-runtime against llvm-roc, which is not working right now and is thus disabled.

  2. In your log I see:

    FAILED: tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExpr.cpp.o 
    /usr/bin/x86_64-pc-linux-gnu-g++ -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Sema -I/var/tmp/portage/sys-devel/llvm-roc-2.6.0/work/llvm-roc-ocl-2.6.0/tools/clang/lib/Sema -I/var/tmp/portage/sys-devel/llvm-roc-2.6.0/work/llvm-roc-ocl-2.6.0/tools/clang/include -Itools/clang/include -I/usr/include/libxml2 -Iinclude -I/var/tmp/portage/sys-devel/llvm-roc-2.6.0/work/llvm-roc-ocl-2.6.0/include  -march=sandybridge -O2 -pipe -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing    -fno-exceptions -fno-rtti -MD -MT tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExpr.cpp.o -MF tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExpr.cpp.o.d -o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExpr.cpp.o -c /var/tmp/portage/sys-devel/llvm-roc-2.6.0/work/llvm-roc-ocl-2.6.0/tools/clang/lib/Sema/SemaExpr.cpp
    x86_64-pc-linux-gnu-g++: fatal error: Killed signal terminated program cc1plus
    compilation terminated.

    Perhaps you ran out of memory during compilation? Something killed your gcc, which is why it is not building.

  3. I actually have no idea. I cannot reproduce it; it works for me, and the error message doesn't tell me anything.

hagar-dunor commented 5 years ago

Good call for llvm-roc: I increased the RAM of the VM I'm building on to 64G and it succeeded. I probably had too many parallel gcc jobs for 16G of RAM.
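An out-of-memory failure like this one is commonly avoided by capping the number of parallel compile jobs according to the available RAM. The helper below is a hypothetical sketch; it assumes about 2 GiB of RAM per job, a general rule of thumb that is not taken from this thread and may well be too low for large C++ builds such as LLVM.

```sh
# Hypothetical helper: choose a -j value as min(cpus, mem_gib / gib_per_job).
# gib_per_job defaults to 2, an assumed rule of thumb.
jobs_for_memory() {
    mem_gib=$1
    cpus=$2
    gib_per_job=${3:-2}
    by_mem=$(( mem_gib / gib_per_job ))
    # Always allow at least one job.
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1
    fi
    if [ "$by_mem" -lt "$cpus" ]; then
        echo "$by_mem"
    else
        echo "$cpus"
    fi
}
```

For example, `jobs_for_memory 16 32` prints 8, which could then be used as the `-j` value in MAKEOPTS; passing a third argument such as 4 gives a more conservative limit.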

As for hip and rocr-debug-agent, it's the same issue reported above by justxi about libLLVMAMDGPUCodeGen.so.9svn not being found.

justxi commented 5 years ago

As mentioned above, "hcc" is built with dynamic libraries and the ebuild does not install all of them. I am currently trying the "BUILD_SHARED_LIBS=OFF" option, which seems to produce a statically linked clang-9 (hcc links to it).

@davidrohr Can you check where "ldd hcc" resolves the libraries on your system?

justxi commented 5 years ago

I have uploaded "hcc-2.6.0-r1", which builds the libraries into the executables. Currently I am trying to build "hip" against that.

justxi commented 5 years ago

Finally, I merged "amd-rocm-meta" (-debug-tools) successfully on my system and "rocr-debug-agent" merges also.

Please try =).

Hopefully a few days rest before ROCm 2.7 is rolled out ;).

justxi commented 5 years ago

In addition: I compiled the "vectoradd" example of hip. For that I had to merge "yaml-cpp" and set HIP_IGNORE_HCC_VERSION="1". After that, vectoradd passed successfully.

hagar-dunor commented 5 years ago

Success too, but I had to run

    ebuild rocm-opencl-runtime-2.6.0.ebuild digest
    ebuild amd-rocm-meta-2.5.0.ebuild digest

else both complain: opencl-runtime as I mentioned above, and rocm-meta with

    !!! Digest verification failed:
    !!! /usr/local/overlays/rocm/sys-devel/amd-rocm-meta/amd-rocm-meta-2.5.0.ebuild
    !!! Reason: Filesize does not match recorded size
    !!! Got: 341
    !!! Expected: 305

justxi commented 5 years ago

I updated both manifests. Hopefully it works now.

hagar-dunor commented 5 years ago

It does (mind the warning at the end to replace http by https in a link):

rocm ~ # emerge amd-rocm-meta
Calculating dependencies... done!

Verifying ebuild manifests
Emerging (1 of 25) media-libs/hsa-amd-aqlprofile-1.0.0::rocm
Emerging (2 of 25) sys-process/numactl-2.0.11::gentoo
Emerging (3 of 25) app-admin/chrpath-0.13-r2::gentoo
Emerging (4 of 25) app-eselect/eselect-blas-0.1::gentoo
Emerging (5 of 25) app-eselect/eselect-cblas-0.1::gentoo
Emerging (6 of 25) dev-lang/ocaml-4.04.2-r1::gentoo
Emerging (7 of 25) dev-cpp/gtest-1.8.1::gentoo
Emerging (8 of 25) dev-libs/rocm-cmake-9999::rocm
Installing (1 of 25) media-libs/hsa-amd-aqlprofile-1.0.0::rocm
Installing (5 of 25) app-eselect/eselect-cblas-0.1::gentoo
Installing (4 of 25) app-eselect/eselect-blas-0.1::gentoo
Installing (8 of 25) dev-libs/rocm-cmake-9999::rocm
Emerging (9 of 25) sci-libs/blas-reference-20070226-r4::gentoo
Installing (3 of 25) app-admin/chrpath-0.13-r2::gentoo
Installing (2 of 25) sys-process/numactl-2.0.11::gentoo
Installing (7 of 25) dev-cpp/gtest-1.8.1::gentoo
Emerging (10 of 25) dev-libs/roct-thunk-interface-2.6.0::rocm
Installing (10 of 25) dev-libs/roct-thunk-interface-2.6.0::rocm
Installing (9 of 25) sci-libs/blas-reference-20070226-r4::gentoo
Emerging (11 of 25) dev-libs/rocr-runtime-2.6.0::rocm
Emerging (12 of 25) virtual/blas-1.0::gentoo
Installing (12 of 25) virtual/blas-1.0::gentoo
Emerging (13 of 25) sci-libs/cblas-reference-20030223-r6::gentoo
Installing (11 of 25) dev-libs/rocr-runtime-2.6.0::rocm
Emerging (14 of 25) dev-util/rocm-smi-2.6.0::rocm
Emerging (15 of 25) dev-util/rocprofiler-2.6.0::rocm
Installing (14 of 25) dev-util/rocm-smi-2.6.0::rocm
Emerging (16 of 25) dev-util/rocminfo-2.6.0::rocm
Installing (13 of 25) sci-libs/cblas-reference-20030223-r6::gentoo
Installing (16 of 25) dev-util/rocminfo-2.6.0::rocm
Emerging (17 of 25) virtual/cblas-1.0::gentoo
Emerging (18 of 25) sys-devel/hcc-2.6.0-r1::rocm
Installing (17 of 25) virtual/cblas-1.0::gentoo
Installing (15 of 25) dev-util/rocprofiler-2.6.0::rocm
Emerging (19 of 25) sys-devel/llvm-roc-2.6.0::rocm
Installing (6 of 25) dev-lang/ocaml-4.04.2-r1::gentoo
Emerging (20 of 25) dev-ml/findlib-1.7.1::gentoo
Installing (20 of 25) dev-ml/findlib-1.7.1::gentoo
Emerging (21 of 25) dev-libs/rocm-opencl-runtime-2.6.0::rocm
Installing (21 of 25) dev-libs/rocm-opencl-runtime-2.6.0::rocm
Installing (19 of 25) sys-devel/llvm-roc-2.6.0::rocm
Emerging (22 of 25) dev-libs/rocm-device-libs-2.6.0::rocm
Installing (18 of 25) sys-devel/hcc-2.6.0-r1::rocm
Installing (22 of 25) dev-libs/rocm-device-libs-2.6.0::rocm
Emerging (23 of 25) dev-libs/rocm-comgr-2.6.0::rocm
Installing (23 of 25) dev-libs/rocm-comgr-2.6.0::rocm
Emerging (24 of 25) sys-devel/hip-2.6.0::rocm
Installing (24 of 25) sys-devel/hip-2.6.0::rocm
Emerging (25 of 25) sys-devel/amd-rocm-meta-2.6.0-r1::rocm
Installing (25 of 25) sys-devel/amd-rocm-meta-2.6.0-r1::rocm
Recording sys-devel/amd-rocm-meta in "world" favorites file...
Jobs: 25 of 25 complete Load avg: 5.0, 6.7, 11.5

(...)

(...)

There is no reason to keep this issue open, closing it.

justxi commented 5 years ago

Looks good. The hint to use "https" comes from the "gitmodules" file in "hcc". I don't know if this is intended.

I dropped a note at: https://github.com/RadeonOpenCompute/hcc/commit/44655c4b6230ddc272ffaa6c2637938323efa46e#diff-8903239df476d7401cf9e76af0252622