intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.22k stars 732 forks source link

how to solve "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!" in hip platform #8158

Open wangzy0327 opened 1 year ago

wangzy0327 commented 1 year ago

Describe the bug "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!" Abort(core dumped)

how to correctly execute code in MI100 with compile option gfx906 instead of gfx908?

To Reproduce

compile 2022-09 version sycl-hip refer to sycl-compile

Environment (please complete the following information):

There is a simple test code.

vector_add.cpp
/***************************************************************************
 *
 *  Copyright (C) 2016 Codeplay Software Limited
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  For your convenience, a copy of the License has been included in this
 *  repository.
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 *
 *  Codeplay's ComputeCpp SDK
 *
 *  simple-vector-add.cpp
 *
 *  Description:
 *    Example of a vector addition in SYCL.
 *
 **************************************************************************/

/* This example is a very small one designed to show how compact SYCL code
 * can be. That said, it includes no error checking and is rather terse. */
#include <CL/sycl.hpp>

#include <array>
#include <iostream>

constexpr sycl::access::mode sycl_read = sycl::access::mode::read;
constexpr sycl::access::mode sycl_write = sycl::access::mode::write;

/* This is the class used to name the kernel for the runtime.
 * This must be done when the kernel is expressed as a lambda. */
template <typename T>
class SimpleVadd;

template <typename T, size_t N>
void simple_vadd(const std::array<T, N>& VA, const std::array<T, N>& VB,
                 std::array<T, N>& VC) {
  sycl::queue deviceQueue;
  sycl::range<1> numOfItems{N};
  sycl::buffer<T, 1> bufferA(VA.data(), numOfItems);
  sycl::buffer<T, 1> bufferB(VB.data(), numOfItems);
  sycl::buffer<T, 1> bufferC(VC.data(), numOfItems);

  deviceQueue.submit([&](sycl::handler& cgh) {
    auto accessorA = bufferA.template get_access<sycl_read>(cgh);
    auto accessorB = bufferB.template get_access<sycl_read>(cgh);
    auto accessorC = bufferC.template get_access<sycl_write>(cgh);

    auto kern = [=](sycl::id<1> wiID) {
      accessorC[wiID] = accessorA[wiID] + accessorB[wiID];
    };
    cgh.parallel_for<class SimpleVadd<T>>(numOfItems, kern);
  });
}

int main() {
  const size_t array_size = 4;
  std::array<sycl::cl_int, array_size> A = {{1, 2, 3, 4}},
                                           B = {{1, 2, 3, 4}}, C;
  std::array<sycl::cl_float, array_size> D = {{1.f, 2.f, 3.f, 4.f}},
                                             E = {{1.f, 2.f, 3.f, 4.f}}, F;
  simple_vadd(A, B, C);
  simple_vadd(D, E, F);
  for (unsigned int i = 0; i < array_size; i++) {
    if (C[i] != A[i] + B[i]) {
      std::cout << "The results are incorrect (element " << i << " is " << C[i]
                << "!\n";
      return 1;
    }
    if (F[i] != D[i] + E[i]) {
      std::cout << "The results are incorrect (element " << i << " is " << F[i]
                << "!\n";
      return 1;
    }
  }
  std::cout << "The results are correct!\n";
  return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 2.8.12)

set(DPCPP_HOME "/home/wzy/sycl_workspace")
set(DPCPP_SYCL_HOME "${DPCPP_HOME}/build-hip-2022-09-log-debug")

set(CMAKE_C_COMPILER "${DPCPP_SYCL_HOME}/bin/clang")
set(CMAKE_CXX_COMPILER "${DPCPP_SYCL_HOME}/bin/clang++")
set(CMAKE_CXX_STANDARD 17)

project(vector_add)

include_directories("${DPCPP_SYCL_HOME}/include/sycl")
include_directories("${DPCPP_SYCL_HOME}/include")

message(STATUS "dpcpp_home : ${DPCPP_HOME}")
message(STATUS "dpcpp_cuda_sycl_home : ${DPCPP_SYCL_HOME}")

message(STATUS "find library path : ${DPCPP_SYCL_HOME}/lib")
set(CMAKE_BUILD_RPATH "${DPCPP_SYCL_HOME}/lib;${CMAKE_BUILD_RPATH}")
message(STATUS "cmake build rpath : ${CMAKE_BUILD_RPATH}")

set(CMAKE_BUILD_TYPE "Debug")
set(CMAKE_CXX_FLAGS "-fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx906")
set(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -ggdb -std=c++17")
set(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -std=c++17")

link_directories("${DPCPP_SYCL_HOME}/lib")

aux_source_directory(. DIR_SRCS)
add_executable(vector_add ${DIR_SRCS})
target_include_directories(vector_add PRIVATE "${DPCPP_SYCL_HOME}/include/sycl")
target_include_directories(vector_add PRIVATE "${DPCPP_SYCL_HOME}/include")
target_link_libraries(vector_add PRIVATE sycl )

Additional context Add any other context about the problem here.

npmiller commented 1 year ago

how to correctly execute code in MI100 with compile option gfx906 instead of gfx908?

This is not possible, you have to specify the correct architecture for your hardware, this also fails with hipcc, for example with a gfx908 GPU I get:

$ hipcc -std=c++14 -Wall -O3 --offload-arch=gfx906 main.cu -o main
$ ./main
$ "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"