oneapi-src / Velocity-Bench

Other
44 stars 15 forks source link

dl-cifar oneDNN requirements #34

Closed Naghasan closed 1 year ago

Naghasan commented 1 year ago

Hi,

dl-cifar doesn't list the required version of oneDNN. I tried 3.1/3.2/head but dl-cifar won't build.

thanks

rmukhopa commented 1 year ago

Hi @Naghasan, could you please provide the command you used the and build log, so that we can take a look? Thanks.

Naghasan commented 1 year ago

Hi @rmukhopa

So my current build of dl-cifar fails with

velocity_bench/dl-cifar/SYCL/blas_routines.h:54:26: error: no member named 'gemm' in namespace 'oneapi::mkl::blas'
   54 | using oneapi::mkl::blas::gemm;
      |       ~~~~~~~~~~~~~~~~~~~^
velocity_bench/dl-cifar/SYCL/blas_routines.h:469:17: error: use of undeclared identifier 'gemm'
  469 |                 gemm(*(langHandle->getSyclQueue()), transpose::nontrans, transpose::nontrans, n, m, k, alpha, d_B, n, d_A,
      |                 ^
velocity_bench/dl-cifar/SYCL/blas_routines.h:494:17: error: use of undeclared identifier 'gemm'
  494 |                 gemm(*(langHandle->getSyclQueue()), transpose::nontrans, transpose::trans, n, m, k, alpha, d_B, n, d_A,
      |                 ^
velocity_bench/dl-cifar/SYCL/blas_routines.h:519:17: error: use of undeclared identifier 'gemm'
  519 |                 gemm(*(langHandle->getSyclQueue()), transpose::trans, transpose::nontrans, n, m, k, alpha, d_B, k, d_A,
      |                 ^
4 error(s) generated

And it is configure cmake -DDNNLROOT=<path to oneDNN build>. The error is the same with the 3.x versions and head. I can't use 2.7 as it doesn't build with tip dpcpp.

Naghasan commented 1 year ago

I forgot to mention, I have a local change to dl-cifar's CMake so it can use a local oneDNN install (I intended to make a PR after making this work)

-if(NOT DEFINED DNNLROOT AND DEFINED ENV{DNNLROOT})
+if(NOT DEFINED DNNLROOT)
+  if(DEFINED ENV{DNNLROOT})
     set(DNNLROOT "$ENV{DNNLROOT}" CACHE STRING "")
-else()
+  if(DEFINED ENV{DNNLROOT})
     set(DNNLROOT "$ENV{DNNLROOT}" CACHE STRING "")
-else()
+  else()
     set(DNNLROOT "${PROJECT_SOURCE_DIR}/.." CACHE STRING "")
+  endif()
 endif()

 if(USE_AMD_BACKEND)
@@ -132,7 +135,7 @@ endif()
 include_directories(${CMAKE_SOURCE_DIR}
                     ${CMAKE_SOURCE_DIR}/../common
                     ${CMAKE_SOURCE_DIR}/../../infrastructure
-                    #${DNNLROOT}/include
+                    ${DNNLROOT}/include
                     ${CUDA_INCLUDE_DIRS}
                     ${CUDA_TOOLKIT_INCLUDE}

@@ -143,7 +146,7 @@ include_directories(${CMAKE_SOURCE_DIR}
                     # ${HIP_PATH}/../miopen/include/miopen             
 )

-#link_directories(${DNNLROOT}/lib)
+link_directories(${DNNLROOT}/lib)
rmukhopa commented 1 year ago

Thanks for the logs. I see that the build error is for oneMKL rather than oneDNN. This workload uses both oneDNN and oneMKL libraries. Could you please add oneMKL as a dependency and try to build?

rmukhopa commented 1 year ago

Hi @Naghasan were you able to build fine after adding oneMKL library dependency in your environment?

Naghasan commented 1 year ago

Hi, so I've been trying to, I have an install of onemkl on the same path as onednn, so should find it, however I still have the error.

What is the required version ?

rmukhopa commented 1 year ago

Hi @Naghasan, I don't think this is an issue about using a wrong version. From the log, I see that there is failure in recognizing the gemm function, which is a basic function in oneMKL that is available across many versions. So looks the MKL library is not being found. Could you please add the paths to the standard paths and try again? For example, add the lib path to LD_LIBRARY_PATH and include path to C_INCLUDE_PATH and CPLUS_INCLUDE_PATH.

Naghasan commented 1 year ago

Well, I added oneMKL to cmake (same as https://github.com/oneapi-src/Velocity-Bench/pull/30). But on my install, it is in the same locations as the onednn lib (which is found) anyway and hacking the environment doesn't change the outcome. So the issue seems to be with the lib.

rmukhopa commented 1 year ago

I see, thanks. I looked at #30. So were you able to build fine before, i.e., without any change to the cmake file? I see two paths forward:

  1. If the build was failing before too, I would suggest to put aside your cmake changes for a moment, and let's please try to make the build work run successfully with the original workload code.
  2. Build was succeeding before, but after the cmake changes it is not. For this, we need to analyze a bit more closely your changes, and see what can be fixed.
Naghasan commented 1 year ago

So were you able to build fine before

No, I've never been able to.

rmukhopa commented 1 year ago

Ok, thanks. Let's first try to get the build working on the workload without any additional cmake changes.

Naghasan commented 1 year ago

So changing using oneapi::mkl::blas::gemm; to using oneapi::mkl::blas::column_major::gemm; allows to the program to compile with the oneMKL interface. We are still experiencing issues though. We will make a PR once they are resolved.

Thanks for the help.