kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

Will the following operations cause errors? #1113

Open yuanAIhan opened 3 years ago

yuanAIhan commented 3 years ago

I am trying to change the first value. When I choose the OpenMP backend and set the number of threads to more than 1 on an Intel CPU, for example with omp_set_num_threads(16);, the program takes a very long time to run. It looks as if the program is stuck. The code is at ./example/wiki/sparse/KokkosSparse_wiki_spgemm.cpp.

Can you tell me anything about this?

srajama1 commented 3 years ago

Can you give the CMake options you used to compile, which matrix you are using, etc.? There is not much information here to help you.

yuanAIhan commented 3 years ago

Can you tell me what this call does? kh.set_team_work_size(32);

I configured the build with: cmake .. -DKokkos_DIR=/home/kokkos/kokkos-install/lib/cmake/Kokkos -DCMAKE_INSTALL_PREFIX=/home/kokkos/kokkos-kernels-install -DKokkosKernels_ENABLE_EXAMPLES=ON (the Kokkos_DIR path is the Kokkos installation path).

I want to change the size of the matrix and measure the run time. But I found that when the dimension of the matrix is very large, the run time becomes very long unless I change "kh.set_team_work_size(32)". Can you tell me what it does?

srajama1 commented 3 years ago

@yuanAIhan You did not answer my question above and you are asking a different question :) ?

Can you tell us what you are trying to do? How did you compile, which backends are you using, which matrix are you using, etc.? What do you mean by changing the size of the matrix? Are you trying to do a scaling study? This is going to depend on the structure.

What is the other matrix you are multiplying with?

What is very large and very long for you?

To the one specific question you asked: set_team_work_size is described here:

https://github.com/kokkos/kokkos-kernels/blob/564dccb339d8d1528c2bb948abdac0c6e48e09d5/src/common/KokkosKernels_Handle.hpp#L337
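To make the team work size question concrete, here is a minimal sketch in plain C++ (no Kokkos needed). It assumes that the value passed to set_team_work_size is the number of rows handed to one team at a time; that is a reading of the linked handle header, not something confirmed in this thread. The helper make_row_chunks is hypothetical and is not part of kokkos-kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (NOT a KokkosKernels function): split the row range
// [0, num_rows) into consecutive chunks of at most team_work_size rows.
// Each chunk stands for the unit of work one team would process at a time,
// under the assumption stated in the text above.
std::vector<std::pair<std::size_t, std::size_t>>
make_row_chunks(std::size_t num_rows, std::size_t team_work_size) {
  assert(team_work_size > 0);
  std::vector<std::pair<std::size_t, std::size_t>> chunks;
  for (std::size_t begin = 0; begin < num_rows; begin += team_work_size) {
    const std::size_t end = std::min(begin + team_work_size, num_rows);
    chunks.emplace_back(begin, end);  // half-open range [begin, end)
  }
  return chunks;
}
```

With 100 rows, a chunk size of 32 gives 4 work units and a chunk size of 16 gives 7. Under the same assumption, smaller chunks give a dynamic scheduler more units to balance across threads when row costs are uneven, while larger chunks reduce scheduling overhead. Which effect wins is likely to depend on the matrix structure and the machine.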

yuanAIhan commented 3 years ago

First of all, I apologize for not making my question clear. I am running the kokkos-kernels example on an ARM aarch64 machine. My build steps are very simple: I just build and run the example. My goal is to change the matrix dimension in /kokkos-kernels/example/wiki/sparse/KokkosSparse_wiki_spgemm.cpp and then see how it performs on the ARM machine. During testing, I found that I do not understand how the piece of code below affects the run time, so I would like to know the implementation details of your SpGEMM algorithm.

This is the example I run: https://github.com/kokkos/kokkos-kernels/blob/master/example/wiki/sparse/KokkosSparse_wiki_spgemm.cpp

KernelHandle kh;
kh.set_team_work_size(16);
kh.set_dynamic_scheduling(true);

How I compiled: first, I installed Kokkos:

cmake \
  -DCMAKE_CXX_COMPILER=g++ \
  -DCMAKE_INSTALL_PREFIX=../../kokkos-install \
  -DKokkos_ENABLE_CUDA=OFF \
  -DKokkos_ENABLE_OPENMP=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_EXAMPLES=ON \
  -DCMAKE_VERBOSE_MAKEFILE=ON \
  -DCMAKE_CXX_EXTENSIONS=OFF \
  -DCMAKE_BUILD_TYPE=Release \
  ..

Second, I installed kokkos-kernels:

cmake .. \
  -DKokkos_DIR=/home/zhanggy/kokkos/kokkos-install/lib/cmake/Kokkos \
  -DCMAKE_INSTALL_PREFIX=/home/zhanggy/kokkos/kokkos-kernels-install \
  -DKokkosKernels_ENABLE_EXAMPLES=ON
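On the question of how SpGEMM works and why run time grows with the matrix dimension, the following may help as a reference point. It is a minimal serial sketch of the textbook row-wise (Gustavson) formulation with a dense accumulator, written in plain C++. It is not the kokkos-kernels implementation, which is parallel and may use different accumulators. The names Crs, spgemm_rowwise and make_tridiagonal are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CRS (compressed row storage) matrix; illustrative, not a Kokkos type.
struct Crs {
  std::size_t num_rows = 0, num_cols = 0;
  std::vector<std::size_t> row_ptr;  // size num_rows + 1
  std::vector<std::size_t> col_idx;  // size nnz
  std::vector<double> values;        // size nnz
};

// C = A * B, computed one output row at a time (Gustavson's formulation).
Crs spgemm_rowwise(const Crs& A, const Crs& B) {
  assert(A.num_cols == B.num_rows);
  Crs C;
  C.num_rows = A.num_rows;
  C.num_cols = B.num_cols;
  C.row_ptr.assign(1, 0);
  std::vector<double> acc(B.num_cols, 0.0);  // dense accumulator for one row
  std::vector<char> seen(B.num_cols, 0);     // marks columns hit in this row
  std::vector<std::size_t> touched;          // columns hit in this row
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    touched.clear();
    for (std::size_t p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
      const std::size_t k = A.col_idx[p];
      const double a = A.values[p];
      // Scale row k of B by A(i,k) and add it into the accumulator.
      for (std::size_t q = B.row_ptr[k]; q < B.row_ptr[k + 1]; ++q) {
        const std::size_t j = B.col_idx[q];
        if (!seen[j]) { seen[j] = 1; touched.push_back(j); }
        acc[j] += a * B.values[q];
      }
    }
    std::sort(touched.begin(), touched.end());  // keep columns ordered
    for (std::size_t j : touched) {
      C.col_idx.push_back(j);
      C.values.push_back(acc[j]);
      acc[j] = 0.0;  // reset only the entries that were used
      seen[j] = 0;
    }
    C.row_ptr.push_back(C.col_idx.size());
  }
  return C;
}

// n x n tridiagonal test matrix: 2 on the diagonal, -1 next to it.
Crs make_tridiagonal(std::size_t n) {
  Crs M;
  M.num_rows = M.num_cols = n;
  M.row_ptr.push_back(0);
  for (std::size_t i = 0; i < n; ++i) {
    if (i > 0) { M.col_idx.push_back(i - 1); M.values.push_back(-1.0); }
    M.col_idx.push_back(i);
    M.values.push_back(2.0);
    if (i + 1 < n) { M.col_idx.push_back(i + 1); M.values.push_back(-1.0); }
    M.row_ptr.push_back(M.col_idx.size());
  }
  return M;
}
```

In this formulation the work for row i is the sum of the lengths of the rows of B selected by the nonzeros in row i of A, so the cost follows the sparsity structure and not only the dimension. Rows can therefore have very different costs, which is one reason chunk size and dynamic scheduling can matter in a parallel version. For a scaling study, one option is to time spgemm_rowwise(make_tridiagonal(n), make_tridiagonal(n)) with std::chrono::steady_clock for increasing n.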