sowson / darknet

Darknet on OpenCL Convolutional Neural Networks on OpenCL on Intel & NVidia & AMD & Mali GPUs for macOS & GNU/Linux & Windows & FreeBSD
http://pjreddie.com/darknet/

32 bit Single Precision FP #80

Open msaideroglu opened 2 weeks ago

msaideroglu commented 2 weeks ago

I want to compile the model for a GPU that supports only 32-bit single-precision floating point. How do I deactivate double precision? I couldn't find a config parameter for it in src/opencl.c or src/opencl.h.

sowson commented 1 week ago

Have you tried compiling and running it? Darknet does not use double-precision numbers anywhere, only float, so it should work by default. Please provide some logs or the errors you are facing; that will help drive the discussion. At this point I don't know which OS or which GPU you are using, so it is hard to help you. Thanks again! Piotr :owl:

msaideroglu commented 1 week ago

Hi Piotr. Thank you for your reply. I want to compile and run your Darknet port on Vortex, an open-source RISC-V based GPGPU design from Georgia Tech.

Actually, the error I'm seeing is in clBLAS, in xgemm.cc. I get the same error when running the sample sgemm.c code in CLBLAS/samples.

The error is:

makeGemmKernel: Creating program from source
2 warnings generated.
clBuildProgram Failed
err = -11

Build Log:

warning: ~/.cache/pocl/uncached/tempfile_gQ6Fyx.cl:101:54: double precision constant requires cl_khr_fp64, casting to single precision
warning: ~/.cache/pocl/uncached/tempfile_gQ6Fyx.cl:104:53: double precision constant requires cl_khr_fp64, casting to single precision
Error(s) while linking: 
Cannot find symbol _Z13_cl_mem_fencej in kernel library
Device Vortex OpenGPU failed to build the program

OpenCL error -11 on line 256 of ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc
sgemm: ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc:256: void makeGemmKernel(_cl_kernel**, cl_command_queue, const char*, const char*, const unsigned char**, size_t*, const char*): Assertion `false' failed.
Aborted (core dumped)
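
For reference, `err = -11` in the log above is the generic `CL_BUILD_PROGRAM_FAILURE` status, so the code alone does not say whether the double-precision warnings or the link error caused the failure; that information is only in the build log. A minimal sketch of a lookup helper follows. The function name `cl_error_name` is hypothetical, the numeric values are the ones defined in `CL/cl.h`, and the table is deliberately partial.

```c
#include <assert.h>
#include <string.h>

/* Map a few OpenCL status codes to their symbolic names.
 * cl_error_name is a hypothetical helper, not part of clBLAS or Darknet.
 * The numeric values match CL/cl.h; the table is deliberately partial. */
static const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -15: return "CL_COMPILE_PROGRAM_FAILURE";
    case -17: return "CL_LINK_PROGRAM_FAILURE";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    default:  return "unknown OpenCL error";
    }
}
```

Printing `cl_error_name(err)` next to the numeric code in the `clBuildProgram` failure branch would make logs like this one easier to read.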

At first I thought the cause was that Vortex does not support double precision, so I ran the program under gdb. Output and backtrace:

clBuildProgram Failed
err = -11

Build Log:

warning: ~/.cache/pocl/uncached/tempfile_wZ7NMy.cl:101:54: double precision constant requires cl_khr_fp64, casting to single precision
warning: ~/.cache/pocl/uncached/tempfile_wZ7NMy.cl:104:53: double precision constant requires cl_khr_fp64, casting to single precision
Error(s) while linking: 
Cannot find symbol _Z13_cl_mem_fencej in kernel library
Device Vortex OpenGPU failed to build the program

OpenCL error -11 on line 256 of ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc
sgemm: ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc:256: void makeGemmKernel(_cl_kernel**, cl_command_queue, const char*, const char*, const unsigned char**, size_t*, const char*): Assertion `false' failed.

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737219640576) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737219640576) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737219640576) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737219640576, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff6442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff64287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff642871b in __assert_fail_base (fmt=0x7ffff65dd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff6d78f57 "false", 
    file=0x7ffff6d78eb8 "~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc", line=256, function=<optimized out>)
    at ./assert/assert.c:92
#6  0x00007ffff6439e96 in __GI___assert_fail (assertion=0x7ffff6d78f57 "false", 
    file=0x7ffff6d78eb8 "~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc", line=256, 
    function=0x7ffff6d78f60 "void makeGemmKernel(_cl_kernel**, cl_command_queue, const char*, const char*, const unsigned char**, size_t*, const char*)")
    at ./assert/assert.c:101
#7  0x00007ffff6ac1cc6 in makeGemmKernel (clKernel=0x7fffffffd0b8, clQueue=0x555555968c10, 
    kernelSource=0x7ffff6f06868 "\n/* sgemm_Col_NN_B1_ML016_NL016_KX01 */\n\n/* kernel parameters */\n#define WG_NUM_ROWS          16\n#define WG_NUM_COLS          16\n#define MICRO_TILE_NUM_ROWS  1\n#define MICRO_TILE_NUM_COLS  1\n#define M"..., sourceBuildOptions=0x7ffff6e02ee6 "-cl-std=CL1.2", 
    kernelBinary=0x7fffffffd050, kernelBinarySize=0x7ffff7f7e7f0 <sgemm_Col_NN_B1_ML016_NL016_KX01_binSize>, 
    binaryBuildOptions=0x7ffff6e02ed8 "-cl-std=CL1.2") at ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc:256
#8  0x00007ffff6ac3694 in clblasGemm<float> (order=clblasColumnMajor, transA=clblasNoTrans, transB=clblasNoTrans, iM=3, iN=2, iK=4, alpha=10, 
    iA=0x555555976300, iOffA=6, iLda=5, iB=0x5555559764f0, iOffB=4, iLdb=3, beta=20, C=0x5555559766e0, iOffC=4, iLdc=3, numCommandQueues=1, 
    commandQueues=0x7fffffffd280, numEventsInWaitList=0, eventWaitList=0x0, events=0x7fffffffd288)
    at ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc:613
#9  0x00007ffff6ac21d4 in clblasSgemm (order=clblasRowMajor, transA=clblasNoTrans, transB=clblasNoTrans, M=3, N=2, K=4, alpha=10, A=0x555555976300, 
    offA=6, lda=5, B=0x5555559764f0, offB=4, ldb=3, beta=20, C=0x5555559766e0, offC=4, ldc=3, numCommandQueues=1, commandQueues=0x7fffffffd280, 
    numEventsInWaitList=0, eventWaitList=0x0, events=0x7fffffffd288)
    at ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson_buildlog/src/library/blas/xgemm.cc:744
#10 0x00005555555557a3 in main ()

It seems that the double-precision constants in the kernel are cast to single precision and that clBLAS calls clblasSgemm rather than clblasDgemm, yet makeGemmKernel still fails. I built clBLAS after adding the OpenCL location to CMakeLists.txt:

set(OPENCL_INCLUDE_DIRS ~/Desktop/GPGPU/Vortex_GPGPU/tools_vortex_v2.2/pocl/include)
set(OPENCL_LIBRARIES ~/Desktop/GPGPU/Vortex_GPGPU/tools_vortex_v2.2/pocl/lib/libOpenCL.so)

The test programs that run during the build pass, but when I run the sample sgemm.cpp code separately, the result is the same.
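
The `cl_khr_fp64` warnings in the build log suggest that the device does not advertise double-precision support. One way to confirm this is to look for the extension name in the device's extension string, which in a real program would come from `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`. The sketch below covers only the string check so that it has no OpenCL dependency; `has_extension` is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` appears as a whole space-separated token in
 * `extensions`, 0 otherwise. has_extension is a hypothetical helper;
 * in real code `extensions` would be the string returned by
 * clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...). */
static int has_extension(const char *extensions, const char *name)
{
    assert(extensions != NULL && name != NULL);
    size_t n = strlen(name);
    if (n == 0)
        return 0;

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the hit only if it is bounded by spaces or string ends,
         * so that "cl_khr_fp64_foo" does not count as "cl_khr_fp64". */
        int starts = (p == extensions) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}
```

If `has_extension(ext, "cl_khr_fp64")` returns 0 for the Vortex device, the warnings are expected, and they are only warnings: the build stops at the later link error.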

sowson commented 1 week ago

Could you, just as a test, switch the ARM option in CMakeLists.txt from OFF to ON? Darknet will then not use clBLAS. It will not be super optimal, but we will then know whether the issue is in clBLAS. In CMakeLists.txt, change:

option(DARKNET_ARM "Enable ARM support" OFF)

to:

option(DARKNET_ARM "Enable ARM support" ON)

Thanks!

msaideroglu commented 1 week ago

Hi Piotr. With ARM mode enabled, the build gives no serious errors, only some warnings that can be fixed. However, the performance is terrible compared with clBLAS, so I need to build the clBLAS library successfully against my OpenCL.
I found a way to build clBLAS with a single-precision option. In CMakeLists.txt, set:

set(OPENCL_INCLUDE_DIRS ~/Desktop/GPGPU/Vortex_GPGPU/tools_vortex_v2.2/pocl/include)
set(OPENCL_LIBRARIES ~/Desktop/GPGPU/Vortex_GPGPU/tools_vortex_v2.2/pocl/lib/libOpenCL.so)

and pass the build option -cl-single-precision-constant manually to clBuildProgram in /library/blas/xgemm.cc:

      // Create the program from the generated kernel source.
      clProgram = clCreateProgramWithSource(
        clContext,
        1, &kernelSource,
        NULL, &err );
      CL_CHECK(err)
      // Treat double-precision literals as single precision, since the
      // build log reports that they require cl_khr_fp64.
      err = clBuildProgram(
        clProgram,
        1, &clDevice,
        "-cl-single-precision-constant", NULL, NULL );
      if (err != CL_SUCCESS) {
        printf("clBuildProgram Failed\n");
        printf("err = %d\n", err);

        // Query the log size first, then fetch the log itself.
        size_t len = 0;
        clGetProgramBuildInfo(clProgram, clDevice, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
        char* buildLog = new char[len];
        clGetProgramBuildInfo(clProgram, clDevice, CL_PROGRAM_BUILD_LOG, len, buildLog, NULL);
        printf("\nBuild Log:\n\n");
        printf("%s\n", buildLog);
        delete[] buildLog;
      }
      CL_CHECK(err)
    }
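
Note that the earlier backtrace shows `sourceBuildOptions` set to `-cl-std=CL1.2`, so passing only the literal `-cl-single-precision-constant` drops that option. A minimal sketch of combining the two is below; `append_build_option` is a hypothetical helper, not a clBLAS function, and whether keeping `-cl-std=CL1.2` matters on Vortex is untested.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated "<base> <extra>" string, or a copy of `extra`
 * when `base` is NULL or empty. The caller frees the result.
 * append_build_option is a hypothetical helper name. */
static char *append_build_option(const char *base, const char *extra)
{
    assert(extra != NULL);
    size_t base_len = base ? strlen(base) : 0;
    size_t extra_len = strlen(extra);

    char *out = malloc(base_len + extra_len + 2);  /* separator + NUL */
    if (out == NULL)
        return NULL;

    if (base_len > 0) {
        memcpy(out, base, base_len);
        out[base_len] = ' ';
        memcpy(out + base_len + 1, extra, extra_len + 1);
    } else {
        memcpy(out, extra, extra_len + 1);
    }
    return out;
}
```

The result could then be passed as the options argument of `clBuildProgram` in place of the hard-coded string, and freed after the build.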

Now the sample sgemm.c test program gives the same error, but without the double-precision warnings:

Starting program: ~/Vortex_OpenCL/sgemm/sgemm 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
sgemm_Col_NN_B1_3x2x4
makeGemmKernel: Creating program from source
clBuildProgram Failed
err = -11

Build Log:

Error(s) while linking: 
Cannot find symbol _Z13_cl_mem_fencej in kernel library
Device Vortex OpenGPU failed to build the program

OpenCL error -11 on line 256 of ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson/src/library/blas/xgemm.cc
sgemm: ~/Desktop/CLBLAS/Vortex/clBLAS-2.12-sowson/src/library/blas/xgemm.cc:256: void makeGemmKernel(_cl_kernel**, cl_command_queue, const char*, const char*, const unsigned char**, size_t*, const char*): Assertion `false' failed.

So the actual error is Cannot find symbol _Z13_cl_mem_fencej in kernel library. However, the Vortex repo contains its own sgemm example whose kernel includes the call

barrier(CLK_LOCAL_MEM_FENCE);

and that example compiles and runs successfully, so I think Vortex has no problem with MEM_FENCE (I'm not sure exactly what it is). Where does this _cl_mem_fence come from in the clBLAS xgemm.cc? Do you have any idea how to resolve it?
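
The symbol name itself gives a hint. `_Z13_cl_mem_fencej` follows the Itanium C++ name-mangling scheme: `_Z`, the identifier length `13`, the identifier `_cl_mem_fence`, and `j` for a single `unsigned int` parameter. That points at a function `_cl_mem_fence(unsigned int)`, which looks like the `mem_fence` built-in rather than `barrier`; the `_cl_` prefix is presumably added by the pocl toolchain, which is an assumption here. The sketch below decodes only this simplest form (one plain identifier followed by built-in type codes); `demangle_simple` is a hypothetical helper and not a general demangler.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Built-in type codes from the Itanium C++ ABI (partial list). */
static const char *builtin_type(char code)
{
    switch (code) {
    case 'v': return "void";
    case 'i': return "int";
    case 'j': return "unsigned int";
    case 'l': return "long";
    case 'm': return "unsigned long";
    case 'f': return "float";
    case 'd': return "double";
    default:  return NULL;
    }
}

/* Decode "_Z<len><name><builtin codes>" into "name(type, ...)".
 * Returns 0 on success, -1 if the input is outside this tiny subset
 * or the result does not fit in `out`. Hypothetical helper. */
static int demangle_simple(const char *sym, char *out, size_t out_size)
{
    assert(sym != NULL && out != NULL);
    if (strncmp(sym, "_Z", 2) != 0)
        return -1;

    const char *p = sym + 2;
    if (!isdigit((unsigned char)*p))
        return -1;
    size_t len = 0;
    while (isdigit((unsigned char)*p))
        len = len * 10 + (size_t)(*p++ - '0');
    if (len == 0 || strlen(p) <= len)   /* need name plus >= 1 type code */
        return -1;

    int used = snprintf(out, out_size, "%.*s(", (int)len, p);
    if (used < 0 || (size_t)used >= out_size)
        return -1;
    p += len;

    for (int first = 1; *p != '\0'; ++p, first = 0) {
        const char *type = builtin_type(*p);
        if (type == NULL)
            return -1;
        int n = snprintf(out + used, out_size - (size_t)used, "%s%s",
                         first ? "" : ", ", type);
        if (n < 0 || (size_t)(used + n) >= out_size)
            return -1;
        used += n;
    }

    if ((size_t)used + 2 > out_size)
        return -1;
    out[used] = ')';
    out[used + 1] = '\0';
    return 0;
}
```

Applied to the symbol from the build log, this yields `_cl_mem_fence(unsigned int)`, so searching the clBLAS kernel sources for `mem_fence` is a reasonable next step.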

msaideroglu commented 6 days ago

I found the source of the error. My OpenCL implementation provides barrier(CLK_LOCAL_MEM_FENCE) but not mem_fence(CLK_LOCAL_MEM_FENCE); calling mem_fence(CLK_LOCAL_MEM_FENCE) produces Cannot find symbol _Z13_cl_mem_fencej in kernel library. The error therefore comes from the mem_fence(CLK_LOCAL_MEM_FENCE) calls in the clBLAS kernels. According to the OpenCL wiki page, mem_fence seems to be deprecated as of OpenCL 2.0, but my OpenCL version is 1.2. Any idea how to resolve it?
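
One possible workaround, sketched here as an assumption rather than a tested fix, is to rewrite the kernel source on the host before it reaches `clCreateProgramWithSource`, renaming each `mem_fence(` call to `barrier(`. A barrier also orders memory operations for the given flags, but it is stronger than a fence: every work-item in the work-group must reach it, so the substitution is only safe where the call sits in control flow that all work-items execute. Whether that holds for every clBLAS kernel has not been checked. `rename_call` is a hypothetical helper name, and it matches only calls where `(` directly follows the identifier.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* True if the identifier `from` starts at `p`, is directly followed by
 * '(' and is not the tail of a longer identifier. */
static int call_at(const char *src, const char *p,
                   const char *from, size_t from_len)
{
    return strncmp(p, from, from_len) == 0
        && p[from_len] == '('
        && (p == src || !is_ident_char(p[-1]));
}

/* Return a newly allocated copy of `src` in which every call to `from`
 * is renamed to `to`. The caller frees the result.
 * rename_call is a hypothetical helper, not part of clBLAS. */
static char *rename_call(const char *src, const char *from, const char *to)
{
    assert(src != NULL && from != NULL && to != NULL && *from != '\0');
    size_t from_len = strlen(from), to_len = strlen(to);

    /* First pass: count matches so the output size is known. */
    size_t count = 0;
    for (const char *p = src; *p != '\0'; ) {
        if (call_at(src, p, from, from_len)) { ++count; p += from_len; }
        else ++p;
    }

    char *out = malloc(strlen(src) - count * from_len + count * to_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting the identifier at each match. */
    char *w = out;
    for (const char *p = src; *p != '\0'; ) {
        if (call_at(src, p, from, from_len)) {
            memcpy(w, to, to_len);
            w += to_len;
            p += from_len;
        } else {
            *w++ = *p++;
        }
    }
    *w = '\0';
    return out;
}
```

A call such as `rename_call(kernelSource, "mem_fence", "barrier")` leaves the `CLK_LOCAL_MEM_FENCE` flag and identifiers like `_cl_mem_fence` untouched. The alternative that avoids changing kernel semantics would be to provide `mem_fence` in the Vortex kernel library itself.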