takagi / cl-cuda

cl-cuda is a library for using NVIDIA CUDA in Common Lisp programs.
MIT License

Any way to run on Windows? #70

Open jaccarmac opened 8 years ago

jaccarmac commented 8 years ago

Setting up cl-cuda seems to hook into gcc to create the FFI. GCC is well and good thanks to MSYS2/MinGW64, but apparently the CUDA toolkit and MinGW don't play nice together. Is there any way to set up cl-cuda to use the Windows CUDA toolchain?

takagi commented 8 years ago

I have not tried cl-cuda on Windows, but I suppose that if you can satisfy the following points, cl-cuda would run natively on Windows, even without MSYS/MinGW. How about these?

jaccarmac commented 8 years ago

nvcc works, haven't tried it with actual input files but it is on the PATH. Will try to compile samples and see what happens.

nvcc can be run through SBCL (sb-ext:run-program "nvcc" nil :search t).
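A slightly more informative variant of that check captures nvcc's output and exit status instead of only starting the process. This is a sketch, not part of cl-cuda: it assumes SBCL with its bundled ASDF/UIOP, and `program-output` is a hypothetical helper name. It returns NIL when the program cannot be started (the same situation as a "Couldn't execute" error) or exits with a non-zero status.

```lisp
;; Sketch: assumes SBCL, whose bundled ASDF module provides UIOP.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))

(defun program-output (program &rest args)
  "Run PROGRAM with ARGS (searched on PATH) and return its standard output
as a string, or NIL if it cannot be started or exits with a non-zero status."
  (handler-case
      (multiple-value-bind (output error-output exit-code)
          (uiop:run-program (cons program args)
                            :output :string
                            :error-output :string
                            :ignore-error-status t)
        (declare (ignore error-output))
        (when (eql exit-code 0)
          output))
    ;; e.g. "Couldn't execute ...: The system cannot find the file specified."
    (error () nil)))

;; Example: prints nvcc's version banner, or NIL if nvcc is not reachable.
;; (print (program-output "nvcc" "--version"))
```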

Can't find libcuda.dll on my system, even though I have a CUDA card and have installed the developer SDK. Is that a secondary dependency? I'll do some more research momentarily.

jaccarmac commented 8 years ago

I can find cuda.lib but no cuda.dll.

takagi commented 8 years ago

CuPy (https://github.com/pfnet/chainer/tree/master/cupy) does almost the same thing as cl-cuda, in Python: it generates CUDA C code, compiles it with NVCC, and launches the kernels. It works on Windows as well, so this should be possible.

jaccarmac commented 8 years ago

My installation was slightly borked due to the lack of a valid Visual Studio version. That problem is fixed and my environment is actually working now, but I still can't find the right DLL(s). Haven't taken a look at exactly what cupy does yet. Here are the DLLs I can find.

cublas64_75.dll
cudart32_75.dll
cudart64_75.dll
cufft64_75.dll
cufftw64_75.dll
cuinj32_75.dll
cuinj64_75.dll
curand64_75.dll
cusolver64_75.dll
cusparse64_75.dll
nppc64_75.dll
nppi64_75.dll
npps64_75.dll
nvblas64_75.dll
nvrtc64_75.dll
nvrtc-builtins64_75.dll
takagi commented 8 years ago

The CUDA FAQ (https://developer.nvidia.com/cuda-faq) says that the library needed to use the driver API is "nvcuda.dll", and that it is included as part of the standard NVIDIA driver install. Can you find it in a Windows system folder such as System32? cl-cuda uses only the driver API.

jaccarmac commented 8 years ago

Appears to work on SBCL for me.

* (ql:quickload :cffi)
To load "cffi":
  Load 1 ASDF system:
    cffi
; Loading "cffi"
........
(:CFFI)
* (cffi:load-foreign-library "nvcuda")

#<CFFI:FOREIGN-LIBRARY NVCUDA-523 "nvcuda">
takagi commented 8 years ago

Okay, then you should be able to load cl-cuda with nvcuda.dll.

(ql:quickload :cl-cuda)

Please set *nvcc-binary* to the path of the NVCC compiler and try to run some sample programs.

(setf cl-cuda:*nvcc-binary* #P"path/to/nvcc") ; a single \ in a Lisp string is an escape, so use / or \\

(ql:quickload :cl-cuda-examples)
(cl-cuda-examples.vector-add:main)

You may need to pass some options to nvcc via *nvcc-options*. Please let me know what you get.
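One way to avoid hand-typing that path is to derive it from the toolkit's install directory. This sketch assumes the CUDA installer set the CUDA_PATH environment variable (it normally does on Windows) and that nvcc.exe lives in its bin subdirectory; `nvcc-path-from-root` is a hypothetical helper, not part of cl-cuda. It normalizes backslashes to forward slashes, because a single backslash inside a Lisp string is an escape character ("C:\path" reads as "C:path").

```lisp
(defun nvcc-path-from-root (root)
  "Return the nvcc.exe namestring under the CUDA toolkit directory ROOT,
using forward slashes, or NIL if ROOT is NIL or empty."
  (when (and root (plusp (length root)))
    (concatenate 'string
                 ;; Normalize separators, then drop any trailing slash.
                 (string-right-trim "/" (substitute #\/ #\\ root))
                 "/bin/nvcc.exe")))

;; Usage sketch (SBCL; assumes CUDA_PATH is set and that *nvcc-binary*
;; accepts a namestring; wrap it in PATHNAME if a pathname is required):
;; (setf cl-cuda:*nvcc-binary*
;;       (nvcc-path-from-root (sb-ext:posix-getenv "CUDA_PATH")))
```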

jaccarmac commented 8 years ago

Can't even load the system in the first place, thanks to an error while groveling a file in cl-cuda. Here's the full stack trace from SLIME, since I'm very unfamiliar with native integration in SBCL:

Couldn't execute "gcc": The system cannot find the file specified.
   [Condition of type CFFI-GROVEL:GROVEL-ERROR]

Restarts:
 0: [RETRY] Retry PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">.
 1: [ACCEPT] Continue, treating PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel"> as having been successful.
 2: [RETRY] Retry ASDF operation.
 3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
 4: [ABORT] Give up on "cl-cuda"
 5: [RETRY] Retry SLIME REPL evaluation request.
 --more--

Backtrace:
  0: (CFFI-GROVEL:GROVEL-ERROR "~a" #<SIMPLE-ERROR "Couldn't execute ~S: ~A" {1006781B33}>)
  1: ((FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE))
  2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE) {9F2DDBB}>)
  3: (CFFI-GROVEL:PROCESS-GROVEL-FILE #P"C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/src/driver-api/type-grovel.lisp" #P"C:/Users/jaccarmac/AppData/Local/cache/common-lisp/sbcl-1.3.6-win-x..
  4: ((:METHOD ASDF/ACTION:PERFORM (CFFI-GROVEL::PROCESS-OP CFFI-GROVEL:GROVEL-FILE)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  5: ((SB-PCL::EMF ASDF/ACTION:PERFORM) #<unavailable argument> #<unavailable argument> #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">)
  6: ((:METHOD ASDF/ACTION:PERFORM-WITH-RESTARTS :AROUND (T T)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  7: ((:METHOD ASDF/PLAN:PERFORM-PLAN (LIST)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#1# . ..
  8: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
  9: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#..
 10: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 11: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) #<ASDF/PLAN:SEQUENTIAL-PLAN {1003E29C63}> :VERBOSE NIL) [fast-method]
 12: ((:METHOD ASDF/OPERATE:OPERATE (ASDF/OPERATION:OPERATION ASDF/COMPONENT:COMPONENT)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 13: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL)
 14: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 15: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 16: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL)
 17: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 18: (ASDF/CACHE:CALL-WITH-ASDF-CACHE #<CLOSURE (LAMBDA NIL :IN ASDF/OPERATE:OPERATE) {1003E1B22B}> :OVERRIDE NIL :KEY NIL)
 19: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 20: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 21: (ASDF/OPERATE:LOAD-SYSTEM "cl-cuda" :VERBOSE NIL)
 22: (QUICKLISP-CLIENT::CALL-WITH-MACROEXPAND-PROGRESS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT::APPLY-LOAD-STRATEGY) {1003D8125B}>)
 23: (QUICKLISP-CLIENT::AUTOLOAD-SYSTEM-AND-DEPENDENCIES "cl-cuda" :PROMPT NIL)
 24: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION (T T)) #<unavailable argument> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1004559C2B}>) [fast-method]
 25: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION :AROUND (QL-IMPL:SBCL T)) #<QL-IMPL:SBCL {10066F0833}> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1004559C2B}>) [fast-me..
 26: ((:METHOD QUICKLISP-CLIENT:QUICKLOAD (T)) #<unavailable argument> :PROMPT NIL :SILENT NIL :VERBOSE NIL) [fast-method]
 27: (QL-DIST::CALL-WITH-CONSISTENT-DISTS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT:QUICKLOAD) {100453EAFB}>)
 28: (SB-INT:SIMPLE-EVAL-IN-LEXENV (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA) #<NULL-LEXENV>)
 29: (EVAL (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA))
 30: (SWANK::EVAL-REGION "(ql:quickload :cl-cuda) ..)
 31: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL))
 32: (SWANK-REPL::TRACK-PACKAGE #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {100453E25B}>)
 33: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {100453E1BB}>)
 34: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {100453E19B}>)
 35: (SWANK-REPL::REPL-EVAL "(ql:quickload :cl-cuda) ..)
 36: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 37: (EVAL (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 38: (SWANK:EVAL-FOR-EMACS (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 39: (SWANK::PROCESS-REQUESTS NIL)
 40: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 41: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 42: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-REQUESTS) {1003DD000B}>)
 43: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "c:/Users/jaccarmac/.emacs.d/elpa/slime-20160614.1214/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-R..
 44: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {1003C7EB13}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {1003D8F743}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 45: (SWANK::HANDLE-REQUESTS #<SWANK::MULTITHREADED-CONNECTION {1003220523}> NIL)
 46: ((FLET #:WITHOUT-INTERRUPTS-BODY-1161 :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 47: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 48: ((FLET #:WITHOUT-INTERRUPTS-BODY-359 :IN SB-THREAD::CALL-WITH-MUTEX))
 49: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {9F2FB5B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THREAD "..
 50: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "repl-thread" RUNNING {1003DC8033}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::SPAWN-REPL-THREAD) {1003DBFF9B}> (#<SB-THREAD:THREAD "re..
 51: ("foreign function: #x42E6FC")
 52: ("foreign function: #x40334E")
 53: ("foreign function: #x8B6FE0")
takagi commented 8 years ago

Ah... grovel... I missed that part of your first message, since I focused on nvcc. While I think about a workaround: how did it fail on MSYS2/MinGW64 at first?

but apparently the CUDA toolkit and MinGW don't play nice together.

jaccarmac commented 8 years ago

AFAICT (definitely not an expert systems programmer :-), NVIDIA distributes their dev environment as binaries, but provides .lib files for MSVC instead of DLLs, which means you have to do low-level library twiddling to get them to link against MinGW's libc.

takagi commented 8 years ago

Is it possible to call nvcuda.dll from SBCL on MinGW?

takagi commented 8 years ago

I suppose MinGW can call DLLs as well as GNU libraries, though I am not familiar with its calling convention.

jaccarmac commented 8 years ago

MinGW does use DLLs as its shared library format, but as I understand it they are linked against an old msvcrt.dll. In any case, here are the results from running SBCL from inside a MinGW64 shell.

Subprocess (:PROCESS #<SB-IMPL::PROCESS :EXITED 1>)
 with command ("gcc" "-m64" "-o"
               "C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel-tmpGHU3ALSV.exe"
               "-IC:/Users/jaccarmac/software/quicklisp/dists/quicklisp/software/cffi_0.17.1/"
               "C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel.c")
 exited with error code 1
   [Condition of type CFFI-GROVEL:GROVEL-ERROR]

Restarts:
 0: [RETRY] Retry PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">.
 1: [ACCEPT] Continue, treating PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel"> as having been successful.
 2: [RETRY] Retry ASDF operation.
 3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
 4: [ABORT] Give up on "cl-cuda"
 5: [RETRY] Retry SLIME REPL evaluation request.
 --more--

Backtrace:
  0: (CFFI-GROVEL:GROVEL-ERROR "~a" #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {100614BC93}>)
  1: ((FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE))
  2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE) {9EEDDBB}>)
  3: (CFFI-GROVEL:PROCESS-GROVEL-FILE #P"C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/src/driver-api/type-grovel.lisp" #P"C:/Users/jaccarmac/AppData/Local/cache/common-lisp/sbcl-1.3.6-win-x..
  4: ((:METHOD ASDF/ACTION:PERFORM (CFFI-GROVEL::PROCESS-OP CFFI-GROVEL:GROVEL-FILE)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  5: ((SB-PCL::EMF ASDF/ACTION:PERFORM) #<unavailable argument> #<unavailable argument> #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">)
  6: ((:METHOD ASDF/ACTION:PERFORM-WITH-RESTARTS :AROUND (T T)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  7: ((:METHOD ASDF/PLAN:PERFORM-PLAN (LIST)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#1# . ..
  8: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
  9: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#..
 10: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 11: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) #<ASDF/PLAN:SEQUENTIAL-PLAN {1003781C63}> :VERBOSE NIL) [fast-method]
 12: ((:METHOD ASDF/OPERATE:OPERATE (ASDF/OPERATION:OPERATION ASDF/COMPONENT:COMPONENT)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 13: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL)
 14: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 15: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 16: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL)
 17: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 18: (ASDF/CACHE:CALL-WITH-ASDF-CACHE #<CLOSURE (LAMBDA NIL :IN ASDF/OPERATE:OPERATE) {100377322B}> :OVERRIDE NIL :KEY NIL)
 19: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 20: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 21: (ASDF/OPERATE:LOAD-SYSTEM "cl-cuda" :VERBOSE NIL)
 22: (QUICKLISP-CLIENT::CALL-WITH-MACROEXPAND-PROGRESS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT::APPLY-LOAD-STRATEGY) {100371125B}>)
 23: (QUICKLISP-CLIENT::AUTOLOAD-SYSTEM-AND-DEPENDENCIES "cl-cuda" :PROMPT NIL)
 24: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION (T T)) #<unavailable argument> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1003DE55FB}>) [fast-method]
 25: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION :AROUND (QL-IMPL:SBCL T)) #<QL-IMPL:SBCL {10066F0833}> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1003DE55FB}>) [fast-me..
 26: ((:METHOD QUICKLISP-CLIENT:QUICKLOAD (T)) #<unavailable argument> :PROMPT NIL :SILENT NIL :VERBOSE NIL) [fast-method]
 27: (QL-DIST::CALL-WITH-CONSISTENT-DISTS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT:QUICKLOAD) {1003DC330B}>)
 28: (SB-INT:SIMPLE-EVAL-IN-LEXENV (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA) #<NULL-LEXENV>)
 29: (EVAL (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA))
 30: (SWANK::EVAL-REGION "(ql:quickload :cl-cuda) ..)
 31: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL))
 32: (SWANK-REPL::TRACK-PACKAGE #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1003DC2A6B}>)
 33: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1003DC29CB}>)
 34: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1003DC29AB}>)
 35: (SWANK-REPL::REPL-EVAL "(ql:quickload :cl-cuda) ..)
 36: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 37: (EVAL (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 38: (SWANK:EVAL-FOR-EMACS (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 39: (SWANK::PROCESS-REQUESTS NIL)
 40: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 41: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 42: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-REQUESTS) {1003DC000B}>)
 43: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "c:/Users/jaccarmac/.emacs.d/elpa/slime-20160614.1214/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-R..
 44: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {1003C76B13}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {1003D87DF3}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 45: (SWANK::HANDLE-REQUESTS #<SWANK::MULTITHREADED-CONNECTION {1003220523}> NIL)
 46: ((FLET #:WITHOUT-INTERRUPTS-BODY-1161 :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 47: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 48: ((FLET #:WITHOUT-INTERRUPTS-BODY-359 :IN SB-THREAD::CALL-WITH-MUTEX))
 49: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {9EEFB5B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THREAD "..
 50: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "repl-thread" RUNNING {1003DB8033}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::SPAWN-REPL-THREAD) {1003DB7F9B}> (#<SB-THREAD:THREAD "re..
 51: ("foreign function: #x42E6FC")
 52: ("foreign function: #x40334E")
 53: ("foreign function: #x2637DA0")
takagi commented 8 years ago

What does gcc report if you execute it directly? You should be able to see some error messages.

gcc -m64 -o C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\Users\jaccarmac\software\quicklisp\local-projects\cl-cuda\src\driver-api\type-grovel__grovel-tmpGHU3ALSV.exe -IC:/Users/jaccarmac/software/quicklisp/dists/quicklisp/software/cffi_0.17.1/ C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\Users\jaccarmac\software\quicklisp\local-projects\cl-cuda\src\driver-api\type-grovel__grovel.c
jaccarmac commented 8 years ago

Command as written fails because C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\Users\jaccarmac\software\quicklisp\local-projects\cl-cuda\src\driver-api\type-grovel__grovel-tmpGHU3ALSV.exe is not a valid path.

jaccarmac commented 8 years ago

(Note the C\ in the middle of the pathname.)

takagi commented 8 years ago

Don't you have the path C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\? Or is it because of escape sequences? How about this?

gcc -m64 -o C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel-tmpGHU3ALSV.exe -IC:/Users/jaccarmac/software/quicklisp/dists/quicklisp/software/cffi_0.17.1/ C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel.c
jaccarmac commented 8 years ago

Aha, that was it. cuda.h is missing from gcc's search path.

takagi commented 8 years ago

Please add the directory containing cuda.h to the C_INCLUDE_PATH environment variable. Can you find cuda.h in your environment?
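To answer "can you find cuda.h?" from the REPL before editing C_INCLUDE_PATH, a small check like the following may help. It is a sketch with a hypothetical helper name, and it assumes the headers sit in the include subdirectory of the toolkit root (for example the directory named by CUDA_PATH), which is the usual layout of the Windows CUDA toolkit.

```lisp
(defun cuda-include-dir (root)
  "Return ROOT's include/ directory as a namestring if it contains cuda.h,
otherwise NIL. Backslashes in ROOT are normalized to forward slashes."
  (when (and root (plusp (length root)))
    (let ((dir (concatenate 'string
                            (string-right-trim "/" (substitute #\/ #\\ root))
                            "/include/")))
      ;; PROBE-FILE returns NIL when the header (or its directory) is missing.
      (when (probe-file (concatenate 'string dir "cuda.h"))
        dir))))

;; Usage sketch: the returned directory is what goes into C_INCLUDE_PATH,
;; set in the shell before starting SBCL.
;; (cuda-include-dir (sb-ext:posix-getenv "CUDA_PATH"))
```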

jaccarmac commented 8 years ago

That seems to work! Thanks!

jaccarmac commented 8 years ago

I see you have a list of supported environments in the README. If you let me know how to run the test suite, I can verify it passes and submit a PR with my specifics.

jaccarmac commented 8 years ago

Both the test programs and (asdf:oos 'asdf:test-op '#:cl-cuda) fail with the error that the alien function "cuInit" is undefined.

takagi commented 8 years ago

That helps me a lot. You can run the tests with (ql:quickload :cl-cuda-test).

jaccarmac commented 8 years ago

Still getting the same error (The alien function "cuInit" is undefined.), both natively and from the MSYS console, with or without changes to PATH or C_INCLUDE_PATH.

takagi commented 8 years ago

cuInit should be defined via CFFI:DEFCFUN in cl-cuda/src/driver-api/function.lisp. Something may still be missing for calling the API in nvcuda.dll. Let me think for a while.

takagi commented 8 years ago

Would you try again with the following fix in cl-cuda/src/driver-api/library.lisp?

  (cffi:define-foreign-library libcuda
+   (:windows "nvcuda.dll")
+   ; (:windows "nvcuda.dll" :convention :stdcall)
    (:darwin (:framework "CUDA"))
    (:unix (:or "libcuda.so" "libcuda64.so")))

At least, the undefined cuInit error is caused by the foreign library definition lacking an entry for Windows, but I do not know whether :convention :stdcall is required.
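A quick way to tell "the DLL is missing" apart from "the symbol is not being looked up" is to load the library by hand and ask whether cuInit resolves. This sketch is SBCL-specific (it uses SBCL's own `sb-alien:load-shared-object` and `sb-sys:find-foreign-symbol-address` rather than CFFI), and the helper name is hypothetical. On the :stdcall question: x64 Windows has a single calling convention, so the choice should only matter for a 32-bit Lisp.

```lisp
;; SBCL-specific sketch: load a shared library and check for a symbol.
(defun foreign-symbol-available-p (library symbol)
  "Try to load LIBRARY; return T if SYMBOL can then be resolved, NIL if the
library cannot be loaded or the symbol is not found."
  (handler-case
      (progn
        ;; Signals an error if the library cannot be opened.
        (sb-alien:load-shared-object library)
        (not (null (sb-sys:find-foreign-symbol-address symbol))))
    (error () nil)))

;; On Windows with the NVIDIA driver installed, this is expected to return T:
;; (foreign-symbol-available-p "nvcuda.dll" "cuInit")
```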

jaccarmac commented 8 years ago

Both versions seem to work, and further into the test suite we get: The function OSICAT-POSIX:MKTEMP is undefined.

jaccarmac commented 8 years ago

This test also fails further up the chain.

 ? basic case 4
    "float3_add( __make_float3( 1.0f, 1.0f, 1.0f ), __make_float3( 2.0f, 2.0f, 2.0f ) )" is expected to be "float3_add( make_float3( 1.0f, 1.0f, 1.0f ), make_float3( 2.0f, 2.0f, 2.0f ) )"
takagi commented 8 years ago

Both versions seem to work, and further into the test suite we get: The function OSICAT-POSIX:MKTEMP is undefined.

OSICAT-POSIX:MKTEMP might not work on Windows; please apply this patch as a workaround. I use MKTEMP just to make a temporary file name. I will fix it properly later.

cl-cuda/src/api/nvcc.lisp

  (defun get-cu-path ()
+   (let ((name "cl-cuda.tmp"))
-   (let ((name (format nil "cl-cuda.~A" (osicat-posix:mktemp))))
      (make-pathname :name name :type "cu" :defaults (get-tmp-path))))
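The fixed name in this patch works, but two concurrent compilations would then overwrite the same file. A portable alternative is to generate a random suffix in plain Common Lisp, with no OSICAT dependency. This is only a sketch: `make-unique-tmp-name` and `unique-cu-path` are hypothetical names, and, like a bare mktemp, it does not guarantee that the file does not already exist.

```lisp
(defvar *tmp-name-random-state* (make-random-state t)
  "Random state seeded once from the environment, so names differ across calls.")

(defun make-unique-tmp-name (&optional (prefix "cl-cuda"))
  "Return PREFIX followed by a dot and 8 random base-36 digits."
  ;; ~36,8,'0R prints in radix 36, zero-padded to 8 columns.
  (format nil "~A.~36,8,'0R" prefix (random (expt 36 8) *tmp-name-random-state*)))

(defun unique-cu-path (tmp-dir)
  "Return a .cu pathname with a random name inside the directory TMP-DIR."
  (make-pathname :name (make-unique-tmp-name) :type "cu" :defaults tmp-dir))

;; Usage sketch: (unique-cu-path #P"/tmp/") gives a pathname named
;; cl-cuda.<8 random characters> with type "cu" in /tmp/.
```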
takagi commented 8 years ago

This is because the test itself is wrong; I will fix it. You can ignore this.

 ? basic case 4
    "float3_add( __make_float3( 1.0f, 1.0f, 1.0f ), __make_float3( 2.0f, 2.0f, 2.0f ) )" is expected to be "float3_add( make_float3( 1.0f, 1.0f, 1.0f ), make_float3( 2.0f, 2.0f, 2.0f ) )"
jaccarmac commented 8 years ago

All right, now the command nvcc -arch=sm_30 -I C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/include -ptx -o /tmp/cl-cuda.tmp.ptx /tmp/cl-cuda.tmp.cu is failing with nvcc fatal : Cannot find compiler 'cl.exe' in PATH.

It seems like NVCC is designed to run from inside Visual Studio on Windows. I'll see what I can do to fix the path.
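Before launching nvcc, it can help to check whether cl.exe is actually reachable through the Lisp process's PATH, since that is what the error complains about. The sketch below is plain Common Lisp with hypothetical helper names; it assumes Windows-style ";" separators by default and only checks that a file with that name exists in one of the listed directories.

```lisp
(defun split-search-path (string separator)
  "Split a PATH-style STRING on the character SEPARATOR, dropping empty entries."
  (loop with start = 0
        for pos = (position separator string :start start)
        for piece = (subseq string start pos)
        unless (string= piece "") collect piece
        while pos
        do (setf start (1+ pos))))

(defun find-executable (name path-string &key (separator #\;))
  "Return the truename of the first file called NAME found in a directory of
PATH-STRING, or NIL if none exists."
  (loop for dir in (split-search-path path-string separator)
        ;; IGNORE-ERRORS guards against entries that do not parse as pathnames.
        for candidate = (ignore-errors
                         (probe-file
                          (concatenate 'string
                                       (string-right-trim "/\\" dir) "/" name)))
        when candidate return candidate))

;; Usage sketch (SBCL): NIL here means nvcc will likely fail with
;; "Cannot find compiler 'cl.exe' in PATH".
;; (find-executable "cl.exe" (sb-ext:posix-getenv "PATH"))
```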

jaccarmac commented 8 years ago

Indeed, running SBCL from Visual Studio's CMD lets many of the tests run. Then the command nvcc -arch=sm_30 -I C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/include -ptx -o /tmp/cl-cuda.tmp.ptx /tmp/cl-cuda.tmp.cu fails:

cl-cuda.tmp.cu
C:/tmp/cl-cuda.tmp.cu(42): warning: variable "i" was declared but never referenced

C:/tmp/cl-cuda.tmp.cu(46): warning: dynamic initialization in unreachable code

C:/tmp/cl-cuda.tmp.cu(46): warning: variable "i" was declared but never referenced

C:/tmp/cl-cuda.tmp.cu(105): error: expected an expression

1 error detected in the compilation of "C:/Users/JACCAR~1/AppData/Local/Temp/tmpxft_000009cc_00000000-8_cl-cuda.tmp.cpp1.ii".
takagi commented 8 years ago

Can I see the failing part of /tmp/cl-cuda.tmp.cu? It might be caused by a difference between gcc and cl.exe.

jaccarmac commented 8 years ago

https://gist.github.com/jaccarmac/f9e7c0dab71e584f04b8fbb150efd510

takagi commented 8 years ago

It seems that cl.exe does not accept struct initializers (compound literals) as expressions. I will give you a patch; I want these tests to compile under cl.exe without errors.

takagi commented 8 years ago

Please wait a while; I'm at work now.

jaccarmac commented 8 years ago

In that case, I'll take a moment to thank you for the manner in which you're handling the ticket. Really appreciate the promptness of responses and willingness to direct my exploration of the problem :-)!

takagi commented 8 years ago

You also help me a lot by letting me know how cl-cuda behaves on Windows. Thanks.

takagi commented 8 years ago

Would you try this patch? It stops the built-in vector constructors from being compiled to compound literals, and disables the initializer tests that depend on them.

cl-cuda/src/lang/built-in.lisp

  ;; built-in vector constructor
- float3 (((float float float) float3 nil "__make_float3"))
- float4 (((float float float float) float4 nil "__make_float4"))
- double3 (((double double double) double3 nil "__make_double3"))
- double4 (((double double double double) double4 nil "__make_double4"))
+ float3 (((float float float) float3 nil "make_float3"))
+ float4 (((float float float float) float4 nil "make_float4"))
+ double3 (((double double double) double3 nil "make_double3"))
+ double4 (((double double double double) double4 nil "make_double4"))

cl-cuda/t/api/defkernel.lisp

  ;;;
  ;;; Initializers
  ;;;

+ #+nil
  (defglobal c (float3 3.0 2.0 1.0))

+ #+nil
  (defkernel initializer (float3 ())
    (let ((x 1.0))
      (return (float3 x 2.0 3.0))))

+ #+nil
  (defkernel use-initializer (void ((x float3*) (y float3*)))
    (set (aref x 0) (initializer))
    (set (aref y 0) c)
    (return))

+ #+nil
  (subtest "Initializers"

    (with-cuda (0)
      (with-memory-blocks ((x 'float3 1)
                           (y 'float3 1))
        (use-initializer x y :grid-dim (list 1 1 1)
                             :block-dim (list 1 1 1))
        (sync-memory-block x :device-to-host)
        (sync-memory-block y :device-to-host)
        (is (memory-block-aref x 0)
          (make-float3 1.0 2.0 3.0)
            :test #'float3-=
            "Ok. - returning with initializer")
        (is (memory-block-aref y 0)
          (make-float3 3.0 2.0 1.0)
            :test #'float3-=
            "Ok. - initializing with initializer"))))
jaccarmac commented 8 years ago

Test suite loads now, but with tons of errors. The colors don't render properly in cmd.exe, so give me a few minutes to figure out how to get the VS2013 dev tools onto a PATH inside a better terminal emulator.

jaccarmac commented 8 years ago

No red tests. There is a ton of grey, however, and red timings show up in several places. I'm assuming grey means passed tests and red times in ms mean the test ran long, but I'm not familiar enough with the test framework to say.

takagi commented 8 years ago

Thanks. Sounds good. Would you try this sample code? It computes element-wise addition of two arrays: c[i] = a[i] + b[i].

(ql:quickload :cl-cuda)
(load #P"cl-cuda/examples/vector-add.lisp")
(cl-cuda-examples.vector-add:main)
(setf cl-cuda:*show-messages* nil)

If you get the following message, cl-cuda does work on Windows.

verification succeed.
jaccarmac commented 8 years ago

That does indeed show up!

takagi commented 8 years ago

Great! I hope you will enjoy CUDA in Common Lisp.

And could you tell me your environment?

takagi commented 8 years ago

I will note it on README.

jaccarmac commented 8 years ago
takagi commented 8 years ago

Is this result on MSYS2/MinGW64?

That does indeed show up!
jaccarmac commented 8 years ago

No, that's the trick with this method. You have to load the system in MSYS2/MinGW64 first so that GCC can do the groveling work for Lisp. Subsequent loads, on the other hand, need to be done from the VS2013 CMD so that NVCC can find the MSVC compiler (cl.exe). In theory you could use the MSYS2 shell and source vcvarsall.bat or something, but I didn't try it.

takagi commented 8 years ago

I got it, thanks.