takagi / cl-cuda

Cl-cuda is a library to use NVIDIA CUDA in Common Lisp programs.
MIT License

Atomic Operations #100

Closed: digikar99 closed this issue 3 years ago

digikar99 commented 3 years ago

Is there a way to use atomic operations?

I can see the symbol CL-CUDA.LANG.BUILT-IN:ATOMIC-ADD, but when I compile or call a kernel that uses it, I get:

The function CL-CUDA.LANG.BUILT-IN:ATOMIC-ADD is undefined.
   [Condition of type SIMPLE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME REPL evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {104F978103}>)

Backtrace:
  0: (CL-CUDA.LANG.BUILT-IN::INFERRED-FUNCTION CL-CUDA.LANG.BUILT-IN:ATOMIC-ADD (CL-CUDA.LANG.TYPE:INT CL-CUDA.LANG.TYPE:INT))
  1: (CL-CUDA.LANG.BUILT-IN:BUILT-IN-FUNCTION-INFIX-P CL-CUDA.LANG.BUILT-IN:ATOMIC-ADD (CL-CUDA.LANG.TYPE:INT CL-CUDA.LANG.TYPE:INT))
  2: (CL-CUDA.LANG.COMPILER.COMPILE-EXPRESSION::COMPILE-BUILT-IN-FUNCTION (CL-CUDA.LANG.BUILT-IN:ATOMIC-ADD DC::NUM-CALLS 1) ((DC::I . #S(CL-CUDA.LANG.ENVIRONMENT::VARIABLE :NAME DC::I :TYPE CL-CUDA.LANG.T..

Is there some other way to do it, or is this functionality yet to be implemented?

takagi commented 3 years ago

I can see a test for ATOMIC-ADD here: https://github.com/takagi/cl-cuda/blob/8aaf319303cca78cd999fc5defb2793e6fb76a18/t/api/defkernel.lisp#L150

Could you take a look at it?

digikar99 commented 3 years ago

Well, yes, the test passes. Thanks for the pointer!

It turned out that a "wrong compilation", for example calling atomic-add with float or otherwise unsupported argument types, changes the inferred signature of the call so that no built-in matches it. That mismatch is then reported as the function-is-undefined error. For instance,

With

(defkernel test-atomic-add (void ((x int*)))
  ;; atomic-add takes the address of the element, hence POINTER
  (atomic-add (pointer (aref x 0)) 1))

The following passes:

(with-cuda (0)
  (with-memory-blocks ((x 'int 1))
    (setf (memory-block-aref x 0) 0)
    (sync-memory-block x :host-to-device)
    (is (test-atomic-add x :grid-dim '(1 1 1) :block-dim '(256 1 1))
        nil "basic case 13")
    (sync-memory-block x :device-to-host)))

But if I get the argument wrong:

(defkernel test-atomic-add (void ((x int*)))
  ;; wrong: passes the int value of x[0] instead of its address
  (atomic-add (aref x 0) 1))

The above test fails.

It passes again once the kernel is recompiled with the correct definition.
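Why a type mistake shows up as "undefined" can be sketched with a toy model in plain Common Lisp: built-ins are looked up by name *and* argument types, so a call whose inferred types match no registered signature looks like a missing function. The backtrace above shows the lookup being attempted with (INT INT); everything else here (`*built-ins*`, `lookup-built-in`, the table layout, and the assumption that only `(int* int)` is registered) is hypothetical and is not cl-cuda's actual implementation.

```lisp
;;; Toy model of signature-keyed built-in lookup.
;;; HYPOTHETICAL: *built-ins* and lookup-built-in are illustrative names,
;;; not cl-cuda's real internals.

(defparameter *built-ins*
  ;; Each entry: (name argument-types c-name).
  ;; Assumption: only the (int* int) signature of atomic-add is registered.
  '((atomic-add (int* int) "atomicAdd")))

(defun lookup-built-in (name arg-types)
  "Return the entry for NAME whose argument types are EQUAL to ARG-TYPES.
Signal an error when no entry matches, even if NAME exists with other types."
  (or (find-if (lambda (entry)
                 (and (eq (first entry) name)
                      (equal (second entry) arg-types)))
               *built-ins*)
      (error "The function ~S is undefined." name)))

;; In this model, (atomic-add (pointer (aref x 0)) 1) is inferred as
;; (int* int), so the lookup succeeds.
(lookup-built-in 'atomic-add '(int* int))

;; (atomic-add (aref x 0) 1) would be inferred as (int int); the lookup
;; below would signal the "undefined" error, so it is left commented out.
;; (lookup-built-in 'atomic-add '(int int))
```

Under this model the misleading message is simply what a failed signature match reports, and wrapping the argument in `pointer` makes the types line up again.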