srush / GPU-Puzzles

Solve puzzles. Learn CUDA.
MIT License
5.61k stars, 334 forks

answers #23

Status: Open. isamu-isozaki opened this issue 11 months ago

isamu-isozaki commented 11 months ago

Hi! Thanks for the puzzles! There's a Triton reading group at EleutherAI, and each of us went through the GPU Puzzles. Would you be fine with us putting up, say, a repo of the answers to these puzzles? Happy to send over the notebook via DM!

srush commented 11 months ago

Nice. Yeah, I think at this point it is okay to put out answers. I was thinking of making a YouTube walkthrough video as well.

Any issues? Happy to hear feedback or bug reports.

isamu-isozaki commented 11 months ago

@srush Not much in the way of bugs, but I have one question about Q13. I passed the test case with

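import numba

TPB = 8  # threads per block; assumed to match the notebook's setting for this puzzle

def axis_sum_test(cuda):
    def call(out, a, size: int) -> None:
        cache = cuda.shared.array(TPB, numba.float32)
        i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
        local_i = cuda.threadIdx.x
        batch = cuda.blockIdx.y  # one block per row of `a`
        if local_i < size:
            cache[local_i] = a[batch, local_i]
        # Barrier outside the branch: every thread in the block must reach it
        cuda.syncthreads()
        # FILL ME IN (roughly 12 lines)
        if local_i == 0:
            # A single thread sums the row sequentially and writes the result
            output = 0.0
            for j in range(size):
                output += cache[j]
            out[batch, 0] = output
    return call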

But I felt like you were aiming for a more comprehensive solution (multi-block, maybe?), though I wasn't sure. Looking forward to the YouTube series!
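One guess at what a "roughly 12 lines" answer might look like is a parallel tree reduction in shared memory, where threads pair up and halve the number of active partial sums each round. The sketch below is an assumption, not the puzzle author's reference solution. So that it runs without a GPU or numba, it drives the kernel with a hypothetical threading-based stand-in for the notebook's `cuda` object (`_ThreadCtx`, `_Shared`, `_Dim` and `run_blocks` are made up for this example and are not numba's simulator API). `cuda.syncthreads()` maps to a `threading.Barrier` with a timeout, so a kernel whose threads reach the barrier unevenly breaks the barrier instead of silently passing.

```python
import threading

import numpy as np

TPB = 8  # threads per block; assumed power of two (the pairing below relies on it)


def axis_sum_test(cuda):
    # Kernel body in the puzzle's style: one block per row of `a`.
    def call(out, a, size: int) -> None:
        cache = cuda.shared.array(TPB, np.float32)
        local_i = cuda.threadIdx.x
        batch = cuda.blockIdx.y
        # Each thread loads one element; threads past the row contribute 0.
        if local_i < size:
            cache[local_i] = a[batch, local_i]
        else:
            cache[local_i] = 0.0
        cuda.syncthreads()
        # Tree reduction: after the round with stride s, cache[i] for every
        # multiple i of 2*s holds the sum of the original cache[i : i + 2*s].
        stride = 1
        while stride < TPB:
            if local_i % (2 * stride) == 0:
                cache[local_i] += cache[local_i + stride]
            cuda.syncthreads()  # outside the branch, so every thread reaches it
            stride *= 2
        if local_i == 0:
            out[batch, 0] = cache[0]
    return call


# --- Hypothetical stand-in for the notebook's `cuda` object (not numba) ---
class _Dim:
    def __init__(self, x: int = 0, y: int = 0) -> None:
        self.x, self.y = x, y


class _Shared:
    def __init__(self, buf: np.ndarray) -> None:
        self._buf = buf

    def array(self, shape, dtype) -> np.ndarray:
        # Every thread in a block gets the same buffer (one shared array assumed).
        return self._buf


class _ThreadCtx:
    def __init__(self, tx, by, tpb, barrier, buf) -> None:
        self.threadIdx = _Dim(x=tx)
        self.blockIdx = _Dim(y=by)
        self.blockDim = _Dim(x=tpb)
        self.shared = _Shared(buf)
        self._barrier = barrier

    def syncthreads(self) -> None:
        # Times out and raises BrokenBarrierError if some thread never arrives.
        self._barrier.wait()


def run_blocks(kernel_factory, out, a, size, blocks_y, tpb=TPB):
    """Run blocks one after another, with one Python thread per CUDA thread."""
    for by in range(blocks_y):
        barrier = threading.Barrier(tpb, timeout=5)
        buf = np.zeros(tpb, dtype=np.float32)
        threads = [
            threading.Thread(
                target=kernel_factory(_ThreadCtx(tx, by, tpb, barrier, buf)),
                args=(out, a, size),
            )
            for tx in range(tpb)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


# Usage: 4 rows of 6 elements, one block per row.
a = np.arange(4 * 6).reshape((4, 6))
out = np.zeros((4, 1))
run_blocks(axis_sum_test, out, a, size=6, blocks_y=4)
print(out[:, 0])  # expected row sums: 15, 51, 87, 123
```

This sketch assumes each row fits in one block (`size <= TPB`). Rows longer than a block would need each thread to accumulate a strided loop before the tree, or several blocks per row plus a second pass, which is presumably where a multi-block version would come in; that is not covered here.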

Anyway, here's my repo! Let me know if you have any suggestions; happy to update it.

isamu-isozaki commented 11 months ago

OK! I think all the solutions in the repo are up to date now. Added in the README too!