likejazz / llama3.cuda

llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.
MIT License
294 stars 20 forks

Use CUDA event API for benchmarking #6

Open meneraing opened 1 month ago

meneraing commented 1 month ago

Modified the benchmarking code to use the CUDA event API, following the blog post shared in #4.

On an RTX 4080 SUPER:

image

And using the maximum number of tokens:

image
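
For readers unfamiliar with the approach in the title, here is a minimal sketch of timing GPU work with CUDA events instead of a host clock. It is not the code from this PR: `dummy_step`, `time_steps_ms`, and the sizes in `main` are hypothetical stand-ins for the model's per-token forward pass, used only to show the `cudaEventRecord` / `cudaEventSynchronize` / `cudaEventElapsedTime` pattern.

```cuda
// Sketch only: event-based timing of a placeholder kernel.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in for one decoding step (assumption, not from the repo).
__global__ void dummy_step(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 1.0001f + 1.0f;
}

// Returns the GPU time in milliseconds spent on `steps` kernel launches.
float time_steps_ms(float *d_x, int n, int steps) {
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Events are enqueued on the default stream around the work to measure.
    CUDA_CHECK(cudaEventRecord(start, 0));
    for (int s = 0; s < steps; s++) {
        dummy_step<<<blocks, threads>>>(d_x, n);
    }
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop, 0));

    // Block the host until the stop event has actually been reached.
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    return ms;
}

int main(void) {
    const int n = 1 << 20;   // arbitrary buffer size for the sketch
    const int steps = 256;   // arbitrary number of "tokens"
    float *d_x = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_x, 0, n * sizeof(float)));

    time_steps_ms(d_x, n, 1);                 // warm-up launch, result discarded
    float ms = time_steps_ms(d_x, n, steps);  // measured run
    printf("%d steps in %.3f ms (%.1f steps/s)\n",
           steps, ms, steps / (ms / 1000.0f));

    CUDA_CHECK(cudaFree(d_x));
    return 0;
}
```

Because the events are recorded in the same stream as the kernels, the elapsed time reflects when the GPU reached each marker, not when the host issued the launches. That is the usual reason to prefer events over a CPU timer for asynchronous kernel launches. Building with something like `nvcc -O2 sketch.cu -o sketch` should work, assuming a CUDA toolkit is installed.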