rbitr / ferrite

Simple, lightweight transformers in Fortran
MIT License

Performance #2

Open rbitr opened 11 months ago

rbitr commented 11 months ago

Currently no performance optimizations are included in the code. I have started benchmarking performance against HF transformers (which I think is the fair comparison for this project; vs. llama.cpp for the llama.f90 code *). Results (in /benchmark) show that on my machines it's slower with Linux + OpenBLAS and faster with macOS + Accelerate, though Fortran starts up and loads the weights faster. We'll see how that changes with any optimizations.
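Since the Linux/macOS split above comes down to which BLAS backs the matrix products, a hedged micro-benchmark sketch may be useful: it times the Fortran intrinsic `matmul` against BLAS `sgemm` on the same matrices. This is not code from ferrite or its benchmarks — the size `n` and the timing approach are illustrative; link with `-lopenblas` on Linux or `-framework Accelerate` on macOS.

```fortran
! Illustrative micro-benchmark: intrinsic matmul vs. BLAS sgemm.
! Build (assumptions): gfortran bench.f90 -lopenblas   (Linux)
!                      gfortran bench.f90 -framework Accelerate  (macOS)
program bench_matmul
    implicit none
    integer, parameter :: n = 512      ! illustrative size
    real :: a(n,n), b(n,n), c1(n,n), c2(n,n)
    real :: t0, t1
    external :: sgemm                  ! single-precision BLAS GEMM

    call random_number(a)
    call random_number(b)

    ! intrinsic matmul: compiler-generated (or compiler-lowered to BLAS)
    call cpu_time(t0)
    c1 = matmul(a, b)
    call cpu_time(t1)
    print '(a,f8.4,a)', 'matmul: ', t1 - t0, ' s'

    ! explicit BLAS call: c2 = 1.0*a*b + 0.0*c2
    call cpu_time(t0)
    call sgemm('N', 'N', n, n, n, 1.0, a, n, b, n, 0.0, c2, n)
    call cpu_time(t1)
    print '(a,f8.4,a)', 'sgemm:  ', t1 - t0, ' s'

    ! sanity check that both paths computed the same product
    print '(a,es10.3)', 'max abs diff: ', maxval(abs(c1 - c2))
end program bench_matmul
```

Comparing the two on each platform would show how much of the gap is the BLAS itself versus everything around it (startup, weight loading, the non-GEMM ops).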

Current potential changes:

@certik

* There is also this approach using GGML for embeddings, but it's GPU-focused: https://bloop.ai/blog/gpu_with_ggml

certik commented 11 months ago

You can also get some inspiration here: https://github.com/certik/fastGPT/blob/c2148fbd909c82ec72eaccc00d8ddc51e9106144/gpt2.f90 — I optimized quite a few things there. I noticed in https://github.com/rbitr/llama2.f90/blob/0d5b6234f20cd60e5de43655f37b2dfe2d5d1afd/llama2.f90 that you do the kv-cache a bit differently than I do. Maybe one way is faster than the other.
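For readers unfamiliar with the kv-cache being compared here, a minimal sketch of one possible layout may help: preallocated per-layer K/V arrays with the position as the fastest-growing index, so each new token writes one contiguous column. The module, type, and shapes below are hypothetical illustrations, not the actual ferrite, llama2.f90, or fastGPT code — the linked files choose their own layouts, and layout is exactly what can make one version faster than the other.

```fortran
! Hypothetical kv-cache sketch. All names and shapes are illustrative.
module kv_cache_mod
    implicit none
    type :: kv_cache
        ! (dim, max_seq_len, n_layers): position is the second index,
        ! so each layer's cache grows along contiguous columns
        real, allocatable :: k(:,:,:), v(:,:,:)
        integer :: len = 0          ! number of cached positions so far
    end type
contains
    subroutine cache_init(c, dim, max_seq, n_layers)
        type(kv_cache), intent(out) :: c
        integer, intent(in) :: dim, max_seq, n_layers
        allocate(c%k(dim, max_seq, n_layers))
        allocate(c%v(dim, max_seq, n_layers))
        c%len = 0
    end subroutine cache_init

    subroutine cache_store(c, layer, pos, k_new, v_new)
        ! store the K/V vectors for token `pos` at layer `layer`;
        ! attention for the new token then reads c%k(:,1:c%len,layer)
        type(kv_cache), intent(inout) :: c
        integer, intent(in) :: layer, pos
        real, intent(in) :: k_new(:), v_new(:)
        c%k(:, pos, layer) = k_new
        c%v(:, pos, layer) = v_new
        c%len = max(c%len, pos)
    end subroutine cache_store
end module kv_cache_mod
```

With this index order, the attention scores for one head reduce to a GEMV over a contiguous slice of the cache; an alternative layout (e.g. position as the slowest index) trades that contiguity for simpler appends, which is the kind of difference worth timing.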