llama2.c is a minimal implementation for running inference on models with a Llama 2-like, transformer-based LLM architecture.
llama2.cs is a pure C# port of the same program. It is optimized for speed while staying simple to understand and modify.
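The sketch below shows, in C#, two kernels that a llama2.c-style forward pass relies on heavily: RMSNorm and a matrix-vector product. It is modeled on the `rmsnorm` and `matmul` functions in llama2.c's run.c. The class name `Kernels` and the use of `Parallel.For` are illustrative assumptions, not code taken from llama2.cs.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper class: a sketch modeled on llama2.c, not llama2.cs source.
static class Kernels
{
    // RMSNorm: o[j] = weight[j] * x[j] / sqrt(mean(x^2) + eps), with eps = 1e-5 as in llama2.c.
    public static void RmsNorm(float[] o, float[] x, float[] weight)
    {
        float ss = 0f;
        for (int j = 0; j < x.Length; j++) ss += x[j] * x[j];
        ss = ss / x.Length + 1e-5f;
        ss = 1f / MathF.Sqrt(ss);
        for (int j = 0; j < x.Length; j++) o[j] = weight[j] * (ss * x[j]);
    }

    // Matrix-vector product: W (d x n, row-major) @ x (n) -> xout (d).
    // Each output row is independent, so the outer loop runs in parallel.
    public static void MatMul(float[] xout, float[] x, float[] w, int n, int d)
    {
        Parallel.For(0, d, i =>
        {
            float val = 0f;
            int row = i * n;
            for (int j = 0; j < n; j++) val += w[row + j] * x[j];
            xout[i] = val;
        });
    }

    static void Main()
    {
        float[] w = { 1, 2, 3, 4, 5, 6 }; // 2 x 3 matrix, row-major
        float[] x = { 1, 1, 1 };
        float[] y = new float[2];
        MatMul(y, x, w, n: 3, d: 2);
        Console.WriteLine($"{y[0]} {y[1]}"); // prints "6 15"
    }
}
```

Parallelizing over output rows is safe because each row of W produces exactly one output element. llama2.c achieves the same effect with an OpenMP `parallel for` pragma on that loop.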
Requires .NET 7 or later.
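Build in Release mode:
dotnet build -c Release

Run with a model checkpoint such as stories15M.bin:
.\bin\Release\net7.0\llama2.cs.exe stories15M.bin

Run with a prompt passed via -i:
.\bin\Release\net7.0\llama2.cs.exe stories15M.bin -i "A long time ago a"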