tjake / Jlama

Jlama is a modern LLM inference engine for Java
Apache License 2.0

BF16 Support #38

Closed tjake closed 3 months ago

tjake commented 4 months ago

Many models use bf16 weight types. Until now we had been converting them to F32, but this doesn't work for larger models, since the resulting byte offsets exceed the 2GB limit - #33.

We should just support BF16 natively, as it's very simple to do BF16 -> F32 in SIMD.
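The conversion is cheap because BF16 is simply the top 16 bits of an IEEE-754 float32: widening is a 16-bit left shift, and narrowing is a right shift (plus rounding, if desired). A minimal scalar sketch of that idea (class and method names are illustrative, not Jlama's actual API):

```java
public final class Bf16 {

    // BF16 -> F32: a bf16 value occupies the high 16 bits of a float32,
    // so widening is just a left shift into the high half.
    public static float bf16ToFloat(short bits) {
        return Float.intBitsToFloat((bits & 0xFFFF) << 16);
    }

    // F32 -> BF16 by truncation (round-toward-zero); production code
    // typically uses round-to-nearest-even for better accuracy.
    public static short floatToBf16(float f) {
        return (short) (Float.floatToRawIntBits(f) >>> 16);
    }

    public static void main(String[] args) {
        float f = 1.5f; // exactly representable in bf16
        short b = floatToBf16(f);
        System.out.println(bf16ToFloat(b)); // prints 1.5
    }
}
```

In SIMD (e.g. the JDK Vector API) the same trick applies lane-wise: widen a vector of 16-bit lanes to 32-bit lanes, shift left by 16, and reinterpret the result as floats.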