microsoft / BitNet

Official inference framework for 1-bit LLMs
MIT License

Relationship to llama.cpp #10

Open dokterbob opened 23 hours ago

dokterbob commented 23 hours ago

First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK.

Considering that this uses GGML and appears to be based directly on llama.cpp:

Why is this a separate project from llama.cpp, given that llama.cpp already supports BitNet ternary quants? (https://github.com/ggerganov/llama.cpp/pull/8151)

Are these simply more optimised kernels? If so, how do they compare to llama.cpp's implementation? Can/should they be contributed back to llama.cpp?