dokterbob opened 23 hours ago
First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK.
Considering that this is using GGML and seems based directly on llama.cpp:
Why is this a separate project from llama.cpp, given that llama.cpp already supports BitNet ternary quants? (https://github.com/ggerganov/llama.cpp/pull/8151)
Are these simply more optimised kernels? If so, how do they compare to llama.cpp's implementation? Can/should they be contributed back to llama.cpp?
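For context, "ternary quants" here means BitNet b1.58-style quantization, where each weight is constrained to {-1, 0, +1} plus a per-tensor scale. The following is a conceptual sketch of the absmean scheme from the BitNet paper, not llama.cpp's actual packed TQ1_0/TQ2_0 formats or kernels:

```python
def ternary_quantize(weights):
    """Quantize a list of float weights to {-1, 0, +1} with one scale.

    Conceptual sketch of absmean ternary quantization (BitNet b1.58);
    the real llama.cpp formats also bit-pack the ternary values.
    """
    # Per-tensor scale: mean absolute value (guard against all-zero input)
    scale = sum(abs(w) for w in weights) / len(weights) or 1e-8
    # Round each scaled weight and clamp into the ternary range
    quants = [max(-1, min(1, round(w / scale))) for w in weights]
    return quants, scale

def dequantize(quants, scale):
    """Recover approximate float weights from the ternary values."""
    return [q * scale for q in quants]

w = [0.9, -0.05, -1.2, 0.4]
q, s = ternary_quantize(w)
# q contains only values in {-1, 0, 1}; w is approximated by q * s
```

The point of the question is whether BitNet.cpp's kernels for formats like this are faster than the ternary support already merged into llama.cpp, and if so, whether the speedup can flow back upstream.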