vaiju1981 opened 4 months ago
Is there any plan to support the GGUF format directly, in addition to SafeTensors? That would allow Jlama to load other GGUFs. If support already exists, can we add it to the README file?
I could support some of the quantization types. Is that the main reason vs safetensors?
The main reason is that GGUF files are small (compared to safetensors), which makes our testing/usage easier. Apart from that, there is the different quantization support.
Currently we are using a different library (deepjavalibrary) to load GGUF models via LlamaEngine. Native support would make it easier to do that through Jlama instead.
Hmm, can you give me an example? The GGUF and safetensors versions of the same model with the same quantization are pretty much the same size. Maybe they changed GGUF since I last looked.
By small size I mean the download from HuggingFace/repos: you download one already-quantized GGUF file rather than the full safetensors checkpoint.
With GGUF, the vocabulary and other things such as prompt templates are part of the same file and don't need to be downloaded separately.
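To illustrate what lives in that single file, here is a minimal sketch (in Java, Jlama's language) that walks a GGUF header and prints the metadata keys, among which the vocabulary and chat template appear. The byte layout follows the published GGUF spec; the class name and error handling are illustrative only:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GgufMetadataDump {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of(args[0]), StandardOpenOption.READ)) {
            // Map the start of the file; the header and metadata live there.
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0,
                    Math.min(ch.size(), Integer.MAX_VALUE)).order(ByteOrder.LITTLE_ENDIAN);
            if (buf.getInt() != 0x46554747)            // "GGUF" magic, little-endian
                throw new IOException("not a GGUF file");
            int version = buf.getInt();
            long tensorCount = buf.getLong();
            long kvCount = buf.getLong();
            System.out.printf("GGUF v%d, %d tensors, %d metadata keys%n",
                    version, tensorCount, kvCount);
            for (long i = 0; i < kvCount; i++) {
                String key = readString(buf);
                int type = buf.getInt();
                System.out.println(key);   // e.g. tokenizer.ggml.tokens, tokenizer.chat_template
                skipValue(buf, type);
            }
        }
    }

    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[(int) buf.getLong()];  // uint64 length prefix
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // GGUF value type ids: 0/1 (u)int8, 2/3 (u)int16, 4/5 (u)int32, 6 float32,
    // 7 bool, 8 string, 9 array, 10/11 (u)int64, 12 float64.
    static void skipValue(ByteBuffer buf, int type) {
        switch (type) {
            case 0, 1, 7    -> buf.position(buf.position() + 1);
            case 2, 3       -> buf.position(buf.position() + 2);
            case 4, 5, 6    -> buf.position(buf.position() + 4);
            case 10, 11, 12 -> buf.position(buf.position() + 8);
            case 8          -> readString(buf);
            case 9 -> {                    // array: element type + count + elements
                int elemType = buf.getInt();
                long n = buf.getLong();
                for (long j = 0; j < n; j++) skipValue(buf, elemType);
            }
            default -> throw new IllegalStateException("unknown value type " + type);
        }
    }
}
```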
Jlama does the downloading for you. It only needs four of the files.
If there are models you would like me to quantize and upload, please request them here: https://github.com/tjake/Jlama/discussions/37
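For context, fetching a model's files by hand is just a few GETs against the HuggingFace resolve endpoint. A minimal sketch, assuming a hypothetical repo id and a typical file set for a safetensors loader (the exact files Jlama fetches may differ; its built-in downloader handles all of this for you):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class HfDownload {
    public static void main(String[] args) throws IOException, InterruptedException {
        String repo = "tjake/Llama-3.2-1B-Instruct-JQ4";   // hypothetical repo id
        // Assumed file set; the exact four files Jlama needs may differ.
        String[] files = {"config.json", "tokenizer.json",
                          "tokenizer_config.json", "model.safetensors"};
        Path dir = Files.createDirectories(Path.of("models", repo.replace('/', '_')));
        // The resolve endpoint redirects to a CDN, so follow redirects.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.ALWAYS).build();
        for (String f : files) {
            URI uri = URI.create("https://huggingface.co/" + repo + "/resolve/main/" + f);
            client.send(HttpRequest.newBuilder(uri).build(),
                    HttpResponse.BodyHandlers.ofFile(dir.resolve(f)));
            System.out.println("fetched " + f);
        }
    }
}
```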
Since GGUF is the format used by llama.cpp (the most widely used native tool for running models locally), one tends to download the GGUF files first in order to quickly test a model with llama-cli.
Having two formats therefore leads to duplication, for example:
```
$ du -sh Meta-Llama-3.1-8B-Instruct-*
6.1G	Meta-Llama-3.1-8B-Instruct-Jlama-Q4
4.3G	Meta-Llama-3.1-8B-Instruct-Q4_0.gguf
```
Apparently someone has already written a GGUF interpreter in Java: llama3.java. Since it supports only Llama 3.x it is probably not complete, but maybe it could be a starting point?
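Much of the work in such an interpreter is decoding llama.cpp's quantized block formats. As a sketch of the simplest one: Q4_0 packs 32 weights into an 18-byte block, one fp16 scale followed by 32 packed 4-bit values stored offset by 8. A toy dequantizer (uses Float.float16ToFloat from Java 20+; class and method names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Q4_0 {
    static final int BLOCK = 32;        // weights per block
    static final int BYTES = 2 + 16;    // fp16 scale + 32 packed 4-bit quants

    /** Dequantize one Q4_0 block starting at the buffer's position into out[0..31]. */
    static void dequantize(ByteBuffer buf, float[] out) {
        float d = Float.float16ToFloat(buf.getShort());   // per-block scale
        byte[] qs = new byte[16];
        buf.get(qs);
        for (int j = 0; j < 16; j++) {
            // Low nibble holds element j, high nibble element j + 16;
            // stored values are offset by 8 so the range is -8..7.
            out[j]      = d * ((qs[j] & 0x0F) - 8);
            out[j + 16] = d * (((qs[j] >> 4) & 0x0F) - 8);
        }
    }

    public static void main(String[] args) {
        // Round-trip a toy block: scale 0.5, quant values 0..15 in both nibbles.
        ByteBuffer buf = ByteBuffer.allocate(BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(Float.floatToFloat16(0.5f));
        for (int j = 0; j < 16; j++) buf.put((byte) ((j << 4) | j));
        buf.flip();
        float[] out = new float[BLOCK];
        dequantize(buf, out);
        System.out.println(java.util.Arrays.toString(out)); // -4.0 .. 3.5 in both halves
    }
}
```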
Hi,
Yeah, I saw that and will consider adding it, but there are a couple of issues.
Since this is a solo project, I need to weigh the burden of supporting both formats. It may make the most sense to switch from safetensors over to GGUF, but tool support and better distributed inference are higher priorities for me.
There are some issues with GGUF vs SafeTensors, though.
So overall I concede GGUF support would be cool, but just not ATM.
Thanks for the details! It sounds perfectly reasonable not to prioritise it. Moreover, having SafeTensors support in Java has the added benefit that the original models tend to be in that format.