Closed sirus20x6 closed 4 months ago
🤖
Hey @sirus20x6, great to see you back! Diving into another challenge, I see? Let's tackle this one together.
Thank you for your feature request. I understand that you're looking for support for gguf in embedding models in LlamaIndex.
Based on the information provided, the error you're encountering is due to the absence of a file named `pytorch_model.bin`, `tf_model.h5`, `model.ckpt.index`, or `flax_model.msgpack` in the directory `e5-mistral-7b-instruct`. This error is raised by the `from_pretrained` method of the `AutoModel` class in the `transformers` library, which is used by the `HuggingFaceEmbedding` class in LlamaIndex.
As for gguf support, there have been similar requests in the past. For instance, this issue requested support for the Hugging Face Inference API or GGUF file type, and this issue asked about GGUF model support. Both issues were closed with pull requests that seem to address the requested features.
However, it's not clear from the current repository documentation whether GGUF is fully supported in the latest version of LlamaIndex (v0.9.45.post1). The error message you're seeing suggests that the `HuggingFaceEmbedding` class is still expecting one of the file types mentioned above, not a GGUF file.
To resolve this issue, you might want to check whether the directory `e5-mistral-7b-instruct` contains a GGUF file.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
Related issue: `HuggingFaceLLM` supporting Hugging Face Inference API or GGUF file type.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@sirus20x6 I think huggingface technically supports it, I just haven't implemented the code to detect and load gguf files.
If you can point me towards docs for how to load with huggingface, that will probably help motivate me to implement it haha
As a workaround, you can load the model and tokenizer yourself and pass them in:
model = <load model from huggingface>
tokenizer = <load tokenizer from huggingface>
embed_model = HuggingFaceEmbedding(model=model, tokenizer=tokenizer, ..)
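A concrete version of that workaround might look like the sketch below. The repo id `intfloat/e5-mistral-7b-instruct` and the v0.9.x `llama_index.embeddings` import path are assumptions, and the loading is deferred into a function because calling it downloads the full fp16 weights.

```python
def build_embed_model(model_id: str = "intfloat/e5-mistral-7b-instruct"):
    """Load an fp16 checkpoint with transformers and wrap it for LlamaIndex.

    The repo id above is an assumption; calling this downloads the full
    fp16 weights, so it is deliberately not executed at import time.
    """
    from llama_index.embeddings import HuggingFaceEmbedding  # v0.9.x import path
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return HuggingFaceEmbedding(model=model, tokenizer=tokenizer)
```

The returned object can then be passed wherever LlamaIndex accepts an `embed_model` (e.g. a `ServiceContext` in v0.9).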
so I have to upload the model to hugging face and then use that format? oh wait. I'm not sure what to even use for tokenizer. is that just the tokenizer.json from the original fp16?
You tell me hahaha
I think it probably is that tokenizer yea.
Not saying you have to upload the model to huggingface, just that it has to be loaded with huggingface modules in order to work with HuggingFaceLLM
Really though, for gguf, you should be using ollama or llama.cpp (both of which we have support for)
how do I use llama.cpp in llamaindex for embeddings?
@sirus20x6 you can't at the moment (and tbh I really wouldn't recommend it either -- using an LLM for embeddings is pretty subpar; using an actual model trained for embeddings is ideal, i.e. bge, etc.)
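For completeness: outside of LlamaIndex, llama-cpp-python can already produce embeddings straight from a GGUF file when the model is loaded with `embedding=True`. A hedged sketch follows; the model path is a placeholder, and the call is wrapped in a function since it needs a local GGUF file to run.

```python
def gguf_embed(texts, model_path="/path/to/model.gguf"):
    """Embed texts with a local GGUF model via llama-cpp-python.

    The model_path default is a placeholder; supply a real .gguf file.
    """
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, embedding=True, verbose=False)
    # create_embedding returns an OpenAI-style dict:
    # {"data": [{"embedding": [...]}, ...], ...}
    return [llm.create_embedding(t)["data"][0]["embedding"] for t in texts]
```

A custom LlamaIndex embedding class could wrap exactly this kind of call.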
according to the embedding leaderboard https://huggingface.co/spaces/mteb/leaderboard
SFR-Embedding-Mistral is the highest ranking embedding model. the gguf is found here https://huggingface.co/dranger003/SFR-Embedding-Mistral-GGUF
If you want to contribute an embedding class to support this, I encourage you to :)
(Imo, maybe a spicy take, a 14GB embedding model that barely beats a 1GB embedding model doesn't feel very worth it)
well I spent good money on 512GB of ram for a reason
Hi, @sirus20x6,
I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the issue was raised by you to request support for gguf in embedding models in LlamaIndex. It seems that the lack of support for gguf was addressed with a workaround where users can load the model and tokenizer from HuggingFace and pass it into the HuggingFaceEmbedding class. There was also a discussion about using llama.cpp for embeddings, but it was discouraged in favor of using actual models trained for embeddings.
Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Feature Description
it appears that gguf isn't supported for embedding models
Reason
I don't know enough about llamaindex to answer what's stopping the feature from working.
Value of Feature
GGUF is smaller, faster, and all in one file, with almost no perplexity loss vs fp16.
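The size claim is easy to sanity-check with back-of-envelope arithmetic. The parameter count and bits-per-weight figures below are rough assumptions (Q4_K_M quantization averages roughly 4.85 bits per weight):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk model size in GB, ignoring metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9

# ~7.1B-parameter Mistral-class embedding model (counts are rough assumptions):
fp16_gb = model_size_gb(7.1e9, 16)    # ≈ 14.2 GB at fp16
q4_gb = model_size_gb(7.1e9, 4.85)    # ≈ 4.3 GB with Q4_K_M-style quantization
```

This is roughly the gap being discussed above: the same checkpoint shrinks by about 3x under 4-bit quantization.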