youngjae2424 opened this issue 1 month ago (status: Open)
Hey there, @youngjae2424! 👋 I'm here to help you out while you wait for a human maintainer. I'm a bot that can assist with solving bugs, answering questions, and even becoming a contributor.
To resolve the error "ValueError: The model_name argument must be provided" and create local embeddings without using Hugging Face, you can use the resolve_embed_model function from the LlamaIndex library. This function allows you to specify different embedding models, including local ones. Here is an example of how to do this:
from llama_index.core.embeddings.utils import resolve_embed_model

# Set up the embedding model to use with the llama_index.core APIs.
# (A llama_index.legacy import also exists, but mixing legacy and core
# objects causes type errors, so the core path is used here.)
embed_model = resolve_embed_model(embed_model="local:BAAI/bge-small-en")

# Or, if you want to embed some text separately:
embeddings = embed_model.get_text_embedding("I want to embed this text!")
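Note that "local:BAAI/bge-small-en" still has to download the model weights from Hugging Face the first time it runs. If a proxy blocks that, one hedged workaround (not from the answer above; the ./models/bge-small-en path is a hypothetical location) is to copy the model files onto the machine by other means and point the resolver at that folder, since the underlying HuggingFaceEmbedding should also accept a local directory path as the model name:

from llama_index.core.embeddings.utils import resolve_embed_model

# Assumes the BAAI/bge-small-en files were copied to this folder beforehand,
# so nothing needs to be fetched from huggingface.co at run time.
embed_model = resolve_embed_model("local:./models/bge-small-en")
embeddings = embed_model.get_text_embedding("I want to embed this text!")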
In your case, you need to modify the ServiceContext creation to use a specific local embedding model. Here is how you can adjust your code:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core.embeddings.utils import resolve_embed_model

llm = LlamaCPP(
    model_path="models/llama-3-Korean-Bllossom-8B.Q2_K.gguf",
)

documents = SimpleDirectoryReader("./data/").load_data()

# Set up the embedding model
embed_model = resolve_embed_model(embed_model="local:BAAI/bge-small-en")

# Create the ServiceContext with the custom embedding model
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Query your data
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
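As a side note (not part of the answer above): recent LlamaIndex releases deprecate ServiceContext in favor of the global Settings object, so the same wiring can be expressed roughly as the sketch below, reusing llm, embed_model, and documents from the snippet above:

from llama_index.core import Settings, VectorStoreIndex

# Register the local LLM and embedding model globally instead of per-index.
Settings.llm = llm
Settings.embed_model = embed_model

# No service_context argument is needed once Settings is populated.
index = VectorStoreIndex.from_documents(documents)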
This should resolve the error and allow you to use local embeddings without relying on Hugging Face [1].
To continue talking to Dosu, mention @dosu.
I have executed your code and it seems to be trying to connect to the following link: https://huggingface.co/. Is there any way to do this without using HuggingFace embeddings?
OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like BAAI/bge-small-en is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
Yes, you can create local embeddings without using Hugging Face embeddings and without requiring internet access. LlamaIndex provides an easy-to-extend base class for implementing your own embeddings. Here is an example of how you can create a custom embedding model:
from typing import Any, List

from llama_index.core.base.embeddings.base import BaseEmbedding


class CustomLocalEmbedding(BaseEmbedding):
    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        # Initialize your local embedding model here

    def _get_query_embedding(self, query: str) -> List[float]:
        # Implement your logic to generate embeddings for the query
        return [0.0] * 768  # Example: return a dummy embedding

    def _get_text_embedding(self, text: str) -> List[float]:
        # Implement your logic to generate embeddings for the text
        return [0.0] * 768  # Example: return a dummy embedding

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        # Implement your logic to generate embeddings for a list of texts
        return [[0.0] * 768 for _ in texts]  # Example: return dummy embeddings

    # The async variants must use the _aget_* names; reusing the _get_* names
    # here would override the synchronous methods defined above.
    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)
You can then set this custom embedding model in the Settings object:
from llama_index.core import Settings
Settings.embed_model = CustomLocalEmbedding()
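As a concrete (but hypothetical) illustration of the pattern above, here is a sketch of the same class backed by a sentence-transformers model loaded from a local folder. The sentence-transformers dependency and the ./models/all-MiniLM-L6-v2 path are assumptions, not part of the original answer; once the weights are on disk, no network access is needed.

from typing import Any, List

from llama_index.core import Settings
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.bridge.pydantic import PrivateAttr
from sentence_transformers import SentenceTransformer


class LocalSentenceTransformerEmbedding(BaseEmbedding):
    # Private attribute so the pydantic-based BaseEmbedding does not try to validate it.
    _model: Any = PrivateAttr()

    def __init__(self, model_path: str = "./models/all-MiniLM-L6-v2", **kwargs: Any) -> None:
        super().__init__(**kwargs)
        # Load from a local directory; no call to huggingface.co is made.
        self._model = SentenceTransformer(model_path)

    def _get_query_embedding(self, query: str) -> List[float]:
        return self._model.encode(query).tolist()

    def _get_text_embedding(self, text: str) -> List[float]:
        return self._model.encode(text).tolist()

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        return [emb.tolist() for emb in self._model.encode(texts)]

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)


Settings.embed_model = LocalSentenceTransformerEmbedding()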
Additionally, you can use pre-trained local models by resolving them with the resolve_embed_model function. Here is an example of how to create local embeddings using pre-trained models:
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.embeddings.utils import resolve_embed_model


def create_local_embedding(model_name: str) -> BaseEmbedding:
    model = resolve_embed_model(f"local:{model_name}")
    return model
# Example usage
local_embedding_model = create_local_embedding("sentence-transformers/all-MiniLM-L6-v2")
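One caveat with "local:sentence-transformers/all-MiniLM-L6-v2": the first run still downloads the weights from Hugging Face, which is exactly what a proxy would block. Two hedged workarounds (assumptions, not part of the answer above): pass a directory that already contains the model files, or, if the model has been downloaded once and sits in the local Hugging Face cache, put the Hugging Face libraries into offline mode before anything is loaded.

# Offline mode must be enabled before transformers / huggingface_hub are imported,
# e.g. in the shell that launches the script:
#   export HF_HUB_OFFLINE=1
#   export TRANSFORMERS_OFFLINE=1

# Option 1: point at a folder that already contains the model files
# (hypothetical path; copy the files there by any means available).
offline_model = create_local_embedding("./models/all-MiniLM-L6-v2")

# Option 2: with the offline variables exported and the model already present
# in the local Hugging Face cache, the usual name resolves without network access.
cached_model = create_local_embedding("sentence-transformers/all-MiniLM-L6-v2")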
For more detailed guidance, you can refer to the Custom Embeddings guide provided by LlamaIndex. This way, you can use your local embedding model without requiring internet access or connecting to Hugging Face [1][2][3].
To continue talking to Dosu, mention @dosu.
Question Validation
Question
I don't use Hugging Face because of a proxy issue, so I tried to create local embeddings as follows.
But I got the error below. How can it be resolved?