Closed: zzk2021 closed this issue 3 months ago.
I don't think so at the moment.
You can change the embeddings API base URL to a local one via `GRAPHRAG_EMBEDDING_API_BASE`; it must still be compatible with the OpenAI API schema.
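For example (the address and key values here are illustrative, assuming some local OpenAI-compatible embeddings server), the override can live in the `.env` file that graphrag reads:

```
# most local servers ignore the API key, but graphrag still expects one to be set
GRAPHRAG_API_KEY=dummy-key
# point embeddings at your local OpenAI-compatible endpoint (illustrative address)
GRAPHRAG_EMBEDDING_API_BASE=http://localhost:8686/v1
```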
It works with Ollama embeddings by replacing the contents of `/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py` with:
```python
from typing_extensions import Unpack

from graphrag.llm.base import BaseLLM
from graphrag.llm.types import (
    EmbeddingInput,
    EmbeddingOutput,
    LLMInput,
)

from .openai_configuration import OpenAIConfiguration
from .types import OpenAIClientTypes

import ollama


class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # Original OpenAI call, replaced by the Ollama loop below:
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # return [d.embedding for d in embedding.data]
        embedding_list = []
        for inp in input:
            # Embed each snippet one at a time with a locally served model.
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
```
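Note that this replacement hardcodes the embedding model name; it also assumes the `ollama` Python package is installed (`pip install ollama`) and an Ollama server is running locally with the model already pulled (`ollama pull nomic-embed-text`).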
Thank you for sharing. It's a pretty brutal fix, but if it works then at least it's a stopgap until the Microsoft team implements a more elegant solution.
Maybe checking for the presence of an `ollama = true` flag in the embedding parameters could keep the default behaviour and only apply the hack when it's true, as in the sketch below.
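A minimal sketch of that idea, assuming a hypothetical `ollama` flag smuggled in through `model_parameters` (the flag and its plumbing are illustrative, not an existing graphrag setting):

```python
# Inside OpenAIEmbeddingsLLM: keep the stock OpenAI path unless an "ollama"
# flag is present in the model parameters (hypothetical flag, see above).
async def _execute_llm(
    self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
) -> EmbeddingOutput | None:
    args = {
        "model": self.configuration.model,
        **(kwargs.get("model_parameters") or {}),
    }
    if args.pop("ollama", False):
        return [
            ollama.embeddings(model=args["model"], prompt=inp)["embedding"]
            for inp in input
        ]
    embedding = await self.client.embeddings.create(input=input, **args)
    return [d.embedding for d in embedding.data]
```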
My solution was to write a local server with Flask that basically serves as a decoder of `cl100k_base` tokens and a caller of Ollama, then change the `api_base` for embeddings to the localhost address. It works pretty well as far as I'm concerned.
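For reference, a rough sketch of that kind of proxy with Flask (the route, port, and response shape here are illustrative, not the commenter's actual code). GraphRAG's OpenAI client can send `input` as cl100k_base token arrays rather than strings, so the proxy decodes them back to text before calling Ollama:

```python
import ollama
import tiktoken
from flask import Flask, jsonify, request

app = Flask(__name__)
encoder = tiktoken.get_encoding("cl100k_base")


@app.post("/v1/embeddings")
def embeddings():
    body = request.get_json()
    inputs = body["input"]
    if isinstance(inputs, str):
        inputs = [inputs]
    elif inputs and isinstance(inputs[0], int):
        # a single request can also be one flat token array
        inputs = [inputs]
    # Token arrays arrive as lists of ints; decode those back to text.
    texts = [encoder.decode(i) if isinstance(i, list) else i for i in inputs]
    data = [
        {
            "object": "embedding",
            "index": idx,
            "embedding": ollama.embeddings(model=body["model"], prompt=text)["embedding"],
        }
        for idx, text in enumerate(texts)
    ]
    return jsonify({"object": "list", "data": data, "model": body["model"]})


if __name__ == "__main__":
    app.run(port=8686)
```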
@zeyunie-vecml, yes, that's precisely what the provided server does in addition to the OAI <-> Ollama translation
Update: I mixed up GitHub threads; this was in relation to this server.
I'm making this thread as our official discussion place for Local Embeddings setup and troubleshooting. Thanks for the curiosity and proactivity!
I think adding a middleware API in front of the OAI-schema endpoint works. In `settings.yaml`:

```yaml
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: mxbai-embed-large
    api_base: http://localhost:8686/api
```
And a minimal middleware server:

```python
import fastapi
from langchain_community.embeddings.ollama import OllamaEmbeddings

app = fastapi.FastAPI()


@app.post('/api/embeddings')
def embeddings(body: dict):
    # Embed each document in the request with the model named by the caller.
    ollama = OllamaEmbeddings(model=body['model'])
    res = ollama.embed_documents(body['input'])
    return {
        'data': [{'embedding': rs} for rs in res]
    }
```
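If you try this approach, the app can be served with uvicorn, e.g. `uvicorn server:app --port 8686` (the module name is assumed), so that the port and route match the `api_base` above. One caveat: depending on the graphrag version, `input` may arrive as cl100k_base token arrays rather than plain strings, in which case it needs to be decoded first, as in the proxy sketch earlier in the thread.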
I used the Ollama embedding with the above modification to the embedding function and was able to generate the graph, but I can't query the graph with a similar modification to another embed function. This is what I was trying to do (modifying `_embed_with_retry` in `embedding.py` in the `query/llm/oai` folder):
```python
# requires `import ollama` at the top of query/llm/oai/embedding.py
for attempt in retryer:
    with attempt:
        embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)
        return (embedding["embedding"], len(text))
```
Kind of wondering what's going wrong here:)
Instead of Ollama, I am trying llama.cpp for embeddings, but I get this error:

```
11:17:36,84 graphrag.llm.openai.create_openai_client INFO Creating OpenAI client base_url=http://localhost:8080
11:17:36,99 graphrag.index.llm.load_llm INFO create TPM/RPM limiter for nomic-embed-text-v1.5.Q5_K_M.gguf: TPM=0, RPM=0
11:17:36,99 graphrag.index.llm.load_llm INFO create concurrency limiter for nomic-embed-text-v1.5.Q5_K_M.gguf: 1
11:17:36,107 graphrag.index.verbs.text.embed.strategies.openai INFO embedding 177 inputs via 177 snippets using 12 batches. max_batch_size=16, max_tokens=8191
11:17:36,126 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (16,) + inhomogeneous part.
Traceback (most recent call last):
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
    result = await result
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 105, in text_embed
    return await _text_embed_in_memory(
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 130, in _text_embed_in_memory
    result = await strategy_exec(texts, callbacks, cache, strategy_args)
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 61, in run
    embeddings = await _execute(llm, text_batches, ticker, semaphore)
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 105, in _execute
    results = await asyncio.gather(*futures)
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 100, in embed
    result = np.array(chunk_embeddings.output)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (16,) + inhomogeneous part.
```
I'm having the same problem. Has this been solved, or how can it be avoided?
@wanglufei1 check out #345. Embedding with Ollama works with the modification made by user Spacelearner.
@silviachen46 Thank you very much. As I understand it, Ollama uses llama.cpp under the hood. I am working with Chinese-language content, so I tried switching models: I gave up on nomic-embed-text and used a qwen model instead. Now it works properly.
Local search with embeddings from Ollama now works. You can read the full guide here: https://medium.com/@karthik.codex/microsofts-graphrag-autogen-ollama-chainlit-fully-local-free-multi-agent-rag-superbot-61ad3759f06f Here is the link to the repo: https://github.com/karthik-codex/autogen_graphRAG
Consolidating Ollama-related issues: https://github.com/microsoft/graphrag/issues/657