microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Local Embeddings] Community Support thread #370

Closed · zzk2021 closed this 1 month ago

zzk2021 commented 1 month ago

[image: screenshot asking whether a local embedding model can be used]

vv5d commented 1 month ago

I don't think so at the moment.

av commented 1 month ago

You can change the embeddings API base URL to a local one:

GRAPHRAG_EMBEDDING_API_BASE

It must still be compatible with the OpenAI API schema.
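For example, a minimal .env along these lines (the URL, key, and model name are placeholders for whatever OpenAI-compatible server you run locally, e.g. one of the shims discussed below):

GRAPHRAG_API_KEY=dummy-key
GRAPHRAG_EMBEDDING_API_BASE=http://localhost:8686/api
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text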

SpaceLearner commented 1 month ago

It works with Ollama embeddings if you replace the file /opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py with:

from typing_extensions import Unpack

from graphrag.llm.base import BaseLLM
from graphrag.llm.types import (
    EmbeddingInput,
    EmbeddingOutput,
    LLMInput,
)

from .openai_configuration import OpenAIConfiguration
from .types import OpenAIClientTypes

import ollama


class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        # args is retained from the original implementation; the Ollama path
        # below does not use it.
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # Original OpenAI call, replaced by the per-input Ollama calls below:
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # return [d.embedding for d in embedding.data]
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
bmaltais commented 1 month ago

Thank you for sharing. It's a pretty brutal fix, but if it works it is at least a stopgap until the Microsoft team implements a more elegant solution.

Maybe checking for the presence of an "ollama: true" flag in the embedding parameters could keep the default behaviour and only use the hack when it is set, as in the sketch below.
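A minimal sketch of that idea, assuming a hypothetical "ollama" flag passed through model_parameters (neither the flag nor this wiring exists in graphrag today):

import ollama

async def _execute_llm(self, input, **kwargs):
    args = {
        "model": self.configuration.model,
        **(kwargs.get("model_parameters") or {}),
    }
    if args.pop("ollama", False):
        # Hypothetical flag set: use the local Ollama path.
        return [
            ollama.embeddings(model=args["model"], prompt=inp)["embedding"]
            for inp in input
        ]
    # Flag absent: keep the default OpenAI behaviour.
    embedding = await self.client.embeddings.create(input=input, **args)
    return [d.embedding for d in embedding.data]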

zeyunie-vecml commented 1 month ago

My solution was to write a local server with Flask that basically serves as a decoder of "cl100k_base" and a caller of ollama. Then change the api_base for embedding to the local host address. It works pretty well as far as I am concerned.
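A sketch of such a server, assuming the nomic-embed-text model and port 8686 (GraphRAG can send its input pre-tokenized as cl100k_base token arrays, which is why the decode step is needed):

import ollama
import tiktoken
from flask import Flask, jsonify, request

app = Flask(__name__)
enc = tiktoken.get_encoding("cl100k_base")

@app.post("/embeddings")
def embeddings():
    body = request.get_json()
    inputs = body["input"]
    if isinstance(inputs, str):
        inputs = [inputs]
    # Inputs may arrive as cl100k_base token arrays; decode them back to text.
    texts = [enc.decode(i) if isinstance(i, list) else i for i in inputs]
    data = [
        {
            "object": "embedding",
            "index": n,
            "embedding": ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"],
        }
        for n, t in enumerate(texts)
    ]
    return jsonify({"object": "list", "data": data, "model": body.get("model", "")})

app.run(port=8686)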

av commented 1 month ago

@zeyunie-vecml, yes, that's precisely what the provided server does in addition to the OAI <-> Ollama translation

UPD: I mixed up GitHub threads; this was in relation to this server.

AlonsoGuevara commented 1 month ago

I'm making this thread as our official discussion place for Local Embeddings setup and troubleshooting. Thanks for the curiosity and proactivity!

chenli90s commented 1 month ago

I think you can add a middleware API in front of the OpenAI-compatible endpoint:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: mxbai-embed-large
    api_base: http://localhost:8686/api
import fastapi
from langchain_community.embeddings.ollama import OllamaEmbeddings

app = fastapi.FastAPI()

@app.post("/api/embeddings")
def embeddings(body: dict):
    # Embed every input text with the model named in the request body.
    ollama = OllamaEmbeddings(model=body["model"])
    res = ollama.embed_documents(body["input"])
    return {
        "data": [{"embedding": rs} for rs in res]
    }
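If the file is saved as, say, server.py (name assumed), it can be run with uvicorn so the port matches the api_base above:

uvicorn server:app --port 8686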
silviachen46 commented 1 month ago

I used Ollama embedding with the above modification to the embedding function and was able to generate the graph, but I can't query it with a similar modification to another embed function. This is what I was trying to do (modifying the function _embed_with_retry in embedding.py in the query/llm/oai folder):

for attempt in retryer:
    with attempt:
        embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)
        return (embedding["embedding"], len(text))

Kind of wondering what's going wrong here :)

goodmaney commented 1 month ago

https://github.com/microsoft/graphrag/issues/451.

automateyournetwork commented 1 month ago

Instead of ollama I am trying llama.cpp for embeddings but I get this error:

11:17:36,84 graphrag.llm.openai.create_openai_client INFO Creating OpenAI client base_url=http://localhost:8080
11:17:36,99 graphrag.index.llm.load_llm INFO create TPM/RPM limiter for nomic-embed-text-v1.5.Q5_K_M.gguf: TPM=0, RPM=0
11:17:36,99 graphrag.index.llm.load_llm INFO create concurrency limiter for nomic-embed-text-v1.5.Q5_K_M.gguf: 1
11:17:36,107 graphrag.index.verbs.text.embed.strategies.openai INFO embedding 177 inputs via 177 snippets using 12 batches. max_batch_size=16, max_tokens=8191
11:17:36,126 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (16,) + inhomogeneous part.
Traceback (most recent call last):
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
    result = await result
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 105, in text_embed
    return await _text_embed_in_memory(
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 130, in _text_embed_in_memory
    result = await strategy_exec(texts, callbacks, cache, strategy_args)
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 61, in run
    embeddings = await _execute(llm, text_batches, ticker, semaphore)
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 105, in _execute
    results = await asyncio.gather(*futures)
  File "/home/fragb0x/packet_graphRAG/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 100, in embed
    result = np.array(chunk_embeddings.output)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (16,) + inhomogeneous part.
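For what it's worth, that ValueError is what numpy raises when the embeddings in a batch come back with different lengths, so they cannot be stacked into one rectangular array. A minimal reproduction:

import numpy as np

# Equal-length vectors stack fine into shape (2, 3).
np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

# A ragged batch (e.g. the server failed or truncated one input) raises:
# ValueError: setting an array element with a sequence. The requested array
# has an inhomogeneous shape after 1 dimensions.
np.array([[0.1, 0.2, 0.3], [0.4, 0.5]])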

wanglufei1 commented 1 month ago

Instead of ollama I am trying llama.cpp for embeddings but I get this error: [same log and traceback as in the previous comment]

I'm having the same problem. Has this been solved, or how can I avoid it?

silviachen46 commented 1 month ago

@wanglufei1 check out #345. Embedding with Ollama works with the modification made by user SpaceLearner.

wanglufei1 commented 1 month ago

@silviachen46 Thank you very much. As I understand it, Ollama uses llama.cpp under the hood. My text is in Chinese, so I switched models: I gave up on nomic-embed-text and used a Qwen model instead. Now it works properly.

karthik-codex commented 1 month ago

Local search with embeddings from Ollama now works. You can read the full guide here: https://medium.com/@karthik.codex/microsofts-graphrag-autogen-ollama-chainlit-fully-local-free-multi-agent-rag-superbot-61ad3759f06f Here is the link to the repo: https://github.com/karthik-codex/autogen_graphRAG

natoverse commented 1 month ago

Consolidating Ollama-related issues: https://github.com/microsoft/graphrag/issues/657