Closed dx111ge closed 2 months ago
Embeddings are not working with Ollama... I was able to get things working with Ollama for the entities and openai for embeddings.
Working config can be found here: https://github.com/microsoft/graphrag/issues/339#issuecomment-2206149531
Ollama works as expected
GRAPHRAG_API_KEY=123
GRAPHRAG_API_BASE=http://172.17.0.1:11434/v1
# GRAPHRAG_LLM_MODEL=llama3:instruct
GRAPHRAG_LLM_MODEL=codestral
GRAPHRAG_LLM_THREAD_COUNT=4
GRAPHRAG_LLM_CONCURRENT_REQUESTS=8
GRAPHRAG_LLM_MAX_TOKENS=2048
GRAPHRAG_EMBEDDING_API_BASE=http://172.17.0.1:11435/v1
GRAPHRAG_EMBEDDING_MODEL=mxbai-embed-large
:11435 is a dead-simple proxy that converts HTTP requests from OAI to Ollama format.
API shapes
### OAI
```typescript
JSON.stringify({
  object: "list",
  data: [
    ...results.map((r, i) => ({
      object: "embedding",
      index: i,
      embedding: r.embedding,
    })),
  ],
  model,
  usage: {
    prompt_tokens: 0,
    total_tokens: 0,
  },
})
```
### Ollama
```typescript
JSON.stringify({
  model,
  prompt: input,
})
```
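For readers who would rather stay in Python, the two shapes above can be sketched as a pair of pure functions. This is only an illustration of the mapping, not the proxy's actual source: the names to_ollama_requests and to_openai_response are made up here, and the zeroed usage counters simply mirror the snippet above.

```python
from typing import Any


def to_ollama_requests(openai_body: dict[str, Any]) -> list[dict[str, Any]]:
    """Split one OpenAI-style embeddings request into one Ollama request per input."""
    inputs = openai_body["input"]
    if isinstance(inputs, str):  # the OpenAI shape allows a single string or a list
        inputs = [inputs]
    return [{"model": openai_body["model"], "prompt": text} for text in inputs]


def to_openai_response(model: str, results: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap Ollama-style results ({"embedding": [...]}) in the OpenAI list envelope."""
    return {
        "object": "list",
        "data": [
            {"object": "embedding", "index": i, "embedding": r["embedding"]}
            for i, r in enumerate(results)
        ],
        "model": model,
        # Token usage is not tracked in this sketch, so zeros are reported.
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }
```

A proxy would call the first function on the incoming body, send each resulting request to Ollama, and pass the collected replies to the second function.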
Sorry for what might be obvious... but how do you run this proxy? When I run ollama serve it only listens on the default port and not on 11435.
What do you use to run this proxy?
@bmaltais, no worries!
11435 is a proxy server written in JS/Node to specifically map request/response between OAI and Ollama formats, I didn't list the whole code as it's pretty much from the Node docs
This is what I was afraid of ;-) I guess I will wait for something to be built by someone. I don't understand enough about node.js to build this.
Can you please explain how you did this for the embeddings API?
It works with ollama embedding by changing the file in /opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py with
from typing_extensions import Unpack

from graphrag.llm.base import BaseLLM
from graphrag.llm.types import (
    EmbeddingInput,
    EmbeddingOutput,
    LLMInput,
)

from .openai_configuration import OpenAIConfiguration
from .types import OpenAIClientTypes

import ollama


class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # inputs = input['input']
        # print(inputs)
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        # return [d.embedding for d in embedding.data]
        return embedding_list
Can you please provide the complete /opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py replacement code and also the settings file.
@SpaceLearner Does it work when you try to query? I adapted your code to work with langchain; it creates the embeddings, but when I try to do a local query I get an error.
This is my embeddings version:
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""The EmbeddingsLLM class."""

from typing_extensions import Unpack

from graphrag.llm.base import BaseLLM
from graphrag.llm.types import (
    EmbeddingInput,
    EmbeddingOutput,
    LLMInput,
)

from .openai_configuration import OpenAIConfiguration
from .types import OpenAIClientTypes

from langchain_community.embeddings import OllamaEmbeddings


class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    """A text-embedding generator LLM."""

    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # return [d.embedding for d in embedding.data]
        ollama_emb = OllamaEmbeddings(**args)
        embedding_list = []
        for inp in input:
            embedding = ollama_emb.embed_documents([inp])
            # embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding[0])
        return embedding_list
This is the error:
Error embedding chunk {'OpenAIEmbedding': "'NoneType' object is not iterable"}
Traceback (most recent call last):
File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\query\__main__.py", line 75, in <module>
run_local_search(
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\query\cli.py", line 154, in run_local_search
result = search_engine.search(query=query)
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\query\structured_search\local_search\search.py", line 118, in search
context_text, context_records = self.context_builder.build_context(
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\query\structured_search\local_search\mixed_context.py", line 139, in build_context
selected_entities = map_query_to_entities(
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 55, in map_query_to_entities
search_results = text_embedding_vectorstore.similarity_search_by_text(
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\vector_stores\lancedb.py", line 118, in similarity_search_by_text
query_embedding = text_embedder(text)
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 57, in <lambda>
text_embedder=lambda t: text_embedder.embed(t),
File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\query\llm\oai\embedding.py", line 96, in embed
chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
File "H:\llm_stuff\graphrag\venv\lib\site-packages\numpy\lib\function_base.py", line 550, in average
raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
I suspect the query embeddings code also needs to be modified...
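The traceback is consistent with every chunk failing to embed: the 'NoneType' error is logged and swallowed, the lists stay empty, and the weighted average then has nothing to divide by. A minimal sketch of that failure mode in plain Python follows. weighted_average is a hypothetical stand-in for the np.average call, not GraphRAG code, and the guard is just one possible way to surface the real problem earlier.

```python
def weighted_average(vectors: list[list[float]], weights: list[int]) -> list[float]:
    """Average equal-length vectors, weighting each one by its chunk length."""
    total = sum(weights)
    if not vectors or total == 0:
        # With no successfully embedded chunks there is nothing to average, so
        # fail with a message that points at the embedding step, not at NumPy.
        raise ValueError("no chunk was embedded; check the embedding backend")
    dim = len(vectors[0])
    return [
        sum(vec[i] * w for vec, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]
```

With a check like this, the query would stop at the point where the embedding call returned nothing instead of failing later inside the averaging step.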
Hack the file C:\Users\user-name\miniconda3\Lib\site-packages\graphrag\query\llm\oai\embedding.py with the following contents (tip: this only fixes the --method local param; --method global still errors 😅):
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""OpenAI Embedding model implementation."""

import asyncio
from collections.abc import Callable
from typing import Any

import numpy as np
import tiktoken
from tenacity import (
    AsyncRetrying,
    RetryError,
    Retrying,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

from graphrag.query.llm.base import BaseTextEmbedding
from graphrag.query.llm.oai.base import OpenAILLMImpl
from graphrag.query.llm.oai.typing import (
    OPENAI_RETRY_ERROR_TYPES,
    OpenaiApiType,
)
from graphrag.query.llm.text_utils import chunk_text
from graphrag.query.progress import StatusReporter

from langchain_community.embeddings import OllamaEmbeddings


class OpenAIEmbedding(BaseTextEmbedding, OpenAILLMImpl):
    """Wrapper for OpenAI Embedding models."""

    def __init__(
        self,
        api_key: str | None = None,
        azure_ad_token_provider: Callable | None = None,
        model: str = "text-embedding-3-small",
        deployment_name: str | None = None,
        api_base: str | None = None,
        api_version: str | None = None,
        api_type: OpenaiApiType = OpenaiApiType.OpenAI,
        organization: str | None = None,
        encoding_name: str = "cl100k_base",
        max_tokens: int = 8191,
        max_retries: int = 10,
        request_timeout: float = 180.0,
        retry_error_types: tuple[type[BaseException]] = OPENAI_RETRY_ERROR_TYPES,  # type: ignore
        reporter: StatusReporter | None = None,
    ):
        OpenAILLMImpl.__init__(
            self=self,
            api_key=api_key,
            azure_ad_token_provider=azure_ad_token_provider,
            deployment_name=deployment_name,
            api_base=api_base,
            api_version=api_version,
            api_type=api_type,  # type: ignore
            organization=organization,
            max_retries=max_retries,
            request_timeout=request_timeout,
            reporter=reporter,
        )
        self.model = model
        self.encoding_name = encoding_name
        self.max_tokens = max_tokens
        self.token_encoder = tiktoken.get_encoding(self.encoding_name)
        self.retry_error_types = retry_error_types

    def embed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's sync function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        Please refer to: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        for chunk in token_chunks:
            try:
                embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
                chunk_embeddings.append(embedding)
                chunk_lens.append(chunk_len)
            # TODO: catch a more specific exception
            except Exception as e:  # noqa BLE001
                self._reporter.error(
                    message="Error embedding chunk",
                    details={self.__class__.__name__: str(e)},
                )
                continue
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    async def aembed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's async function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        embedding_results = await asyncio.gather(*[
            self._aembed_with_retry(chunk, **kwargs) for chunk in token_chunks
        ])
        embedding_results = [result for result in embedding_results if result[0]]
        chunk_embeddings = [result[0] for result in embedding_results]
        chunk_lens = [result[1] for result in embedding_results]
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)  # type: ignore
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    def _embed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = Retrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            for attempt in retryer:
                with attempt:
                    embedding = (
                        OllamaEmbeddings(
                            model=self.model,
                        ).embed_query(text)
                        or []
                    )
                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)

    async def _aembed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = AsyncRetrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            async for attempt in retryer:
                with attempt:
                    # embed_query is synchronous and cannot be awaited;
                    # aembed_query is its awaitable counterpart.
                    embedding = (
                        await OllamaEmbeddings(
                            model=self.model,
                        ).aembed_query(text)
                        or []
                    )
                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)
It seems I have it working now. It returns nothing if I set llm to llama3, but works OK when switching to mistral.
Is text or csv the only formats supported? Does it support pdf?
To change the OpenAI request format to the one supported by Ollama, the settings only require the base URL parameter, for example, api_base: http://localhost:8000/v1
from http.server import BaseHTTPRequestHandler, HTTPServer
import json
from socketserver import ThreadingMixIn
from urllib.parse import urlparse, parse_qs
from queue import Queue
import requests
import argparse
from ascii_colors import ASCIIColors

# Directly defining server configurations
servers = [
    ("server1", {'url': 'http://localhost:11434', 'queue': Queue()}),
    # Add more servers if needed
]

# Define the Ollama model to use
ollama_model = 'qwen2:7b'


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--port', type=int, default=8000, help='Port number for the server')
    args = parser.parse_args()
    ASCIIColors.red("Ollama Proxy server")

    class RequestHandler(BaseHTTPRequestHandler):
        def _send_response(self, response):
            self.send_response(response.status_code)
            for key, value in response.headers.items():
                if key.lower() not in ['content-length', 'transfer-encoding', 'content-encoding']:
                    self.send_header(key, value)
            self.send_header('Transfer-Encoding', 'chunked')
            self.end_headers()
            try:
                for chunk in response.iter_content(chunk_size=1024):
                    if chunk:
                        self.wfile.write(b"%X\r\n%s\r\n" % (len(chunk), chunk))
                        self.wfile.flush()
                self.wfile.write(b"0\r\n\r\n")
            except BrokenPipeError:
                pass

        def do_GET(self):
            self.log_request()
            self.proxy()

        def do_POST(self):
            self.log_request()
            self.proxy()

        def proxy(self):
            url = urlparse(self.path)
            path = url.path
            get_params = parse_qs(url.query) or {}
            if self.command == "POST":
                content_length = int(self.headers['Content-Length'])
                post_data = self.rfile.read(content_length)
                post_data_str = post_data.decode('utf-8')
                try:
                    post_params = json.loads(post_data_str)
                except json.JSONDecodeError:
                    post_params = {}
                post_params['model'] = ollama_model
                post_params = json.dumps(post_params).encode('utf-8')
            else:
                post_params = {}
            # Find the server with the lowest number of queue entries.
            min_queued_server = servers[0]
            for server in servers:
                cs = server[1]
                if cs['queue'].qsize() < min_queued_server[1]['queue'].qsize():
                    min_queued_server = server
            if path == '/api/generate' or path == '/api/chat':
                que = min_queued_server[1]['queue']
                que.put_nowait(1)
                try:
                    post_data_dict = {}
                    if isinstance(post_data, bytes):
                        post_data_str = post_data.decode('utf-8')
                        post_data_dict = json.loads(post_data_str)
                    response = requests.request(self.command, min_queued_server[1]['url'] + path, params=get_params,
                                                data=post_params, stream=post_data_dict.get("stream", False))
                    self._send_response(response)
                except Exception:
                    pass
                finally:
                    que.get_nowait()
            else:
                # For other endpoints, just mirror the request.
                response = requests.request(self.command, min_queued_server[1]['url'] + path, params=get_params,
                                            data=post_params)
                self._send_response(response)

    class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
        pass

    print('Starting server')
    server = ThreadedHTTPServer(('', args.port), RequestHandler)  # Set the entry port here.
    print(f'Running server on port {args.port}')
    server.serve_forever()


if __name__ == "__main__":
    main()
@gdhua, your prompt-fu failed you, this proxy server doesn't transform embeddings API between OAI/Ollama formats.
@bmaltais, here's the final version of the proxy I ended up using. There was another issue with the fact GraphRAG sends raw token IDs into the embeddings API, rather than non-tokenised raw text.
```python
import os
import sys
import json
import logging
import asyncio
from aiohttp import web
import aiohttp
import tiktoken

logging.basicConfig(stream=sys.stdout, level=logging.INFO)

config = {
    "proxy_port": int(os.environ.get("PROXY_PORT", 11435)),
    "api_url": os.environ.get("OLLAMA_ENDPOINT"),
    "tiktoken_encoding": "cl100k_base"
}

encoding = tiktoken.get_encoding(config["tiktoken_encoding"])


async def handle_embeddings(request):
    try:
        body = await request.json()
        model = body["model"]
        input_data = body["input"]
        print(f"/v1/embeddings handler {str(input_data)[:100]}")
        if isinstance(input_data, str):
            input_data = [input_data]
        results = await asyncio.gather(*[fetch_embeddings(model, i) for i in input_data])
        response_data = {
            "object": "list",
            "data": [
                {
                    "object": "embedding",
                    "index": i,
                    "embedding": r["embedding"]
                }
                for i, r in enumerate(results)
            ],
            "model": model,
            "usage": {
                "prompt_tokens": 0,
                "total_tokens": 0
            }
        }
        return web.json_response(response_data)
    except Exception as e:
        print(f"Error: {str(e)}")
        return web.Response(status=500)


async def fetch_embeddings(model, input_text):
    # A single token ID: decode it back to text
    if isinstance(input_text, int):
        input_text = encoding.decode([input_text])
    # If array of ints - decode the token IDs with tiktoken
    if isinstance(input_text, list):
        input_text = encoding.decode(input_text)
    if not isinstance(input_text, str):
        raise ValueError(f"Input is not a string: {input_text}")
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{config['api_url']}/api/embeddings",
            headers={"Content-Type": "application/json"},
            json={"model": model, "prompt": input_text}
        ) as response:
            text = await response.text()
            json_data = json.loads(text)
            print(f"Embeddings: {input_text[:50]}... -> {text[:50]}...")
            return json_data


def main():
    print('Starting embeddings proxy...')
    if not config["api_url"]:
        raise ValueError("OLLAMA_ENDPOINT environment variable is required")
    app = web.Application()
    app.router.add_post("/v1/embeddings", handle_embeddings)
    web.run_app(app, port=config["proxy_port"], host="0.0.0.0")


if __name__ == "__main__":
    main()
```
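A quick way to sanity-check a proxy like this is to post one OpenAI-style request and verify that the reply has the list envelope an OpenAI client expects. The sketch below uses only the standard library. The helper names are hypothetical and not part of the proxy above, and any URL or model name passed in is up to the reader's own setup.

```python
import json
import urllib.request


def build_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST to <base_url>/embeddings with an OpenAI-style JSON body."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_response(payload: dict, expected: int) -> list[list[float]]:
    """Return the embeddings in index order, raising if the envelope is malformed."""
    if payload.get("object") != "list":
        raise ValueError("missing OpenAI 'list' envelope")
    data = sorted(payload["data"], key=lambda item: item["index"])
    if len(data) != expected:
        raise ValueError(f"expected {expected} embeddings, got {len(data)}")
    return [item["embedding"] for item in data]
```

With the proxy running, the request from build_request (for example against http://localhost:11435/v1) can be passed to urllib.request.urlopen, and the decoded JSON handed to check_response, which should return one vector per input text.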
A few caveats:
gemma2
own embeddings (which are of course an order of magnitude slower)
@xiaoquisme, errors when using --method global occur in my setup as well. My observation is that llama3's response is not well-formed: even though the system prompt requires it to answer in JSON, it includes some filler sentences at the beginning/end of its response. A possible fix is at line 233 of .../site-packages/graphrag/query/structured_search/global_search/search.py:
add this as the first line of the function:
search_response = search_response[max(0, search_response.find("{")):min(len(search_response), search_response.rfind("}") + 1)]
which most of the time removes the filler sentences.
However, a disclaimer: my llama3 sometimes forgets to answer in JSON at all (which gpt rarely does) for queries like "Can you give me a joke for people read about this". I think this may only be fixed by improving the prompts or using a more "obedient" model.
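The slicing trick above can be wrapped in a small helper that also copes with the case where no JSON comes back at all. This is a sketch rather than a patch to GraphRAG: extract_json_object is a hypothetical name, and returning None on failure is an assumption about how a caller might want to handle a non-JSON reply.

```python
import json
from typing import Any, Optional


def extract_json_object(response: str) -> Optional[dict[str, Any]]:
    """Parse the outermost {...} span of an LLM reply, ignoring filler around it."""
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        return None  # the model did not answer in JSON at all
    try:
        parsed = json.loads(response[start : end + 1])
    except json.JSONDecodeError:
        return None  # braces were present but the content is not valid JSON
    return parsed if isinstance(parsed, dict) else None
```

A caller could retry the query or fall back to an empty result when None comes back, instead of letting the parse error propagate.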
this worked for me.
I'm making this thread as our official discussion place for Ollama setup and troubleshooting. Thanks for the engagement and support, what an amazing community!
this is a temp hacked solution for ollama https://github.com/s106916/graphrag
Thanks. For anyone who doesn't use langchain and just wants to use Ollama's embedding model, you can make these changes and it will work for global query answering:
And yes, when doing local query there will still be an error concerning another function in this same .py file.
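The changes referred to above did not survive in this thread, so the following is only a rough sketch of what calling Ollama's embedding model without langchain can look like. It assumes Ollama's native `POST /api/embeddings` endpoint on the default port 11434, the `{model, prompt}` request shape shown earlier in the thread, and a response carrying an `embedding` field. `build_request`, `parse_embedding`, and `ollama_embed` are hypothetical names, and `mxbai-embed-large` is simply the model from the earlier config.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_EMBEDDINGS_URL = "http://localhost:11434/api/embeddings"


def build_request(model: str, text: str) -> bytes:
    """Serialize the Ollama-style request body: {"model": ..., "prompt": ...}."""
    return json.dumps({"model": model, "prompt": text}).encode("utf-8")


def parse_embedding(raw: bytes) -> list[float]:
    """Pull the vector out of the response; empty list if it is missing."""
    return json.loads(raw).get("embedding") or []


def ollama_embed(
    text: str,
    model: str = "mxbai-embed-large",
    url: str = OLLAMA_EMBEDDINGS_URL,
    timeout: float = 60.0,
) -> list[float]:
    """Embed one string by POSTing to Ollama directly (no langchain)."""
    request = urllib.request.Request(
        url,
        data=build_request(model, text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embedding(response.read())
```

Under these assumptions, a call such as `ollama_embed(text, model=self.model)` would take the place of the `OllamaEmbeddings(...).embed_query(text)` call inside `_embed_with_retry`.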
I was able to get GraphRAG + Ollama up and running. However, indexing took several hours. Any ideas how to speed up indexing?
E.g. does it make sense to edit the parallelisation section in the yaml file somehow? What is the default num_threads since it is commented out in the initially created file?
PS: Not sure if this is the right place to ask this, but the title says '[GraphRAG Community Support for running Ollama]'
Thanks for any idea / comment!
Unfortunately, the way GraphRAG works at the moment is very, very GPU demanding. I think this is what will keep it from being used on local computers.
GraphRAG takes an hour to process one book that takes 10 seconds to process on my custom RAG system.
And my custom system uses a query augmentation strategy: it takes the RAG answers to the original question and uses them to produce a better question, along with a list of important keywords, which together become the augmented question. This ensures better embedding matching across the whole document and produces better answers than GraphRAG (and even NotebookLM) most of the time…
I can share the code if you are interested.
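The query augmentation strategy described above can be sketched roughly as follows. This is not the code from the custom system, just an illustration under assumptions: `retrieve` and `ask_llm` are hypothetical callables standing in for the vector search and the model call, and the prompt wording is invented.

```python
from typing import Callable


def augmented_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    ask_llm: Callable[[str], str],
) -> str:
    """Two-pass RAG: draft answer -> improved question + keywords -> final answer."""
    # Pass 1: ordinary RAG on the original question.
    first_context = "\n".join(retrieve(question))
    draft = ask_llm(f"Context:\n{first_context}\n\nQuestion: {question}")

    # Use the draft answer to rewrite the question and extract keywords.
    improved = ask_llm(
        "Rewrite the question so it is more precise, then list important "
        f"keywords.\nQuestion: {question}\nDraft answer: {draft}"
    )

    # Pass 2: the improved question plus keywords drives retrieval.
    second_context = "\n".join(retrieve(improved))
    return ask_llm(f"Context:\n{second_context}\n\nQuestion: {improved}")
```

Keeping the retriever and the model as plain callables means the same flow can be tried against any vector store or LLM backend.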
That is why I am looking for ways to speed it up. Anyone with ideas that go beyond the default settings?
Consolidating Ollama-related issues: https://github.com/microsoft/graphrag/issues/657
Your process sounds interesting, care to share it?
Been a while since I touched it. Was working pretty well. Let me see if I can push the code to github so you can have a look at it.
@Tipik1n Here is the repo: https://github.com/bmaltais/AIResearcher
Quick how to use:
Clone the repository
Navigate to the cloned repository directory:
cd <cloned-repo-name>
Create a new virtual environment:
python -m venv venv
./venv/Scripts/activate
Install required packages:
pip install -r requirements.txt
Obtain an API key from Google: https://aistudio.google.com/app/apikey
Copy .env.sample to .env
Put the API key in the file
Put the material you want to query into the documents folder.
Import the documents: python importall.py
Query the documents: python research-main.py
To enhance the question, type improve
Type the question
Here is the sample output of a question on "The Project Gutenberg eBook of A Room with a View":
(venv) PS H:\llm_stuff\AIResearcher> python .\importall.py
processing pg2641.txt
Chunking text
Inserting chunks into DB
Processed pg2641.txt
processing pg74070.txt
Chunking text
Inserting chunks into DB
Processed pg74070.txt
All documents have been processed and inserted into ChromaDB.
(venv) PS H:\llm_stuff\AIResearcher> python .\research-main.py
What now? ('quit' to exit, 'forget' to clear history, 'improve' to enable improvement): improve
Question improvement enabled.
What now? ('quit' to exit, 'forget' to clear history, 'no improve' to disable improvement): Provide me with more details about A Room With A View, such as the plot, characters, themes, historical context, etc, etc
Improving question with Gemini...
Improved question: Can you provide a comprehensive analysis of E.M. Forster's novel "A Room With A View," encompassing its plot, character development, prominent themes, and the historical context in which it was written? Please format your response as an academic essay.
E.M. Forster, A Room With A View, novel, plot, characters, themes, historical context, Edwardian era, societal expectations, love, class, societal restrictions, Italy, England, Florence, Lucy Honeychurch, George Emerson, Charlotte Bartlett, Cecil Vyse, Mr. Emerson, social conventions, desire, beauty, experience, freedom, repression, transformation, symbolism, imagery.
Querying Gemini...
Gemini's answer:
## A Room with a View: A Study in Societal Repression and Personal Liberation
E.M. Forster's *A Room with a View* (1908), a poignant and satirical novel set against the backdrop of the Edwardian era, delves into the complexities of love, class, and societal expectations. The story revolves around Lucy Honeychurch, a young Englishwoman on a journey of self-discovery in both the idyllic landscapes of Italy and the suffocating confines of English society. Through Lucy's experiences, Forster masterfully exposes the hypocrisy and absurdity of societal conventions while celebrating the transformative power of passion, beauty, and personal freedom.
The novel unfolds with Lucy's arrival in Florence, accompanied by her chaperone, the well-meaning but rigidly conventional Charlotte Bartlett. Dissatisfied with their "room with a view," Lucy encounters the Emersons – a father and son who defy societal norms with their intellectualism, emotional openness, and unconventional views. George Emerson, the younger Emerson, becomes captivated by Lucy's spirit and challenges the carefully constructed walls she has erected around herself. Their passionate encounter amidst the vibrant fields of Fiesole awakens within Lucy a dormant desire for authentic experience and emotional liberation, a stark contrast to the stifling decorum expected of her in England.
Back in England, Lucy becomes engaged to Cecil Vyse, a man embodying the epitome of upper-class refinement and intellectual snobbery. While Cecil admires Lucy for her perceived beauty and social suitability, he fails to see her true self, further emphasizing the suffocating nature of societal expectations that prioritize appearances over genuine connection. The reappearance of the Emersons in Lucy's life reignites the suppressed desires within her, forcing her to confront the hypocrisy of her impending marriage and the societal pressures dictating her choices.
One of the most prominent themes in *A Room with a View* is the conflict between societal expectations and individual desires. Forster masterfully portrays the Edwardian era's rigid social hierarchy, particularly for women. Lucy's life is governed by a strict code of conduct that dictates her behaviour, her choice of suitor, and ultimately, her future. Her initial attraction to Cecil stems partly from his societal approval, representing a safe, predictable path within the confines of social norms. However, George's passionate nature and disregard for convention awaken within her a yearning for genuine connection and emotional fulfillment, a desire deemed improper and even dangerous by society's standards.
This conflict is further highlighted by the contrasting settings of Italy and England. Italy, particularly Florence, symbolizes freedom, passion, and a rejection of societal constraints. It is in the warmth of the Italian sun and the beauty of its art and landscapes that Lucy experiences moments of genuine connection and self-discovery. In contrast, England, with its emphasis on social decorum and rigid class structure, represents repression and the stifling of individual desires. The return to England signifies a return to societal pressures and the expectation that Lucy will conform to predetermined roles.
The characters in *A Room with a View* are masterfully crafted to embody the novel's central themes. Lucy's journey is one of self-discovery and a struggle for agency. She grapples with societal expectations, ultimately choosing to embrace her desires and forge her path, even if it means challenging convention. George Emerson serves as a catalyst for Lucy's transformation, representing passion, authenticity, and a rejection of societal norms. Cecil, on the other hand, embodies the stifling nature of societal expectations, prioritizing appearances and intellectual superiority over genuine human connection. Charlotte Bartlett, with her rigid adherence to social propriety, highlights the absurdity and ultimately, the harmful nature of blindly following societal dictates.
Through vivid imagery and symbolism, Forster further emphasizes his critique of societal hypocrisy and celebration of personal liberation. The recurring motif of "views" serves as a metaphor for the characters' perspectives and their ability to see beyond societal constructs. The "room with a view" itself becomes symbolic of the potential for expanded horizons and a wider perspective. Similarly, the use of natural imagery, particularly the vibrant landscapes of Italy, reinforces the themes of freedom, passion, and authentic experience, contrasting sharply with the constricted atmosphere of Edwardian England.
In conclusion, *A Room with a View* remains a timeless masterpiece, not only for its compelling narrative and engaging characters but also for its enduring relevance in a world still grappling with the tension between societal expectations and individual desires. Forster's insightful exploration of love, class, and the human need for connection continues to resonate with readers, urging us to question societal norms and embrace the transformative power of authenticity and personal freedom.
@bmaltais
@Tipik1n Here is the repo: https://github.com/bmaltais/AIResearcher
i think this is a discussion w/ graphrag + ollama...?? please provide the instructions on how to run your prj via ollama 🙏🏻
Hi,
Unfortunately, it is not leveraging GraphRAG. I was just providing a link to a custom RAG solution that performs pretty well. As much as I like GraphRAG, it is too resource demanding for the added benefits.
Is there a working example for using Ollama? Or is it not supposed to work? I did try, but without any success.
Thanks in advance