Closed 1444141859 closed 1 month ago
The local search with embeddings from Ollama now works. You can read full guide here: https://medium.com/@karthik.codex/microsofts-graphrag-autogen-ollama-chainlit-fully-local-free-multi-agent-rag-superbot-61ad3759f06f Here is the link to the repo: https://github.com/karthik-codex/autogen_graphRAG
你 端口号是不是错了?
If you want to use open-source models, I've created a repository for deploying Hugging Face models to local endpoints, offering functionality similar to OpenAI APIs. You can find the repo here: https://github.com/rushizirpe/open-llm-server
Also, I've prepared a Colab notebook for the Graphrag Demo. You might want to take a look: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing. If you don't have access to GPUs like the A100, you'll need a GROQ_API_KEY (which is free with certain limitations), you can obtain it from: https://console.groq.com/keys
Consolidating alternate model issues here: https://github.com/microsoft/graphrag/issues/657
Describe the issue
Use vllm to launch a local large model, in the style of openai,but it won't work
Steps to reproduce
step1:python -m vllm.entrypoints.openai.api_server --max-model-len 6144 --gpu-memory-utilization 0.95 --disable-log-stats --served-model-name Qwen2-7B-Instruct --model /mnt/workspace/Qwen2-7B-Instruct step2: start embedding
import os from contextlib import asynccontextmanager from typing import List, Union
import tiktoken import torch import uvicorn from fastapi import FastAPI from fastapi.middleware.cors import CORSMiddleware from pydantic import BaseModel from sentence_transformers import SentenceTransformer from sse_starlette.sse import EventSourceResponse
Set up limit request time
EventSourceResponse.DEFAULT_PING_INTERVAL = 1000
EMBEDDING_PATH = os.environ.get('EMBEDDING_PATH', '/mnt/workspace/m3e-base')
@asynccontextmanager async def lifespan(app: FastAPI): yield if torch.cuda.is_available(): torch.cuda.empty_cache() torch.cuda.ipc_collect()
app = FastAPI(lifespan=lifespan)
app.add_middleware( CORSMiddleware, allow_origins=[""], allow_credentials=True, allow_methods=[""], allow_headers=["*"], )
class CompletionUsage(BaseModel): prompt_tokens: int completion_tokens: int total_tokens: int
class EmbeddingResponse(BaseModel): data: list model: str object: str usage: CompletionUsage
class EmbeddingRequest(BaseModel): input: Union[List[str], str] model: str
@app.post("/v1/embeddings", response_model=EmbeddingResponse) async def get_embeddings(request: EmbeddingRequest): if isinstance(request.input, str): embeddings = [embedding_model.encode(request.input)] else: embeddings = [embedding_model.encode(text) for text in request.input] embeddings = [embedding.tolist() for embedding in embeddings]
if name == "main":
load Embedding
step3:pip install graphrag step4:mkdir -p ./ragtest/input step5:curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt step6:python -m graphrag.index --init --root ./ragtest step7: Modify the yml file
GraphRAG Config Used
No response
Logs and screenshots
encoding_model: cl100k_base skip_workflows: [] llm: api_key: ${GRAPHRAG_API_KEY} type: openai_chat # or azure_openai_chat model: Qwen2-7B-Instruct model_supports_json: false # recommended if this is available for your model. max_tokens: 2000 request_timeout: 180.0 api_base: http://localhost:8000/v1/
api_version: 2024-02-15-preview
organization:
deployment_name:
tokens_per_minute: 150_000 # set a leaky bucket throttle
requests_per_minute: 10_000 # set a leaky bucket throttle
max_retries: 10
max_retry_wait: 10.0
sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
concurrent_requests: 25 # the number of parallel inflight requests that may be made
parallelization: stagger: 0.3
num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
embeddings:
parallelization: override the global parallelization settings for embeddings
async_mode: threaded # or asyncio llm: api_key: ${GRAPHRAG_API_KEY} type: openai_embedding # or azure_openai_embedding model: m3e-base api_base: http://localhost:8001/v1/
api_version: 2024-02-15-preview
Additional Information