microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
17.37k stars 1.66k forks

[Feature Request]: Local LLM and embedding #524

Closed CoderJackZhu closed 2 months ago

CoderJackZhu commented 2 months ago

Is your feature request related to a problem? Please describe.

Currently, local models and local embeddings are not supported. When will they be supported?

Describe the solution you'd like

I hope to use an LLM locally instead of GPT-4, which is costly and which I can't afford. I'd appreciate it if running the LLM locally were an option.

Additional context

No response

KylinMountain commented 2 months ago

Read this: https://github.com/microsoft/graphrag/issues/374, or read my WeChat official account article "Hands-on with Microsoft's new-generation RAG — does GraphRAG's powerful global understanding crush naive RAG?"

JackyYangPassion commented 2 months ago

Can DeepSeek be used directly in CN?

KylinMountain commented 2 months ago

> Can DeepSeek be used directly in CN?

If it is compatible with the OpenAI SDK, it should be OK. Qwen, Moonshot, and Groq, for example, are all compatible with the OpenAI SDK.
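
As an illustration of that pattern, here is a minimal sketch of the `llm` section in settings.yaml pointed at DeepSeek's published OpenAI-compatible endpoint; the `api_base` and `model` values below are DeepSeek's documented ones, and the same shape should apply to Qwen, Moonshot, or Groq with their respective endpoints and model names (not tested here, adjust to your provider):

```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY} # set to the provider's API key, e.g. your DeepSeek key
  type: openai_chat # any OpenAI-SDK-compatible chat endpoint
  model: deepseek-chat
  api_base: https://api.deepseek.com/v1 # swap in Qwen/Moonshot/Groq base URL as needed
```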

win4r commented 2 months ago

See here: https://youtu.be/XiLEZzm7yCk

btcmonte commented 2 months ago

Tested this repo and it works well.

https://github.com/TheAiSingularity/graphrag-local-ollama

*Note:* I'm doing research on large medical docs, and the default max token size was too large for the local embedding model, so I had to reduce it to 4000 in the embeddings section of settings.yaml:

```yaml
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic_embed_text
    api_base: http://localhost:11434/api
    api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  batch_size: 2 # the number of documents to send in a single request
  batch_max_tokens: 4000 # the maximum number of tokens to send in a single request
  # target: required # or optional
```

karthik-codex commented 2 months ago

Local search with embeddings from Ollama now works. You can read the full guide here: https://medium.com/@karthik.codex/microsofts-graphrag-autogen-ollama-chainlit-fully-local-free-multi-agent-rag-superbot-61ad3759f06f Here is the link to the repo: https://github.com/karthik-codex/autogen_graphRAG
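
For anyone wiring this up by hand rather than through the repo above, a minimal sketch of the matching chat-model `llm` section for a fully local setup, assuming Ollama's OpenAI-compatible endpoint at `/v1` and a locally pulled `llama3` model (both the model name and the dummy key are illustrative; adjust to your installation):

```yaml
llm:
  api_key: ollama # a placeholder; Ollama ignores the key, but the field must be set
  type: openai_chat
  model: llama3 # any chat model you have pulled locally
  api_base: http://localhost:11434/v1 # Ollama's OpenAI-compatible endpoint
  # model_supports_json: true # enable only if your local model reliably emits JSON
```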

CoderJackZhu commented 2 months ago

This repo solves the problem: https://github.com/severian42/GraphRAG-Ollama-UI