microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
18.89k stars 1.85k forks

[Bug]: Unable to local query using latest main branch with error "FileNotFoundError: Table entity_description_embeddings does not exist." #828

Closed GuityOrange closed 3 months ago

GuityOrange commented 3 months ago

Do you need to file an issue?

Describe the bug

I've been using GraphRAG for two weeks, and I updated to the latest main branch to fix an issue where LLMs return faulty responses in non-JSON mode. Since the update, local query no longer works. To rule out side effects from earlier runs, I cloned a fresh copy of the code and rebuilt everything from scratch, but I still hit the same issue. poetry run poe index and global query work fine, but local query fails with the following error:

poetry run poe query --root . --method local "What is the service track?"

Poe => python -m graphrag.query --root . --method local 'What is the service track?'

INFO: Reading settings from settings.yaml
INFO: Vector Store Args: {}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/littleKitty/PycharmProjects/graphrag2/graphrag/query/__main__.py", line 83, in <module>
    run_local_search(
  File "/Users/littleKitty/PycharmProjects/graphrag2/graphrag/query/cli.py", line 162, in run_local_search
    description_embedding_store = __get_embedding_description_store(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/littleKitty/PycharmProjects/graphrag2/graphrag/query/cli.py", line 75, in __get_embedding_description_store
    description_embedding_store.db_connection.open_table(
  File "/Users/littleKitty/Library/Caches/pypoetry/virtualenvs/graphrag-kyfzN3S0-py3.11/lib/python3.11/site-packages/lancedb/db.py", line 445, in open_table
    return LanceTable.open(self, name, index_cache_size=index_cache_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/littleKitty/Library/Caches/pypoetry/virtualenvs/graphrag-kyfzN3S0-py3.11/lib/python3.11/site-packages/lancedb/table.py", line 937, in open
    raise FileNotFoundError(
FileNotFoundError: Table entity_description_embeddings does not exist. Please first call db.create_table(entity_description_embeddings, data)

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4o-2024-05-13
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-ada-002
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Additional Information

Thanks for your help.

ksachdeva commented 3 months ago

facing the same issue!

natoverse commented 3 months ago

Duplicate of #813. We're investigating.

jasonkylelol commented 3 months ago

Facing the same problem...

Poe => python -m graphrag.query --root /workspace/dev/ragtest/ --method local 'who is Mr. Peanut Butter?'

INFO: Reading settings from /workspace/dev/ragtest/settings.yaml

INFO: Vector Store Args: {}
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/graphrag/graphrag/query/__main__.py", line 83, in <module>
    run_local_search(
  File "/workspace/graphrag/graphrag/query/cli.py", line 162, in run_local_search
    description_embedding_store = __get_embedding_description_store(
  File "/workspace/graphrag/graphrag/query/cli.py", line 75, in __get_embedding_description_store
    description_embedding_store.db_connection.open_table(
  File "/root/.cache/pypoetry/virtualenvs/graphrag-vJy-M-6u-py3.10/lib/python3.10/site-packages/lancedb/db.py", line 445, in open_table
    return LanceTable.open(self, name, index_cache_size=index_cache_size)
  File "/root/.cache/pypoetry/virtualenvs/graphrag-vJy-M-6u-py3.10/lib/python3.10/site-packages/lancedb/table.py", line 937, in open
    raise FileNotFoundError(
FileNotFoundError: Table entity_description_embeddings does not exist. Please first call db.create_table(entity_description_embeddings, data)
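
Both tracebacks end in lancedb failing to open a table named entity_description_embeddings. A quick way to confirm whether that table was ever created is to list the tables in the LanceDB directory. The following is only a diagnostic sketch: the helper names table_exists and check_store are hypothetical, and the database location passed as db_uri is an assumption you would need to replace with wherever your setup keeps its LanceDB files.

```python
from typing import Iterable, Protocol


class HasTableNames(Protocol):
    """Anything that can list its table names, such as a lancedb connection."""

    def table_names(self) -> Iterable[str]: ...


def table_exists(db: HasTableNames, name: str = "entity_description_embeddings") -> bool:
    """Return True if the connection reports a table with the given name."""
    return name in set(db.table_names())


def check_store(db_uri: str) -> bool:
    """Connect to a LanceDB directory and check for the embeddings table.

    db_uri is a placeholder; its real value depends on your configuration.
    """
    import lancedb  # imported lazily so table_exists works without lancedb installed

    return table_exists(lancedb.connect(db_uri))
```

If check_store returns False for your database directory, the table was never written, which matches the FileNotFoundError above.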

meltingrock commented 3 months ago

Same issue

goutou7474 commented 3 months ago

Same issue

liuzhiconglzc commented 3 months ago

Same issue

zijinyuan commented 3 months ago

The first time you run local search, the check "config_args.get("overwrite", False)" must evaluate to True so that the embeddings from the entities list are dumped into the description_embedding_store. That worked for me.

Minxiangliu commented 3 months ago

Hello @zijinyuan, I am new to using graphrag. Could you please clarify where to set config_args.get("overwrite", False) = True, and where description_embedding_store is?

Thanks in advance.

zijinyuan commented 2 months ago

Open the file query/cli.py and find the line "if config_args.get("overwrite", False)". The first time you use local search, change "if config_args.get("overwrite", False)" to "if True". After that first run, change it back. It will then work.

Yes, [XiaoTongDeng] is right.
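
To see why the quoted condition skips the write step, note that the logs above print "Vector Store Args: {}". Assuming config_args is that same empty dictionary (an assumption, not confirmed in this thread), the .get call falls back to its default of False. The function name below is hypothetical and only mirrors the quoted condition; it is not graphrag's code.

```python
from typing import Optional


def should_rebuild_embeddings(config_args: Optional[dict]) -> bool:
    """Mirror the quoted check: rebuild only if "overwrite" is present and truthy."""
    return bool((config_args or {}).get("overwrite", False))


# With empty vector store args, the default (False) is used, so nothing is written.
print(should_rebuild_embeddings({}))  # False

# With an explicit overwrite flag, the check passes.
print(should_rebuild_embeddings({"overwrite": True}))  # True
```

Replacing the condition with "if True" forces the second outcome regardless of what the dictionary contains, which is why it only needs to be done for the first run.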

travisgu commented 4 weeks ago

How do I set this parameter when running through the "python -m graphrag.query" command line? Thanks.