Closed: semenoffalex closed this issue 2 months ago
.DS_Store is a macOS file, not a directory. The query library tries to find the most recent timestamped run; for some reason it is picking up this .DS_Store file instead. We'll take a look at the selection code to ensure it only considers folders.
As an immediate fix, you should be able to delete that file (it is just an OS cache of view settings) and re-run.
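For example, from the project root (path assumes the ragtest layout used in this thread):

find ./ragtest/output -name ".DS_Store" -delete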
If you'd like to be more precise each time you run, you can add the --data param on the CLI and point it at the exact folder of artifacts that you want to query (e.g., {root}/output/{timestamp}/artifacts).
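For example (replace <timestamp> with the folder name of the run you want to query):

python3 -m graphrag.query \
  --root ./ragtest \
  --data ./ragtest/output/<timestamp>/artifacts \
  --method global \
  "What are the top themes in this story?"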
Thanks! This time it moved further, but a new error appeared:
INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_chat", 'model': 'mistral', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
Error parsing search response json
Traceback (most recent call last):
File "/opt/miniconda3/envs/graphollama/lib/python3.11/site-packages/graphrag/query/structured_search/global_search/search.py", line 194, in _map_response_single_batch
processed_response = self.parse_search_response(search_response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/graphollama/lib/python3.11/site-packages/graphrag/query/structured_search/global_search/search.py", line 232, in parse_search_response
parsed_elements = json.loads(search_response)["points"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/graphollama/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/graphollama/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/graphollama/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
SUCCESS: Global Search Response: I am sorry but I am unable to answer this question given the provided data.
The command was the same:
python3 -m graphrag.query \
--root ./ragtest \
--method global \
"What are the top themes in this story?"
We've seen a fair bit of reporting on non-OpenAI model JSON formats. 0.2.1 included some improvements to the fallback parsing when a model returns malformed JSON, but it may still have issues that we are unaware of. Unfortunately there's not a lot we can do to help diagnose these since it would be a lot of work to test all the models available. My best recommendation is to search through issues linked to #657 to see what solutions folks have found.
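For anyone curious what that fallback has to cope with: local models often wrap their JSON in markdown fences or prepend prose, so a plain json.loads fails with exactly the "Expecting value: line 1 column 1" error above. A minimal sketch of tolerant parsing (illustrative only, not GraphRAG's actual implementation; parse_model_json is a made-up helper name):

import json
import re

def parse_model_json(text: str):
    """Best-effort extraction of a JSON object from a model response."""
    # Strip a ```json ... ``` fence if the model added one.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Otherwise keep only the first {...} span in the response.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    return json.loads(text)

# A response wrapped in a code fence still parses:
print(parse_model_json('```json\n{"points": []}\n```'))  # {'points': []}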
It seems you might be using Ollama to run the Mistral model locally. To help resolve the issue, the following settings.yaml setup can be helpful:
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: mistral
  model_supports_json: true
  api_base: http://localhost:11434/v1
parallelization:
  stagger: 120
async_mode: threaded
embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF
    api_base: http://localhost:8001/v1/
    concurrent_requests: 2
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id]
input:
  type: file
  file_type: text
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file
  base_dir: "cache"
storage:
  type: file
  base_dir: "output/${timestamp}/artifacts"
reporting:
  type: file
  base_dir: "output/${timestamp}/reports"
entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 0
summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_report:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: false
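A few of these settings matter most for local models (my understanding of the knobs, worth double-checking against the docs): model_supports_json: true tells GraphRAG the chat model can be asked for JSON output, parallelization.stagger spaces out request start times so a single local Ollama instance isn't flooded, and concurrent_requests: 2 keeps the load on the embedding server low.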
For embeddings, this setup uses the nomic-embed-text model. Clone the following repository and use the ollama_serv.py script to serve the API for embeddings:

git clone https://github.com/9prodhi/EmbedAdapter
cd EmbedAdapter
python ollama_serv.py
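Once the adapter is running, you can sanity-check it before re-running the pipeline. Assuming it exposes the standard OpenAI-style /v1/embeddings route (I haven't verified the adapter's exact paths), something like:

curl http://localhost:8001/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-ai/nomic-embed-text-v1.5-GGUF", "input": "hello world"}'

should return a JSON payload containing an embedding vector.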
By following these steps, you should be able to resolve the JSON parsing issues and get your local setup working correctly with the Mistral model for LLM tasks and the nomic-embed-text model for embeddings.
If you continue to experience problems, please provide more details about the specific error you're encountering, and I'll be happy to assist further.
Original bug fixed with #910, routing this conversation to #657 for Ollama tuning.
Firstly, I got the same problem: NotADirectoryError: [Errno 20] Not a directory: '/Users/xxxxx/ragtest/output/.DS_Store/artifacts/create_final_nodes.parquet'.
Then, as @natoverse said, .DS_Store is just a temp file; the system is actually looking for a file named create_final_nodes.parquet. To point it at the right folder, we use --data.
Third, I got help from the CLI; the usage is: python -m graphrag.query [-h] [--config CONFIG] [--data DATA] [--root ROOT] --method {local,global} [--community_level COMMUNITY_LEVEL] [--response_type RESPONSE_TYPE] query
Then I used the CLI command:
python -m graphrag.query \
  --root ./ragtest \
  --method global \
  --data ./ragtest/output/20240820-232301/artifacts/ \
  "What are the top themes in this story?"
It works!! Note that 20240820-232301 is the timestamp of my run. Just find the artifacts directory and pass its path to --data!
Do you need to file an issue?
Describe the bug
I managed to follow your example here and got the message "All workflows completed successfully", even though I saw the "Errors occurred during the pipeline run, see logs for more details" message a couple of times at the create_base_entity_graph step.
However, when I ran the first command to interact with the graph (it took my M2 about 5 hours to build it for "A Christmas Carol"), I got a Python error.
Command I'm trying:
Error I get:
Steps to reproduce
Expected Behavior
I expect to get the answer for the question: "What are the top themes in this story?"
GraphRAG Config Used
Logs and screenshots
Additional Information