microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Bug]: Using Ollama and error occur like:[JSONDecodeError: Expecting ',' delimiter: line 5 column 45] #663

Closed ayanjiushishuai closed 2 months ago

ayanjiushishuai commented 2 months ago

Describe the bug

Steps to reproduce

  1. Download Ollama:
    curl -fsSL https://ollama.com/install.sh | sh
  2. Set .env and settings.yaml as shown below.
  3. Follow the documentation.

Expected Behavior

The pipeline should run without errors.

GraphRAG Config Used

The settings.yaml config is as follows:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: qwen2:1.5b
  model_supports_json: True
  api_base: http://localhost:8004/v1  #this port is my local config

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text:latest
    api_base: http://localhost:8004/api     #this port is my local config

The .env config is as follows:

GRAPHRAG_API_KEY=ollama
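
As a quick sanity check that the chat endpoint in this config is reachable, something like the following can be run first (a sketch, assuming the openai Python package is installed, the same port as in settings.yaml, and the qwen2:1.5b model from the config above):

import json
from openai import OpenAI

# Point the client at Ollama's OpenAI-compatible endpoint from settings.yaml.
client = OpenAI(api_key="ollama", base_url="http://localhost:8004/v1")

# Ask the chat model for a trivial JSON reply to confirm it responds at all.
resp = client.chat.completions.create(
    model="qwen2:1.5b",
    messages=[{"role": "user", "content": 'Reply with {"ok": true} and nothing else.'}],
)
print(resp.choices[0].message.content)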

Logs and screenshots

log file:

16:27:45,459 datashaper.workflow.workflow INFO executing verb create_community_reports
16:27:49,439 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
16:27:49,485 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
16:27:49,486 graphrag.llm.openai.utils ERROR error loading json, json={
    "title": "Harmony Assembly Community",
    "summary": "Harmony Assembly is a community that operates in Verdant Oasis Plaza. The organization, Harmony Assembly, is responsible for organizing the Unity March at Verdant Oasis Plaza. This event has significant implications and could pose threats to the community if not managed properly.",
    "rating": 7,
    "rating_explanation": "Harmony Assembly's actions have a high impact on the community, as they are involved in organizing an event that could attract media attention and potentially influence public perception.",
    "findings": [
        {
            "summary": "Harmony Assembly is the organizer of the Unity March at Verdant Oasis Plaza",
            "explanation": "Harmony Assembly is responsible for organizing the Unity March, which is a significant event in the community. The march has the potential to attract media attention and influence public perception."
        },
        {
            "summary": "Harmony Assembly's actions could pose threats if not managed properly",
            "explanation": "The actions of Harmony Assembly could pose threats to the community if they are not managed properly, as their involvement in organizing an event at Verdant Oasis Plaza could attract media attention and potentially influence public perception."
        }
    ]
}
Traceback (most recent call last):
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/utils.py", line 94, in try_parse_json_object
    result = json.loads(clean_json)
  File "/root/miniconda3/envs/grag/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/root/miniconda3/envs/grag/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/root/miniconda3/envs/grag/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 5 column 45 (char 405)
16:27:49,486 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py", line 58, in __call__
    await self._llm(
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/json_parsing_llm.py", line 34, in __call__
    result = await self._delegate(input, **kwargs)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/openai_token_replacing_llm.py", line 37, in __call__
    return await self._delegate(input, **kwargs)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/openai_history_tracking_llm.py", line 33, in __call__
    output = await self._delegate(input, **kwargs)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/base/caching_llm.py", line 104, in __call__
    result = await self._delegate(input, **kwargs)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
    result, start = await execute_with_retry()
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
    async for attempt in retryer:
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/root/miniconda3/envs/grag/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/root/miniconda3/envs/grag/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
    return await do_attempt(), start
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
    return await self._delegate(input, **kwargs)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 48, in __call__
    return await self._invoke_json(input, **kwargs)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 92, in _invoke_json
    result = await generate()
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 84, in generate
    await self._native_json(input, **{**kwargs, "name": call_name})
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 123, in _native_json
    json_output = try_parse_json_object(raw_output)
  File "/root/miniconda3/envs/grag/lib/python3.10/site-packages/graphrag/llm/openai/utils.py", line 94, in try_parse_json_object
    result = json.loads(clean_json)
  File "/root/miniconda3/envs/grag/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/root/miniconda3/envs/grag/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/root/miniconda3/envs/grag/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 5 column 45 (char 405)
16:27:49,487 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
16:27:49,487 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 2

However, when I copy this JSON string and test it, the format seems correct.


I have also tried some solutions, like manually fixing the JSON format (roughly along the lines of the sketch below) and changing the prompt format. That does help when the output is not a standard JSON string, but my output now looks fine and there are still errors.
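
The kind of manual JSON cleanup I mean is roughly this (a sketch only; try_repair_and_parse is a hypothetical helper, not part of GraphRAG):

import json
import re

def try_repair_and_parse(raw: str) -> dict | None:
    """Best-effort cleanup of almost-valid LLM JSON output before json.loads."""
    text = raw.strip()
    # Drop ```json ... ``` fences that some models wrap around their output.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text)
    # Keep only the outermost JSON object if the model added extra prose.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end != -1:
        text = text[start : end + 1]
    # Remove trailing commas such as {"a": 1,} which strict JSON rejects.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None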

What's more, I have tried different models such as qwen2:1.5b and phi3, but they are all small models. Does this mean GraphRAG doesn't support these small models?

Additional Information

yurochang commented 2 months ago

When I use llama3 (80k input), I get similar error information in the global search part; when I use qwen2:7b (320k input), it is solved.

But the local search still does not work: ZeroDivisionError: Weights sum to zero, can't be normalized
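
That message comes from numpy: the query embedding code combines chunk embeddings with np.average(..., weights=chunk_lens), so if every chunk embedding call fails the weights sum to zero and numpy raises exactly this error. A minimal sketch of that failure mode, assuming the failed calls leave the lists empty:

import numpy as np

# If no chunk embeddings were produced (e.g. every embedding call failed),
# the weights passed to np.average sum to zero and numpy raises this error.
chunk_embeddings: list[list[float]] = []
chunk_lens: list[int] = []

try:
    np.average(chunk_embeddings, axis=0, weights=chunk_lens)
except ZeroDivisionError as e:
    print(e)  # "Weights sum to zero, can't be normalized"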

natoverse commented 2 months ago

Consolidating alternate model issues here: #657

wenwkich commented 2 months ago

When I use llama3 (80k input), I get similar error information in the global search part; when I use qwen2:7b (320k input), it is solved.

But the local search still does not work: ZeroDivisionError: Weights sum to zero, can't be normalized

@yurochang I think you mean 8k input for llama3. Are you also using ollama for embeddings? I was able to run a local query without errors when I modified the code in graphrag\query\llm\oai\embedding.py as follows (you need to pip install ollama first), but it yields completely out-of-context results.

I also tried other solutions from the GitHub issues; they either give out-of-context results or the same error. I also printed the context_text after context building (before the local search happens); it was somewhat related to the input, but it did not include anything related to my question.

I'm thinking it might be due to those errors when creating the community reports, and that our models were too small. Today Ollama supports the newest llama 3.1 model with a 128k context window, so I'm going to give it a try.

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""OpenAI Embedding model implementation."""

import asyncio
from collections.abc import Callable
from typing import Any

import numpy as np
import tiktoken
from tenacity import (
    AsyncRetrying,
    RetryError,
    Retrying,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

from graphrag.query.llm.base import BaseTextEmbedding
from graphrag.query.llm.oai.base import OpenAILLMImpl
from graphrag.query.llm.oai.typing import (
    OPENAI_RETRY_ERROR_TYPES,
    OpenaiApiType,
)
from graphrag.query.llm.text_utils import chunk_text
from graphrag.query.progress import StatusReporter

import ollama
import json

class OpenAIEmbedding(BaseTextEmbedding, OpenAILLMImpl):
    """Wrapper for OpenAI Embedding models."""

    def __init__(
        self,
        api_key: str | None = None,
        azure_ad_token_provider: Callable | None = None,
        model: str = "text-embedding-3-small",
        deployment_name: str | None = None,
        api_base: str | None = None,
        api_version: str | None = None,
        api_type: OpenaiApiType = OpenaiApiType.OpenAI,
        organization: str | None = None,
        encoding_name: str = "cl100k_base",
        max_tokens: int = 8191,
        max_retries: int = 10,
        request_timeout: float = 180.0,
        retry_error_types: tuple[type[BaseException]] = OPENAI_RETRY_ERROR_TYPES,  # type: ignore
        reporter: StatusReporter | None = None,
    ):
        OpenAILLMImpl.__init__(
            self=self,
            api_key=api_key,
            azure_ad_token_provider=azure_ad_token_provider,
            deployment_name=deployment_name,
            api_base=api_base,
            api_version=api_version,
            api_type=api_type,  # type: ignore
            organization=organization,
            max_retries=max_retries,
            request_timeout=request_timeout,
            reporter=reporter,
        )

        self.model = model
        self.encoding_name = encoding_name
        self.max_tokens = max_tokens
        self.token_encoder = tiktoken.get_encoding(self.encoding_name)
        self.retry_error_types = retry_error_types

    def embed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's sync function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        Please refer to: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        for chunk in token_chunks:
            try:
                embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
                chunk_embeddings.append(embedding)
                chunk_lens.append(chunk_len)
            # TODO: catch a more specific exception
            except Exception as e:  # noqa BLE001
                self._reporter.error(
                    message="Error embedding chunk",
                    details={self.__class__.__name__: str(e)},
                )

                continue
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    async def aembed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's async function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        embedding_results = await asyncio.gather(*[
            self._aembed_with_retry(chunk, **kwargs) for chunk in token_chunks
        ])
        embedding_results = [result for result in embedding_results if result[0]]
        chunk_embeddings = [result[0] for result in embedding_results]
        chunk_lens = [result[1] for result in embedding_results]
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)  # type: ignore
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    def _embed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = Retrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            for attempt in retryer:
                with attempt:
                    # embedding = (
                    #     self.sync_client.embeddings.create(  # type: ignore
                    #         input=text,
                    #         model=self.model,
                    #         **kwargs,  # type: ignore
                    #     )
                    #     .data[0]
                    #     .embedding
                    #     or []
                    # )

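                    # Replace the OpenAI embeddings call (commented out above) with a local Ollama request.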
                    if isinstance(text, tuple):
                        text = json.dumps(text)
                    embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)
                    embedding = list(embedding["embedding"])

                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)

    async def _aembed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = AsyncRetrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            async for attempt in retryer:
                with attempt:
                    # embedding = (
                    #     await self.async_client.embeddings.create(  # type: ignore
                    #         input=text,
                    #         model=self.model,
                    #         **kwargs,  # type: ignore
                    #     )
                    # ).data[0].embedding or []

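                    # Same replacement as in the sync path: call Ollama directly instead of the async OpenAI client.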
                    if isinstance(text, tuple):
                        text = json.dumps(text)
                    embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)
                    embedding = list(embedding["embedding"])

                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)
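
With the patched file above in place, a quick manual check of the embedding path might look like this (a sketch, assuming the Ollama daemon is running locally and nomic-embed-text has been pulled; the OpenAI-style constructor arguments are placeholders because the patched methods call ollama directly):

from graphrag.query.llm.oai.embedding import OpenAIEmbedding

# The constructor mirrors the usual OpenAI-style config; the patched
# _embed_with_retry() ignores the OpenAI client and calls ollama.embeddings.
embedder = OpenAIEmbedding(api_key="ollama", model="nomic-embed-text")

vector = embedder.embed("Harmony Assembly organizes the Unity March at Verdant Oasis Plaza.")
print(len(vector))  # embedding dimensionality (768 for nomic-embed-text)
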
kangqiaosiyuetian commented 1 month ago

When I use llama3 (80k input), I get similar error information in the global search part; when I use qwen2:7b (320k input), it is solved.

But the local search still does not work: ZeroDivisionError: Weights sum to zero, can't be normalized

I am also using qwen2:7b, but the problem still exists.