microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Issue]: Not supported for Japanese language #816

Closed: prasantpoudel closed this issue 3 months ago

prasantpoudel commented 3 months ago

Is there an existing issue for this?

Describe the issue

I am testing GraphRAG on Japanese text: building a graph from a text file and then querying it with both the local and global search methods.

Issue 1: when the community reports are generated, much of the report content is written as Unicode escape sequences instead of Japanese text, e.g. "title": "\u5343\u6210\u5de5\u696d\u682a\u5f0f\u4f1a\u793e".

Issue 2: when I query with the local search method, the output does not come from the text I fed in during indexing; the LLM answers from its own knowledge instead of retrieving from LanceDB or the community reports.

Model: llama3 served through Ollama.
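
For reference, the \uXXXX sequences in issue 1 are JSON Unicode escapes of the Japanese text rather than lost data; a minimal standard-library check (independent of graphrag) that decodes the escaped title back to readable characters:

import json

# The escaped fragment as it appears in the generated community report output.
escaped = r'{"title": "\u5343\u6210\u5de5\u696d\u682a\u5f0f\u4f1a\u793e"}'

# json.loads interprets the \uXXXX escapes and recovers the original Japanese text.
record = json.loads(escaped)
print(record["title"])  # 千成工業株式会社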

natoverse commented 3 months ago

Please see #696 for non-English content, and #657 for non-OpenAI models. There may be commentary from the community that helps resolve your issue.

ODENSUKI commented 3 months ago

This is caused by json.dumps escaping non-ASCII characters by default (ensure_ascii defaults to True). I solved it by passing ensure_ascii=False, changing the FileWorkflowCallbacks class in file_workflow_callbacks.py as follows:

import json
import logging
from io import TextIOWrapper
from pathlib import Path

from datashaper import NoopWorkflowCallbacks

# Module-level logger used by the callbacks below.
log = logging.getLogger(__name__)


class FileWorkflowCallbacks(NoopWorkflowCallbacks):
    """A reporter that writes to a file."""

    _out_stream: TextIOWrapper

    def __init__(self, directory: str):
        """Create a new file-based workflow reporter."""
        Path(directory).mkdir(parents=True, exist_ok=True)
        self._out_stream = open(  # noqa: PTH123, SIM115
            Path(directory) / "logs.json", "a", encoding="utf-8", errors="strict"
        )

    def on_error(
        self,
        message: str,
        cause: BaseException | None = None,
        stack: str | None = None,
        details: dict | None = None,
    ):
        """Handle when an error occurs."""
        self._out_stream.write(
            json.dumps({
                "type": "error",
                "data": message,
                "stack": stack,
                "source": str(cause),
                "details": details,
            }, ensure_ascii=False)
            + "\n"
        )
        message = f"{message} details={details}"
        log.info(message)

    def on_warning(self, message: str, details: dict | None = None):
        """Handle when a warning occurs."""
        self._out_stream.write(
            json.dumps({"type": "warning", "data": message, "details": details}, ensure_ascii=False) + "\n"
        )
        _print_warning(message)

    def on_log(self, message: str, details: dict | None = None):
        """Handle when a log message is produced."""
        self._out_stream.write(
            json.dumps({"type": "log", "data": message, "details": details}, ensure_ascii=False) + "\n"
        )

        message = f"{message} details={details}"
        log.info(message)

def _print_warning(skk):
    log.warning(skk)
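
For anyone hitting the same symptom, the behavior that the ensure_ascii=False change addresses can be reproduced with the standard library alone; a minimal sketch using the title from the original report:

import json

record = {"title": "千成工業株式会社"}

# Default behavior: non-ASCII characters are escaped to \uXXXX sequences,
# which is exactly what shows up in the generated community reports.
print(json.dumps(record))
# {"title": "\u5343\u6210\u5de5\u696d\u682a\u5f0f\u4f1a\u793e"}

# With ensure_ascii=False the Japanese text is written out as-is.
print(json.dumps(record, ensure_ascii=False))
# {"title": "千成工業株式会社"}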