Closed waetr closed 2 months ago
Hi, thanks for your enthusiasm. For the first issue: sorry, I can't find any difference between the current implementation and your fixed version: https://github.com/gusye1234/nano-graphrag/blob/681667963c0f6fbedef53e5dbb6786ac64b2634d/nano_graphrag/_utils.py#L71-L73
Maybe you copied the original implementation instead of your fix?
For the second issue: yes, I think that's a problem. We should enclose multi-line cells in quotes.
For issue 1: sorry, I made a mistake here. See the latest update; the fixed version is:
    import json

    def write_json(json_obj, file_name):
        # Write with an explicit encoding instead of the platform default.
        with open(file_name, "w", encoding="utf-8-sig") as f:
            json.dump(json_obj, f, indent=2, ensure_ascii=False)
Dear authors,
Thanks very much for your generous and valuable contributions to this project! While hacking on the code, I discovered a few minor issues, which I have listed below:
1. Error when indexing files with encoding formats other than 'gbk'
While following the instructions in readme.md and using mock_data.txt (which is encoded as "utf-8-sig") as input, I encountered an error during the indexing stage. The error output is shown below:
In my case, I fixed it by simply modifying the write_json function so that the encoding it writes with matches the encoding of the input file:
I don't expect this to fully resolve the whole issue as it doesn't apply to all encoding formats. Perhaps we could find a more flexible solution that can handle various encoding formats for the input text file.
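One possible direction for a more flexible solution is to stop depending on the platform's default encoding altogether: always write JSON as UTF-8, and read it with "utf-8-sig", which decodes UTF-8 both with and without a leading BOM. The sketch below is only an illustration under that assumption; the helper names write_json_utf8 and load_json_any_utf8 are hypothetical and are not functions from the project.

```python
import json
import os


def write_json_utf8(json_obj, file_name):
    # Hypothetical helper: pin the encoding so the result does not depend
    # on the platform default (for example 'gbk' on some Windows locales).
    with open(file_name, "w", encoding="utf-8") as f:
        json.dump(json_obj, f, indent=2, ensure_ascii=False)


def load_json_any_utf8(file_name):
    # Hypothetical helper: 'utf-8-sig' strips a leading BOM if present and
    # otherwise behaves like plain UTF-8, so both variants can be read.
    if not os.path.exists(file_name):
        return None
    with open(file_name, encoding="utf-8-sig") as f:
        return json.load(f)
```

This does not cover input files in non-UTF-8 encodings; those would still need the encoding to be detected or passed in explicitly.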
2. Issues with CSV-formatted prompts
Specifically, in the function async def _build_local_query_context in _op.py, I noticed that the strings communities_context and text_units_context are designed to follow the .csv format. However, although each cell holds the content of a multi-line text chunk or a community summary, the cells are not enclosed in double quotes. For example, the string looks like this:
I think the following is closer to proper .csv format when a cell contains multiple lines:
In contrast, other strings such as entities_context and relations_context are correctly enclosed in quotes to preserve cell boundaries. Although this is a minor issue, I am concerned that it might lead to problems such as misinterpretation by LLMs.
That's the full description. I wish this project continued growth!