gusye1234 / nano-vectordb

A simple, easy-to-hack Vector Database

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 190: illegal multibyte sequence #3

Closed cp-1919 closed 1 month ago

cp-1919 commented 1 month ago

Hello! While using LightRag, I ran into the following error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 190: illegal multibyte sequence

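This happens when working with Chinese text. The likely cause is that open(), when called without an explicit encoding, falls back to the platform's default encoding (GBK on a Chinese-locale Windows system) instead of UTF-8. I changed the load_storage function in the dbs file as follows: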

def load_storage(file_name) -> Union[DataBase, None]:
    if not os.path.exists(file_name):
        return None
    # Pass the encoding explicitly so the platform default (e.g. GBK) is not used.
    with open(file_name, encoding='utf-8') as f:
        data = json.load(f)
    data["matrix"] = buffer_string_to_array(data["matrix"]).reshape(
        -1, data["embedding_dim"]
    )
    logger.info(f"Load {data['matrix'].shape} data")
    return data

After that change, it seems to work.
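
For anyone who wants to reproduce the mismatch outside the library, here is a minimal sketch. It assumes the storage file is UTF-8 JSON containing raw (non-escaped) Chinese characters; the `load_json` helper, the `demo.json` file name, and the sample record are hypothetical and not part of nano-vectordb. Reading with `encoding="gbk"` stands in for what a GBK-locale system does when no encoding is passed.

```python
import json
import os
import tempfile


def load_json(path, encoding=None):
    """Hypothetical helper: read a JSON file with the given text encoding."""
    with open(path, encoding=encoding) as f:
        return json.load(f)


# Sample data with one Chinese character (illustrative only).
record = {"text": "中"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.json")

    # Write the file as UTF-8, keeping the Chinese character unescaped.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False)

    # Simulate a GBK-locale default by requesting GBK explicitly.
    try:
        gbk_result = load_json(path, encoding="gbk")
    except UnicodeDecodeError as exc:
        gbk_result = exc  # keep the exception so it can be inspected

    # Reading with the encoding the file was written in round-trips cleanly.
    utf8_result = load_json(path, encoding="utf-8")

print(type(gbk_result).__name__)
print(utf8_result == record)
```

Passing the encoding on both the write and the read side removes the dependence on the machine's locale, which is the same idea as the one-line change to load_storage above.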

This is my first time writing an issue, so please excuse any awkward wording.