Open cyberflying opened 1 week ago
Also, the output file "vdb_entities.json" is not UTF-8 encoded.
Are you running it on Windows? I ran into this too; the files are probably being written without specifying utf-8.
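Yes, on Windows. I changed the code to specify the encoding, but then the same error showed up somewhere else. Could the author fix it in the source instead?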
This is fixed now; please pull the latest code and test it. Fixed save/write encoding problem of utf-8
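The fix discussed here comes down to passing an explicit encoding every time a text file is opened. Without one, Python's `open()` falls back to the locale's preferred encoding, which on a Chinese-locale Windows install is typically code page 936, a codec Python names `gbk`. The sketch below illustrates that general pattern only; it is not the actual nano-graphrag patch, and `write_json_utf8` / `read_json_utf8` are hypothetical helper names.

```python
import json
import locale
import tempfile
from pathlib import Path
from typing import Any


def write_json_utf8(obj: Any, path: Path) -> None:
    # Hypothetical helper: always write UTF-8, independent of the OS locale.
    # ensure_ascii=False keeps CJK/Hangul text readable instead of \uXXXX escapes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)


def read_json_utf8(path: Path) -> Any:
    # Hypothetical helper: read back with the same explicit encoding.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# What open() would use if no encoding were passed (e.g. 'cp936' on Chinese-locale Windows).
print("locale default:", locale.getpreferredencoding(False))

data = {"entity": "中文 \uc0bc"}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.json"
    write_json_utf8(data, target)
    roundtrip = read_json_utf8(target)
    raw = target.read_bytes()  # the bytes actually on disk
```

With json's default `ensure_ascii=True`, every non-ASCII character is escaped, so the file encoding stops mattering, at the cost of human readability.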
Thanks for the reply! It still throws an error, though, at line 121 in _storage.py:
Exception has occurred: UnicodeEncodeError
'gbk' codec can't encode character '\uc0bc' in position 3: illegal multibyte sequence
  File "C:\demo\nano-graphrag\nano_graphrag\graphrag.py", line 312, in ainsert
    await self.chunk_entity_relation_graph.clustering(
  File "C:\demo\nano-graphrag\nano_graphrag\_storage.py", line 374, in clustering
    await self._clustering_algorithms[algorithm]()
  File "C:\demo\nano-graphrag\nano_graphrag\_storage.py", line 437, in _leiden_clustering
    from graspologic.partition import hierarchical_leiden
ModuleNotFoundError: No module named 'past'
During handling of the above exception, another exception occurred:
  File "C:\demo\nano-graphrag\nano_graphrag\_storage.py", line 121, in index_done_callback
    self._client.save()
  File "C:\demo\nano-graphrag\nano_graphrag\graphrag.py", line 339, in _insert_done
    await asyncio.gather(*tasks)
  File "C:\demo\nano-graphrag\nano_graphrag\graphrag.py", line 323, in ainsert
    await self._insert_done()
  File "C:\demo\nano-graphrag\nano_graphrag\graphrag.py", line 205, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\demo\nano-graphrag\test.py", line 12, in
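The UnicodeEncodeError in this traceback can be reproduced without nano-graphrag: '\uc0bc' is a Hangul syllable that GBK cannot represent, so encoding any string containing it with `gbk` fails, while UTF-8 handles it. A minimal reproduction; the sample string "abc\uc0bc" is made up so that the offending character sits at position 3, as in the error message, and `find_unencodable` is a hypothetical helper.

```python
from typing import Optional, Tuple


def find_unencodable(text: str, encoding: str) -> Optional[Tuple[int, str]]:
    """Return (index, code point) of the first character `encoding` cannot represent, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        # err.start is the index of the first character the codec rejected.
        return err.start, f"U+{ord(text[err.start]):04X}"
    return None


sample = "abc\uc0bc"  # made-up string; '\uc0bc' is the character named in the traceback

print(find_unencodable(sample, "gbk"))    # (3, 'U+C0BC')
print(find_unencodable(sample, "utf-8"))  # None
```

The helper reports the code point rather than printing the character itself, because printing it to a GBK console would raise the same error.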
Also: the written file vdb_entities.json still shows garbled text when opened as UTF-8, but opens correctly as gb2312.
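Opening correctly as gb2312 fits a file written with the Windows locale default (GBK, a superset of GB2312) rather than UTF-8. Editors usually display such bytes as garbled text; Python's strict decoder raises `UnicodeDecodeError` instead. The sketch below simulates that situation and shows one way to convert an existing file; `reencode_file` is a hypothetical helper, and converting only makes sense if you are sure the file really is GBK-encoded.

```python
import tempfile
from pathlib import Path


def reencode_file(path: Path, src: str = "gbk", dst: str = "utf-8") -> None:
    # Hypothetical helper: decode with the encoding the file was actually
    # written in, then rewrite it in the target encoding.
    text = path.read_text(encoding=src)
    path.write_text(text, encoding=dst)


sample = '{"name": "中文"}'

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "vdb_entities.json"
    # Simulate a file written with the Windows locale default instead of UTF-8.
    p.write_bytes(sample.encode("gbk"))

    try:
        p.read_text(encoding="utf-8")
        opens_as_utf8 = True
    except UnicodeDecodeError:
        opens_as_utf8 = False  # strict UTF-8 decoding rejects the GBK bytes

    as_gb2312 = p.read_text(encoding="gb2312")  # decodes cleanly
    reencode_file(p)
    after_fix = p.read_text(encoding="utf-8")

print(opens_as_utf8, as_gb2312 == sample, after_fix == sample)  # False True True
```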
Is this a new working dir?
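It's the original working_dir; I kept only the source .txt file and deleted all the other intermediate files. I also switched from 4o to 4o-mini, and without deleting the intermediate files it raises an error.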
How did you update the repo, with pip install git+? BTW, you need to pip install future.
Oh, I updated the repo but forgot to update the pip-installed nano :( I'll test again later. Thanks for the reminder!
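Two environment checks come up in this exchange: whether the `past` module is importable (it ships with the `future` package, hence the `pip install future` advice), and whether Python loads the pip-installed nano-graphrag or the freshly pulled clone. A standard-library sketch; `module_origin` is a hypothetical helper name, and it should be run with the same interpreter and from the same directory as test.py, since both affect which copy gets imported.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    # Where would `import name` load from? None means it is not importable.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "<namespace package>"


for name in ("past", "nano_graphrag"):
    origin = module_origin(name)
    print(f"{name}: {origin if origin is not None else 'not importable'}")
```

If `nano_graphrag` resolves into `site-packages`, changes pulled into the clone are not used until the package is reinstalled; an editable install (`pip install -e .` from the clone) is one way to avoid that.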
From: Rangehow; Sent: Friday, September 20, 2024 2:45:15 PM; To: gusye1234/nano-graphrag; Subject: Re: [gusye1234/nano-graphrag] encounter many 'gbk' codec errors (Issue #50)
Since the official GraphRAG requires UTF-8 encoding, I prepared some input files in UTF-8. When using them with nano-graphrag, I hardcoded encoding='utf-8', but I encountered many 'gbk' codec errors. Could there be a global config option to set the encoding?
For example, in _storage.py:
Thanks!
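On the request for a global encoding switch, two routes seem plausible. Outside the library, Python's UTF-8 Mode (`python -X utf8 test.py`, or the environment variable `PYTHONUTF8=1`, Python 3.7+) makes `open()` default to UTF-8 for all code in the process that relies on the default, which should include dependency writes such as the `self._client.save()` call in the traceback. Inside a codebase, the encoding can be defined once and routed through a single helper, as sketched below; `ENCODING` and `open_text` are hypothetical names, not an existing nano-graphrag setting.

```python
import functools
import json
import sys
import tempfile
from pathlib import Path

# Hypothetical project-wide setting: change it in one place.
ENCODING = "utf-8"

# open() with the encoding pre-bound; callers still pass path, mode, etc.
open_text = functools.partial(open, encoding=ENCODING)

# True when the interpreter runs in UTF-8 Mode (-X utf8 or PYTHONUTF8=1).
print("UTF-8 Mode:", bool(sys.flags.utf8_mode))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "kv_store_example.json"
    with open_text(path, "w") as f:
        json.dump({"text": "中文 \uc0bc"}, f, ensure_ascii=False)
    with open_text(path) as f:
        loaded = json.load(f)
```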