wxywb / history_rag


Milvus standalone crashes intermittently #67

Open flash201524 opened 3 months ago

flash201524 commented 3 months ago

Maybe it's because there are too many txt files?

flash201524 commented 3 months ago

RPC error: [insert_rows], <MilvusException: (code=1100, message=the length (74246) of dynamic field exceeds max length (65536): invalid parameter[expected=valid length dynamic field][actual=length exceeds max length])>, <Time:{'RPC start': '2024-04-11 23:49:01.457415', 'RPC error': '2024-04-11 23:49:02.505110'}>
Traceback (most recent call last):
  File "E:\OneDrive\history_rag-master\cli.py", line 120, in <module>
    cli.run()
  File "E:\OneDrive\history_rag-master\cli.py", line 53, in run
    self.parse_input(command_text)
  File "E:\OneDrive\history_rag-master\cli.py", line 65, in parse_input
    self.build_index(path=commands[1], overwrite=False)
  File "E:\OneDrive\history_rag-master\cli.py", line 92, in build_index
    self._executor.build_index(path, overwrite)
  File "E:\OneDrive\history_rag-master\executor.py", line 186, in build_index
    self.index = VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\llama_index\indices\vector_store\base.py", line 53, in __init__
    super().__init__(
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\llama_index\indices\base.py", line 75, in __init__
    index_struct = self.build_index_from_nodes(
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\llama_index\indices\vector_store\base.py", line 274, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\llama_index\indices\vector_store\base.py", line 246, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\llama_index\indices\vector_store\base.py", line 200, in _add_nodes_to_index
    new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\llama_index\vector_stores\milvus.py", line 199, in add
    self.collection.insert(insert_list)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\orm\collection.py", line 508, in insert
    return conn.insert_rows(
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\decorators.py", line 147, in handler
    raise e from e
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\decorators.py", line 143, in handler
    return func(*args, **kwargs)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\decorators.py", line 182, in handler
    return func(self, *args, **kwargs)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\decorators.py", line 122, in handler
    raise e from e
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\decorators.py", line 87, in handler
    return func(*args, **kwargs)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\client\grpc_handler.py", line 514, in insert_rows
    check_status(response.status)
  File "E:\software\Anaconda\envs\pytorch\Lib\site-packages\pymilvus\client\utils.py", line 60, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1100, message=the length (74246) of dynamic field exceeds max length (65536): invalid parameter[expected=valid length dynamic field][actual=length exceeds max length])>
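Oversized chunks like the one in this error can be spotted before anything is sent to Milvus. The sketch below is a minimal pre-flight check, assuming the 65536 limit from the error applies to the UTF-8 byte length of the stored text; the function name find_oversized_chunks is hypothetical and is not part of history_rag or pymilvus. Because the error refers to the dynamic field, which may also carry node metadata besides the chunk text, it is safer to treat the limit as an upper bound and leave some headroom.

```python
from typing import List, Tuple

# Assumption: the limit is compared against the UTF-8 byte length of the text.
# 65536 is the max length reported in the MilvusException above.
MILVUS_MAX_LENGTH = 65536


def find_oversized_chunks(chunks: List[str], limit: int = MILVUS_MAX_LENGTH) -> List[Tuple[int, int]]:
    """Return (index, utf8_byte_length) for every chunk whose encoded size exceeds limit."""
    oversized = []
    for i, chunk in enumerate(chunks):
        size = len(chunk.encode("utf-8"))
        if size > limit:
            oversized.append((i, size))
    return oversized


if __name__ == "__main__":
    # 30000 CJK characters take 3 bytes each in UTF-8, i.e. 90000 bytes.
    chunks = ["short chunk", "史" * 30000]
    print(find_oversized_chunks(chunks))  # [(1, 90000)]
```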

wxywb commented 3 months ago

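This isn't actually a crash. Milvus limits the length of the text strings it stores; here the limit is 65536, as the error shows. Your splitting method produced a single chunk of length 74246, which exceeds it. In fact, embeddings of long texts generally don't work well anyway, because with too much information packed in, the features are no longer distinctive. history_rag's text splitting is designed for historical source texts, so it may not suit your documents. For a more general-purpose splitting approach, see https://github.com/wxywb/history_rag/issues/63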
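One way to keep every chunk under the limit is to pack whole sentences into byte-bounded chunks. The helper split_by_byte_limit is hypothetical: a generic sentence-packing splitter, not history_rag's own splitter and not necessarily the approach discussed in issue #63. It assumes the limit is measured in UTF-8 bytes, and its 2000-byte default (roughly 660 CJK characters) is an arbitrary illustrative value meant to keep chunks short for embedding, in line with the advice above.

```python
import re
from typing import List

# Zero-width split point right after Chinese/Western sentence-ending punctuation
# or a newline, so each punctuation mark stays attached to its sentence.
SENTENCE_END = re.compile(r"(?<=[。！？!?\n])")


def _hard_split(sentence: str, max_bytes: int) -> List[str]:
    """Cut a sentence at character boundaries into pieces of at most max_bytes."""
    pieces, current, size = [], [], 0
    for ch in sentence:
        n = len(ch.encode("utf-8"))
        if current and size + n > max_bytes:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        pieces.append("".join(current))
    return pieces


def split_by_byte_limit(text: str, max_bytes: int = 2000) -> List[str]:
    """Hypothetical splitter: greedily pack sentences into chunks of at most max_bytes (UTF-8)."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks, current, size = [], "", 0
    for sentence in SENTENCE_END.split(text):
        if not sentence:
            continue
        # Sentences that fit are returned whole; longer ones are cut by character.
        for piece in _hard_split(sentence, max_bytes):
            n = len(piece.encode("utf-8"))
            if current and size + n > max_bytes:
                chunks.append(current)
                current, size = "", 0
            current += piece
            size += n
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    text = "第一句。第二句。" + "长" * 10
    print(split_by_byte_limit(text, max_bytes=15))
    # ['第一句。', '第二句。', '长长长长长', '长长长长长']
```

Joining the returned chunks reproduces the original text exactly, and because over-long sentences are cut only at character boundaries, no UTF-8 character is ever split across two chunks.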

flash201524 commented 3 months ago


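Got it. But now I can't delete the text that has already been split and indexed. When I enter remove followed by the folder name, it reports how many entries there are but deletes 0 of them. Is there another command for deleting them?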

wxywb commented 3 months ago
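from pymilvus import Collection, connections, utility

# Connect to the local Milvus standalone instance
connections.connect("default", host="localhost", port="19530")

# Default collection name set in cfgs/config.yaml
col_name = "history_rag"

# Drop the whole collection, deleting every indexed chunk.
# The collection does not need to be loaded before it is dropped.
if utility.has_collection(col_name):
    Collection(col_name).drop()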