netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
GNU Affero General Public License v3.0
11.77k stars 1.13k forks

sanic_api.log reports multiple errors while parsing documents #108

Open deauss2017 opened 9 months ago

deauss2017 commented 9 months ago

Error output:

[ERROR] Exception occurred while handling uri: 'http://10.230.107.105:8777/api/local_doc_qa/local_doc_chat'
Traceback (most recent call last):
  File "handle_request", line 132, in handle_request
    "_asgi_lifespan",
  File "/usr/local/lib/python3.10/dist-packages/sanic/response/types.py", line 547, in stream
    await self.streaming_fn(self)
  File "/workspace/qanything_local/qanything_kernel/qanything_server/handler.py", line 355, in generate_answer
    for resp, next_history in local_doc_qa.get_knowledge_based_answer(
  File "/workspace/qanything_local/qanything_kernel/core/local_doc_qa.py", line 219, in get_knowledge_based_answer
    source_documents = self.get_source_documents(retrieval_queries, milvus_kb)
  File "/workspace/qanything_local/qanything_kernel/core/local_doc_qa.py", line 131, in get_source_documents
    embs = self.embeddings._get_len_safe_embeddings(queries)
  File "/workspace/qanything_local/qanything_kernel/connector/embedding/embedding_for_local.py", line 34, in _get_len_safe_embeddings
    embeddings = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/workspace/qanything_local/qanything_kernel/connector/embedding/embedding_for_local.py", line 21, in _get_embedding
    embeddings = embedding_client.get_embedding(queries, max_length=LOCAL_EMBED_MAX_LENGTH)
  File "/workspace/qanything_local/qanything_kernel/connector/embedding/embedding_client.py", line 40, in get_embedding
    inputs_data = self._tokenizer(sentences, padding=True, truncation=True, max_length=max_length, return_tensors='np')
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2802, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2860, in _call_one
    raise ValueError(
ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).

38%|███▊ | 5/13 [04:25<06:52, 51.52s/it]
INFO:debug_logger:list_docs zzp

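Could someone take a look at what is causing this error? On the frontend, the symptom is that uploading a single 10 MB PDF takes 4 to 5 minutes before parsing finishes.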
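The ValueError says the tokenizer was handed something other than the shapes it lists (`str`, `List[str]`, `List[List[str]]`). One way to narrow this down is to check the queries just before they reach `self._tokenizer(...)` in `get_embedding` and log whatever does not fit. The helper below is a hypothetical sketch (`find_invalid_queries` is not part of QAnything or transformers); it only covers the `str` and `List[str]` shapes, which is what a batch of raw query strings should look like, and it reports every offending element rather than stopping at the first.

```python
from typing import Any, List, Tuple


def find_invalid_queries(queries: Any) -> List[Tuple[int, str]]:
    """Hypothetical pre-check: return (index, type name) for each element that is not a str.

    Index -1 means the container itself has an unexpected type.
    """
    if isinstance(queries, str):
        # A single string is a valid single example
        return []
    if not isinstance(queries, (list, tuple)):
        return [(-1, type(queries).__name__)]
    # For a batch, every element should be a plain string
    return [(i, type(q).__name__) for i, q in enumerate(queries) if not isinstance(q, str)]


if __name__ == "__main__":
    print(find_invalid_queries(["what is QAnything", None]))
```

Logging the result of such a check (together with the request that produced it) would show whether, for example, an empty or `None` query is slipping into the batch.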

GreatStep commented 2 months ago

Increase the embedding thread count and batch size (as long as you have enough GPU memory).
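The traceback shows embeddings being computed through a thread pool (`future.result()` in `embedding_for_local.py`), so the two knobs mentioned above are roughly "how many texts per model call" and "how many calls in flight". A minimal sketch of that pattern, assuming a hypothetical `embed_batch` callable standing in for the real model call; the parameter names `batch_size` and `num_workers` are illustrative and are not QAnything's actual config keys. Larger values only help if the model call is the bottleneck and GPU memory can hold the bigger batches.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 16,
    num_workers: int = 4,
) -> List[Vector]:
    """Split texts into batches and embed them concurrently, keeping input order."""
    if batch_size < 1 or num_workers < 1:
        raise ValueError("batch_size and num_workers must be >= 1")
    batches = [list(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in submission order, so vectors line up with texts
        results = pool.map(embed_batch, batches)
        return [vec for batch_vecs in results for vec in batch_vecs]


def fake_embed_batch(batch: List[str]) -> List[Vector]:
    # Hypothetical stand-in for a real embedding model: one 1-d "vector" per text
    return [[float(len(t))] for t in batch]


if __name__ == "__main__":
    chunks = [f"chunk {i}" for i in range(10)]
    vecs = embed_in_batches(chunks, fake_embed_batch, batch_size=3, num_workers=2)
    print(len(vecs))
```

Raising `batch_size` reduces per-call overhead, while raising `num_workers` keeps the device busy between calls; both increase peak memory use, which is why the GPU memory caveat matters.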