yanyu2015 opened this issue 3 weeks ago
Hello @yanyu2015,
I have used this pipeline to ingest all my personal files (approx. 20,000), so I think even more can be ingested. The limit would be the vector DB.
Awesome! I encountered some issues during the configuration and was about to give up, but seeing your reply, I decided to give it another try. Thank you so much for your response!
The cool thing about this approach is that you can recursively ingest whole folders; most other solutions just focus on a small number of files.
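Just to illustrate the idea, collecting the files recursively is basically a one-liner with pathlib (this is only a rough sketch, not the repo's actual ingestion code, and the folder path is made up):

import sys
from pathlib import Path

# Sketch: gather every PDF under a root folder recursively before ingestion.
# The ingestion step itself depends on the repo's API, so it is left out here.
root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
pdfs = sorted(root.rglob("*.pdf"))
print(f"found {len(pdfs)} PDFs under {root}")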
I understand how to get the API key, but does it support custom models from Hugging Face? I also found that building the index from the PDFs is slow, so will my data be saved (persisted) after the build is complete, so that I don't have to rebuild it every time? For example, I want to parse hundreds of PDFs on different topics, and there are quite a few details I would like to ask you about.
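What I'm hoping for is something along these lines, where the vector store lives on disk and can simply be reloaded. This is only a guess on my part using LangChain's Chroma wrapper; I don't actually know which vector store this repo uses:

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Guess / sketch only: build the index once, persist it, reload it later without re-parsing the PDFs.
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# First run (docs would come from the PDF loader):
# db = Chroma.from_documents(documents=docs, embedding=emb, persist_directory="chroma_db")

# Later runs: reload the persisted index instead of rebuilding it.
db = Chroma(persist_directory="chroma_db", embedding_function=emb)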
Additionally, I am using the command:
python uvicorn_start.py
And I encountered the following issue:
E:\literatureAI\local-gen-search>python uvicorn_start.py
INFO: Will watch for changes in these directories: ['E:\\literatureAI\\local-gen-search']
WARNING: "workers" flag is ignored when reloading is enabled.
ERROR: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions.
You seem to have an access-permission problem on your socket. Maybe you need to modify something in your firewall.
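One quick way to check whether a specific port is the problem is to try binding to it directly. Just a throwaway snippet, not part of the repo:

import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    # Trying to bind reproduces the same class of error that uvicorn hit
    # (WinError 10013 for forbidden/reserved ports, a different error if the port is already in use).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError as exc:
            print(f"port {port}: {exc}")
            return False

for candidate in (8000, 8017):
    print(candidate, "free" if port_is_free(candidate) else "blocked")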
It seems to be a problem with the port; I changed the port to 8017, and the output is as follows:
INFO: Will watch for changes in these directories: ['E:\\literatureAI\\local-gen-search']
WARNING: "workers" flag is ignored when reloading is enabled.
INFO: Uvicorn running on http://127.0.0.1:8017 (Press CTRL+C to quit)
INFO: Started reloader process [31696] using WatchFiles
E:\Pyvenv\pyvi39\lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
E:\literatureAI\local-gen-search\api.py:29: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 1.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.
hf = HuggingFaceEmbeddings(
E:\Pyvenv\pyvi39\lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
E:\Pyvenv\pyvi39\lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
INFO: Started server process [17768]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 127.0.0.1:7475 - "GET /ask_localai HTTP/1.1" 405 Method Not Allowed
INFO: 127.0.0.1:7476 - "GET / HTTP/1.1" 200 OK
INFO: 127.0.0.1:7502 - "GET /ask_localai HTTP/1.1" 405 Method Not Allowed
INFO: 127.0.0.1:7550 - "GET / HTTP/1.1" 200 OK
INFO: 127.0.0.1:7550 - "GET /search HTTP/1.1" 405 Method Not Allowed
When I enter http://127.0.0.1:8017/ in the address bar, it returns {"message":"Hello World"}, but entering http://127.0.0.1:8017/ask_localai returns {"detail":"Method Not Allowed"}.
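From what I understand, a 405 on a GET usually means the route only accepts POST, so I suppose it has to be called from Python along these lines. The request field name here is just my guess; I have not checked the actual schema in api.py:

import requests

# Rough sketch: /ask_localai rejects GET with 405, so send a POST with a JSON body instead.
# The field names ("query" in the request, "answer" in the response) are assumptions, not taken from api.py.
resp = requests.post(
    "http://127.0.0.1:8017/ask_localai",
    json={"query": "What do my documents say about topic X?"},
    timeout=120,
)
print(resp.status_code)
print(resp.json().get("answer"))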
Is it that this endpoint cannot be accessed directly from the browser and should only be called from within a Python script? I then started the UI with:
streamlit run user_interface.py
But when I asked a question, I got the following error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:
File "E:\Pyvenv\pyvi39\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
result = func()
File "E:\Pyvenv\pyvi39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 579, in code_to_exec
exec(code, module.__dict__)
File "E:\literatureAI\local-gen-search\user_interface.py", line 21, in <module>
answer = json.loads(response.text)["answer"]
File "C:\Users\admin\anaconda3\envs\py39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\admin\anaconda3\envs\py39\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\admin\anaconda3\envs\py39\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
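Looking at the traceback, line 21 of user_interface.py passes response.text straight into json.loads, so any non-JSON reply (for example an error page or an empty body) fails exactly like this. I imagine a more defensive version of that call would look roughly like the following; everything except the "answer" key (which appears in the traceback) is my own assumption:

import requests

question = "test question"  # in the real UI this would come from the Streamlit text input

# Sketch: check the HTTP status before parsing, so the real error is visible instead of a JSONDecodeError.
# URL, port, and the "query" field are assumptions; only the "answer" key comes from the traceback.
response = requests.post(
    "http://127.0.0.1:8017/ask_localai",
    json={"query": question},
    timeout=120,
)
if response.ok:
    answer = response.json().get("answer", "<no 'answer' field in response>")
else:
    answer = f"request failed: HTTP {response.status_code}: {response.text[:200]}"
print(answer)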
I want to upload as many files as possible.