nikolamilosevic86 / local-genAI-search

Local-GenAI-Search is a generative search engine based on Llama 3, langchain and qdrant that answers questions based on your local files
GNU General Public License v3.0
83 stars 30 forks

How many files can be retrieved at a time? #9

Open yanyu2015 opened 3 weeks ago

yanyu2015 commented 3 weeks ago

I want to upload as many files as possible.

pdchristian commented 3 weeks ago

Hello @yanyu2015,

I have used this pipeline to ingest all my personal files (approx. 20,000).

So I think many more can be ingested; the limiting factor would be the vector DB.

yanyu2015 commented 3 weeks ago

Awesome! I encountered some issues during the configuration and was about to give up, but seeing your reply, I decided to give it another try. Thank you so much for your response!

pdchristian commented 3 weeks ago

The cool thing about this approach is that you can recursively ingest whole folders. Most other solutions just focus on a small number of files.
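For illustration, recursively collecting files from a folder tree could be sketched like this (a minimal sketch, not the repo's actual indexer; the file-extension filter is an assumption):

```python
from pathlib import Path

def iter_documents(root, suffixes=(".pdf", ".txt", ".md")):
    """Recursively yield document files under root whose extension matches.

    This is a hypothetical helper; the real ingestion script may use its own
    walk logic and support a different set of file types.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            yield path

# Example: docs = list(iter_documents("C:/my_files"))
```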

yanyu2015 commented 3 weeks ago

I understand how to get the API key, but does it support custom models from Hugging Face? I also found that indexing the PDFs is slow, so will my data be saved after the build completes? For example, I want to index hundreds of different PDFs for different topics, and there seem to be many details I need to ask you about.

Additionally, I am using the command:

python uvicorn_start.py

And I encountered the following issue:

E:\literatureAI\local-gen-search>python uvicorn_start.py
INFO:     Will watch for changes in these directories: ['E:\\literatureAI\\local-gen-search']
WARNING:  "workers" flag is ignored when reloading is enabled.
ERROR:    [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions.
nikolamilosevic86 commented 3 weeks ago

You have an access-permission issue on your sockets. Maybe you need to modify something in your firewall.
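On Windows, `WinError 10013` often means the port is blocked or already reserved. A quick way to check whether a given port can be bound at all (a generic sketch, independent of this project) is:

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Try to bind the port; True means nothing is currently blocking it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# Example: print(port_is_free(8000)) before starting uvicorn on that port
```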

yanyu2015 commented 3 weeks ago

It seems to be a problem with the port; I changed the port to 8017, and the output is as follows:

INFO:     Will watch for changes in these directories: ['E:\\literatureAI\\local-gen-search']
WARNING:  "workers" flag is ignored when reloading is enabled.
INFO:     Uvicorn running on http://127.0.0.1:8017 (Press CTRL+C to quit)
INFO:     Started reloader process [31696] using WatchFiles
E:\Pyvenv\pyvi39\lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
E:\literatureAI\local-gen-search\api.py:29: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 1.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.
  hf = HuggingFaceEmbeddings(
E:\Pyvenv\pyvi39\lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
E:\Pyvenv\pyvi39\lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO:     Started server process [17768]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     127.0.0.1:7475 - "GET /ask_localai HTTP/1.1" 405 Method Not Allowed
INFO:     127.0.0.1:7476 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:7502 - "GET /ask_localai HTTP/1.1" 405 Method Not Allowed
INFO:     127.0.0.1:7550 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:7550 - "GET /search HTTP/1.1" 405 Method Not Allowed

When I enter http://127.0.0.1:8017/ in the address bar, it returns {"message":"Hello World"}, but entering http://127.0.0.1:8017/ask_localai returns {"detail":"Method Not Allowed"}. Is it that this address cannot be accessed directly from the browser and should instead be called from within a Python script?

I then started the UI with:

streamlit run user_interface.py

But when I asked a question, I got the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:
File "E:\Pyvenv\pyvi39\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
File "E:\Pyvenv\pyvi39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
File "E:\literatureAI\local-gen-search\user_interface.py", line 21, in <module>
    answer = json.loads(response.text)["answer"]
File "C:\Users\admin\anaconda3\envs\py39\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
File "C:\Users\admin\anaconda3\envs\py39\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\admin\anaconda3\envs\py39\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
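For what it's worth, the 405 responses in the log suggest `/ask_localai` is defined for POST, so opening it with GET in a browser is rejected. Assuming it accepts a JSON body (the field names `question` and `answer` here are guesses — check api.py for the actual schema), calling it from Python could look like:

```python
import json
import urllib.request

def ask_localai(question, base_url="http://127.0.0.1:8017"):
    """POST a question to the /ask_localai endpoint and return the answer.

    The payload/response field names are assumptions; adapt them to what
    api.py actually expects.
    """
    payload = json.dumps({"question": question}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/ask_localai",
        data=payload,  # supplying a body plus method="POST" avoids the 405 from GET
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["answer"]

# Example: print(ask_localai("What is in my documents?"))
```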

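The `JSONDecodeError: Expecting value: line 1 column 1` means the body handed to `json.loads` in user_interface.py was empty or not JSON at all — typically because the API request failed (e.g. wrong method or wrong port). A defensive parsing sketch (hypothetical helper, not the repo's code) that surfaces the real cause instead of a bare JSONDecodeError:

```python
import json

def parse_answer(status_code, body):
    """Parse an API response defensively before extracting the answer field.

    Sketch of a guard around the json.loads call in user_interface.py; the
    "answer" key is taken from the traceback above.
    """
    if status_code != 200:
        raise RuntimeError(f"API returned HTTP {status_code}: {body[:200]!r}")
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"API response was not JSON: {body[:200]!r}") from exc
    return data["answer"]
```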