nikolamilosevic86 / local-genAI-search

Local-GenAI-Search is a generative search engine based on Llama 3, langchain and qdrant that answers questions based on your local files
GNU General Public License v3.0
83 stars 30 forks

How many files can be retrieved at a time? #9

Open yanyu2015 opened 3 weeks ago

yanyu2015 commented 3 weeks ago

I want to upload as many files as possible.

pdchristian commented 3 weeks ago

Hello @yanyu2015,

I have used this pipeline to ingest all my personal files (approx. 20,000).

So I think many more can be ingested; the limiting factor would be the vector DB.

yanyu2015 commented 3 weeks ago

Awesome! I encountered some issues during the configuration and was about to give up, but seeing your reply, I decided to give it another try. Thank you so much for your response!

pdchristian commented 3 weeks ago

The cool thing about this approach is that you can recursively ingest whole folders. Most other solutions just focus on a small number of files.
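For illustration, recursively collecting files from a folder tree could be sketched like this (a minimal sketch, not the repo's actual indexer; the file-extension filter is an assumption):

```python
from pathlib import Path

def iter_documents(root, suffixes=(".pdf", ".txt", ".md")):
    """Recursively yield document files under root whose extension matches.

    This is a hypothetical helper; the real ingestion script may use its own
    walk logic and support a different set of file types.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            yield path

# Example: docs = list(iter_documents("C:/my_files"))
```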

yanyu2015 commented 3 weeks ago

I understand how to get the API key, but does it support custom models from Hugging Face? I also found that indexing the PDFs is slow, so will my data be saved after the build completes? For example, I want to index hundreds of different PDFs for different topics, and there seem to be many details I need to ask you about.

Additionally, I am using the command:

python uvicorn_start.py

And I encountered the following issue:

E:\literatureAI\local-gen-search>python uvicorn_start.py
INFO:     Will watch for changes in these directories: ['E:\\literatureAI\\local-gen-search']
WARNING:  "workers" flag is ignored when reloading is enabled.
ERROR:    [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions.
nikolamilosevic86 commented 3 weeks ago

You have an access-permission issue on your sockets. Maybe you need to modify something in your firewall.
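On Windows, `WinError 10013` often means the port is blocked or already reserved. A quick way to check whether a given port can be bound at all (a generic sketch, independent of this project) is:

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Try to bind the port; True means nothing is currently blocking it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# Example: print(port_is_free(8000)) before starting uvicorn on that port
```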

yanyu2015 commented 3 weeks ago

It seems to be a problem with the port; I changed the port to 8017, and the output is as follows:

INFO:     Will watch for changes in these directories: ['E:\\literatureAI\\local-gen-search']
WARNING:  "workers" flag is ignored when reloading is enabled.
INFO:     Uvicorn running on http://127.0.0.1:8017 (Press CTRL+C to quit)
INFO:     Started reloader process [31696] using WatchFiles
E:\Pyvenv\pyvi39\lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
E:\literatureAI\local-gen-search\api.py:29: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 1.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.
  hf = HuggingFaceEmbeddings(
E:\Pyvenv\pyvi39\lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
E:\Pyvenv\pyvi39\lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO:     Started server process [17768]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     127.0.0.1:7475 - "GET /ask_localai HTTP/1.1" 405 Method Not Allowed
INFO:     127.0.0.1:7476 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:7502 - "GET /ask_localai HTTP/1.1" 405 Method Not Allowed
INFO:     127.0.0.1:7550 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:7550 - "GET /search HTTP/1.1" 405 Method Not Allowed

When I enter http://127.0.0.1:8017/ in the address bar, it returns {"message":"Hello World"}, but entering http://127.0.0.1:8017/ask_localai returns {"detail":"Method Not Allowed"}. Is it that this address cannot be accessed directly from the browser and should instead be called from within a Python script?

I then started the UI with:

streamlit run user_interface.py

But when I asked a question, I got the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:
File "E:\Pyvenv\pyvi39\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
File "E:\Pyvenv\pyvi39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
File "E:\literatureAI\local-gen-search\user_interface.py", line 21, in <module>
    answer = json.loads(response.text)["answer"]
File "C:\Users\admin\anaconda3\envs\py39\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
File "C:\Users\admin\anaconda3\envs\py39\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\admin\anaconda3\envs\py39\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
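For what it's worth, the 405 responses in the log suggest `/ask_localai` is defined for POST, so opening it with GET in a browser is rejected. Assuming it accepts a JSON body (the field names `question` and `answer` here are guesses — check api.py for the actual schema), calling it from Python could look like:

```python
import json
import urllib.request

def ask_localai(question, base_url="http://127.0.0.1:8017"):
    """POST a question to the /ask_localai endpoint and return the answer.

    The payload/response field names are assumptions; adapt them to what
    api.py actually expects.
    """
    payload = json.dumps({"question": question}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/ask_localai",
        data=payload,  # supplying a body plus method="POST" avoids the 405 from GET
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["answer"]

# Example: print(ask_localai("What is in my documents?"))
```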

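The `JSONDecodeError: Expecting value: line 1 column 1` means the body handed to `json.loads` in user_interface.py was empty or not JSON at all — typically because the API request failed (e.g. wrong method or wrong port). A defensive parsing sketch (hypothetical helper, not the repo's code) that surfaces the real cause instead of a bare JSONDecodeError:

```python
import json

def parse_answer(status_code, body):
    """Parse an API response defensively before extracting the answer field.

    Sketch of a guard around the json.loads call in user_interface.py; the
    "answer" key is taken from the traceback above.
    """
    if status_code != 200:
        raise RuntimeError(f"API returned HTTP {status_code}: {body[:200]!r}")
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"API response was not JSON: {body[:200]!r}") from exc
    return data["answer"]
```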