open-webui / open-webui

User-friendly WebUI for LLMs (Formerly Ollama WebUI)
https://openwebui.com
MIT License
29.41k stars 3.18k forks

enh: client side web crawling for RAG #2654

Open arjunkrishna opened 1 month ago

arjunkrishna commented 1 month ago

The "#" command for websites and YouTube videos should download the content on the client side and then run RAG on it on the server side.

If the server running Open WebUI is blocked from accessing the internet, this functionality does not work. It would really help if the content were downloaded on the client side and then added to RAG by the server-side code.

Discussed in https://github.com/open-webui/open-webui/discussions/1959

Originally posted by **arjunkrishna** May 3, 2024

In the first step of YouTube transcription, where we add the URL after "#", is the transcript fetched from YouTube on the client side, or does the download happen on the server side?

tjbck commented 1 month ago

Great idea! PR welcome!

arjunkrishna commented 1 month ago

Unfortunately I'm not a Python or Svelte developer :) Is anyone willing to take up this change?

cheahjs commented 1 month ago

This is not very feasible outside of very narrow use cases. Because the client is a browser, the target website would need to send CORS headers that allow the fetch, and that is normally only the case for sites exposing APIs.
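To make the CORS constraint concrete, here is a minimal sketch of the check a browser effectively applies before letting page script read a cross-origin response. It is a simplified model for a plain GET without credentials; real browsers also handle preflight requests, credentials, and other headers. The function name `corsAllowsRead` is hypothetical and not part of Open WebUI.

```typescript
// Simplified model of the browser's CORS decision for a simple,
// credential-less GET. Hypothetical helper, for illustration only.
function corsAllowsRead(
  pageOrigin: string,               // origin of the page running the script
  targetUrl: string,                // URL the script tries to fetch
  allowOriginHeader: string | null, // Access-Control-Allow-Origin value, if any
): boolean {
  const targetOrigin = new URL(targetUrl).origin;

  // Same-origin requests are not subject to CORS.
  if (targetOrigin === pageOrigin) return true;

  // Cross-origin with no header: the response is hidden from the script.
  if (allowOriginHeader === null) return false;

  // The header must be the wildcard or match the page origin exactly.
  const value = allowOriginHeader.trim();
  return value === "*" || value === pageOrigin;
}
```

A typical article page sends no `Access-Control-Allow-Origin` header, so `corsAllowsRead("http://localhost:3000", "https://example.com/article", null)` returns `false`, which is the failure mode described above.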

tjbck commented 1 month ago

Correct me if I'm wrong, but I believe this might be doable using a Chrome extension.

arjunkrishna commented 4 weeks ago

How does the existing RAG for documents work? Could this be done with JavaScript on the client side, where it fetches the page's content or the YouTube transcript, saves it into the browser's localStorage if needed, and then uploads it as <webpage/youtubelink url>.txt to the RAG used for documents?
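A rough sketch of the packaging step in that idea, assuming the page text or transcript has already been obtained somehow: derive a `.txt` filename from the URL and bundle it with the content for upload. The names `urlToTxtFilename` and `prepareRagUpload` are hypothetical, and the actual upload call is left out because the document-upload endpoint is not described in this thread.

```typescript
// Hypothetical shape of what the client would hand to the document upload.
interface RagUpload {
  filename: string;
  mimeType: string;
  content: string;
}

// Turn a URL into a filesystem-safe name ending in ".txt".
function urlToTxtFilename(rawUrl: string): string {
  const url = new URL(rawUrl);
  const base = (url.host + url.pathname + url.search)
    .replace(/[^A-Za-z0-9.-]+/g, "_") // collapse unsafe characters
    .replace(/^_+|_+$/g, "");         // trim leading/trailing underscores
  return base + ".txt";
}

// Bundle already-fetched text so it can be uploaded like any other document.
function prepareRagUpload(rawUrl: string, text: string): RagUpload {
  return {
    filename: urlToTxtFilename(rawUrl),
    mimeType: "text/plain",
    content: text,
  };
}
```

For example, `urlToTxtFilename("https://www.youtube.com/watch?v=abc123")` gives `www.youtube.com_watch_v_abc123.txt`. This only covers naming and packaging; fetching the content in the browser still runs into the CORS limitation raised earlier.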

que-nguyen commented 3 weeks ago

> Correct me if I'm wrong but I believe this might be doable using a chrome extension.

Yeah, using a Chrome extension could work, but it might not be the best for mobile browsers.