huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0
7.18k stars 1.04k forks

Integrate jina.ai Reader for search and website content extraction #1348

Open gururise opened 1 month ago

gururise commented 1 month ago

The jina.ai Reader API supports web search and also returns the content of a webpage in an LLM-friendly format: https://jina.ai/reader

With this single tool, we would no longer need Playwright to extract data from websites, or serp.ai for search.
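For illustration, here is a rough sketch of how the two Reader endpoints could be called. It assumes the prefix-style URLs described on the Reader page (`r.jina.ai` for reading a page, `s.jina.ai` for search). The helper names are hypothetical and not part of chat-ui, and an API key or extra headers may be required in practice.

```typescript
// Assumption: Reader is used by prefixing the target URL with r.jina.ai,
// and search by prefixing the query with s.jina.ai (see https://jina.ai/reader).
const READER_PREFIX = "https://r.jina.ai/";
const SEARCH_PREFIX = "https://s.jina.ai/";

// Build the Reader URL for a page we want as LLM-friendly text.
function readerUrl(target: string): string {
  return READER_PREFIX + target;
}

// Build the search URL; the query is percent-encoded so spaces survive.
function searchUrl(query: string): string {
  return SEARCH_PREFIX + encodeURIComponent(query);
}

// Hypothetical helper: fetch a page through Reader and return the text body.
async function fetchPageText(target: string): Promise<string> {
  const res = await fetch(readerUrl(target));
  if (!res.ok) {
    throw new Error(`Reader request failed with status ${res.status}`);
  }
  return res.text();
}
```

Something like `fetchPageText("https://example.com")` would then stand in for the Playwright scraping step, with no browser install needed.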

nsarrazin commented 1 month ago

I think it would make sense to provide an abstraction layer around web extraction specifically. Playwright has been working great, but it requires extra installation steps, which has caused friction for some users.

We could support

This would mirror the way we already support multiple search results providers.
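To make the idea concrete, a minimal sketch of such an abstraction layer might look like the following. Every name here (`PageExtractor`, `registerExtractor`, `getExtractor`, the `"reader"` and `"playwright"` keys) is hypothetical and not taken from the chat-ui codebase, and the Playwright entry is only a stub.

```typescript
// Hypothetical interface: each provider turns a URL into plain text or markdown.
interface PageExtractor {
  name: string;
  extract(url: string): Promise<string>;
}

// Registry keyed by provider name, so a config value can pick the extractor.
const extractors = new Map<string, PageExtractor>();

function registerExtractor(extractor: PageExtractor): void {
  extractors.set(extractor.name, extractor);
}

// Look up the configured provider and fail loudly on an unknown name.
function getExtractor(name: string): PageExtractor {
  const extractor = extractors.get(name);
  if (!extractor) {
    const known = Array.from(extractors.keys()).join(", ");
    throw new Error(`Unknown extractor "${name}". Known: ${known}`);
  }
  return extractor;
}

// Assumed Reader-backed provider: prefixes the URL with r.jina.ai.
registerExtractor({
  name: "reader",
  extract: async (url) => {
    const res = await fetch("https://r.jina.ai/" + url);
    if (!res.ok) throw new Error(`Reader failed with status ${res.status}`);
    return res.text();
  },
});

// Stub standing in for the existing Playwright-based scraper.
registerExtractor({
  name: "playwright",
  extract: async () => {
    throw new Error("Playwright extractor not wired up in this sketch");
  },
});
```

The calling code would only depend on `getExtractor(configuredName).extract(url)`, so users who cannot install Playwright could switch providers through configuration alone.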

I'll try to come back to this later unless someone feels comfortable tackling it. Just let me know in that case :rocket:

krakenftw commented 1 month ago

Can I work on this?