huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0
7.61k stars 1.12k forks

Integrate jina.ai Reader for search and website content extraction #1348

Open gururise opened 4 months ago

gururise commented 4 months ago

The jina.ai Reader API supports web search and also returns the content of a webpage in an LLM-friendly format: https://jina.ai/reader

With this single tool, we would no longer need playwright for extracting data from websites, or serp.ai for search.
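For illustration, here is a minimal sketch of how the two Reader endpoints could be called. It assumes the publicly documented URL-prefix scheme (`https://r.jina.ai/` for reading a page, `https://s.jina.ai/` for search) and an optional bearer API key. The helper names `readerUrl`, `searchUrl`, `fetchText` and the `FetchLike` type are hypothetical and not part of chat-ui.

```typescript
// Assumed endpoints from the public Jina Reader docs; verify before relying on them.
const READER_BASE = "https://r.jina.ai/";
const SEARCH_BASE = "https://s.jina.ai/";

// Minimal shape of a fetch function, so the sketch does not depend on a global.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Reader mode: the target page URL is appended to the reader base URL.
function readerUrl(pageUrl: string): string {
  return READER_BASE + pageUrl;
}

// Search mode: the query is URL-encoded and appended to the search base URL.
function searchUrl(query: string): string {
  return SEARCH_BASE + encodeURIComponent(query);
}

// Fetch the plain-text response, sending the API key only if one is given.
async function fetchText(
  fetcher: FetchLike,
  url: string,
  apiKey?: string
): Promise<string> {
  const headers: Record<string, string> = {};
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  const res = await fetcher(url, { headers });
  if (!res.ok) throw new Error(`Reader request failed with status ${res.status}`);
  return res.text();
}
```

On Node 18+ the built-in `fetch` should fit the `FetchLike` shape, so a call could look like `fetchText(fetch, readerUrl("https://example.com"))`.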

nsarrazin commented 4 months ago

I think it would make sense to provide an abstraction layer around web extraction specifically. Playwright has been working great, but it requires extra installation steps, which has caused friction for some users.

We could support

This would mirror the way we already support multiple search results providers.
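As a rough sketch of what such an abstraction layer might look like: each extraction backend implements one small interface, and a selector picks the configured one. Every name here (`PageExtractor`, `HttpGet`, `jinaReaderExtractor`, `selectExtractor`) is hypothetical and does not reflect chat-ui's actual code; the `https://r.jina.ai/` prefix is assumed from the public Reader docs.

```typescript
// Hypothetical common interface for page-content extraction backends.
interface PageExtractor {
  name: string;
  extract(url: string): Promise<string>;
}

// Injected HTTP helper, so the sketch stays independent of any fetch library.
type HttpGet = (url: string) => Promise<string>;

// Backend that delegates extraction to the Jina Reader URL-prefix endpoint.
function jinaReaderExtractor(get: HttpGet): PageExtractor {
  return {
    name: "jina",
    extract: (url) => get("https://r.jina.ai/" + url),
  };
}

// Pick the preferred backend by name, falling back to the first one configured.
function selectExtractor(
  extractors: PageExtractor[],
  preferred?: string
): PageExtractor {
  if (extractors.length === 0) {
    throw new Error("No page extractor configured");
  }
  const match = extractors.find((e) => e.name === preferred);
  return match !== undefined ? match : extractors[0];
}
```

A Playwright-backed extractor could implement the same interface, so users who cannot install Playwright would only need to change which backend is selected.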

Will try to come back to this later unless someone feels comfortable tackling it, just let me know in that case :rocket:

krakenftw commented 3 months ago

Can I work on this?