irthomasthomas / undecidability

Reader API #865

Open ShellLM opened 1 month ago

ShellLM commented 1 month ago

Reader API

Get LLM-friendly input from a URL or a web search, by simply adding r.jina.ai in front.

Basic Usage

Read a URL

Prepend https://r.jina.ai/ to any URL in your code or tool where LLM access is expected. The API returns the main content of the page as clean, LLM-friendly text.
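As a rough illustration of the prefixing step, the sketch below builds the Reader URL for a target page and fetches it. The function names `readerUrl` and `readPage` are hypothetical helpers introduced here, not part of the API.

```javascript
const READER_BASE = "https://r.jina.ai/";

// Prefix the Reader endpoint to a target URL.
function readerUrl(targetUrl) {
  // new URL() throws a TypeError on malformed input, so bad URLs fail early.
  // Note that .href is the normalized form (e.g. a trailing slash is added).
  const parsed = new URL(targetUrl);
  return READER_BASE + parsed.href;
}

// Fetch the page as LLM-friendly text (performs a network request).
async function readPage(targetUrl) {
  const response = await fetch(readerUrl(targetUrl));
  if (!response.ok) {
    throw new Error(`Reader request failed with status ${response.status}`);
  }
  return response.text();
}
```

Calling `readPage("https://example.com")` would request the same address used in the Request Examples further down.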

Search a query

Prepend https://s.jina.ai/ to your query. This calls the search engine and returns the top five results with their URLs and contents, each in clean, LLM-friendly text.
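A search query usually contains spaces and punctuation, so it likely needs URL-encoding before it is placed after the endpoint. The sketch below assumes percent-encoding via `encodeURIComponent` is acceptable to the service; `searchUrl` is a hypothetical helper name.

```javascript
const SEARCH_BASE = "https://s.jina.ai/";

// Build a Search-mode URL from a free-text query.
function searchUrl(query) {
  const trimmed = query.trim();
  if (trimmed === "") {
    throw new Error("Search query must not be empty");
  }
  // Percent-encode spaces, "?", "&" and other reserved characters.
  return SEARCH_BASE + encodeURIComponent(trimmed);
}
```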

Advanced Usage

The behavior of the Reader API can be controlled with request headers. Here is a complete list of supported headers.

Read or Search Mode

Read mode is for accessing the content of a URL, while Search mode allows you to search a query on the web, applying Read mode to each search result URL.

Content Format

You can control the level of detail in the response to prevent over-filtering. The default pipeline is optimized for most websites and LLM input.

Add API Key for Higher Rate Limit

Supply your Jina API key to access a higher rate limit. For the latest rate limit information, refer to the rate limit table in the Reader documentation.
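The Request Examples further down pass the key as a Bearer token in the `Authorization` header. A minimal sketch of building that header without hard-coding the key might look like this; `authHeaders` and the `JINA_API_KEY` environment variable name are assumptions made for illustration.

```javascript
// Build the Authorization header; an empty object means an anonymous request.
function authHeaders(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    return {};
  }
  return { Authorization: `Bearer ${apiKey.trim()}` };
}

// Usage in Node.js (JINA_API_KEY is a hypothetical variable name):
//   fetch("https://r.jina.ai/https://example.com", {
//     headers: authHeaders(process.env.JINA_API_KEY),
//   });
```

Keeping the key in the environment avoids committing it to source control alongside the request code.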

Custom Timeout

Set a custom timeout when the page is slow to render. For the search endpoint, this is the maximum time to wait for reading all search results.

Target Selector

Provide a CSS selector to focus on a more specific part of the page. Useful when your desired content doesn't show under the default settings.

Wait For Selector

Wait for a specific element to appear before returning. Useful when your desired content doesn't show under the default settings.
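The timeout and selector options can be sketched as request headers. The header names `X-Timeout`, `X-Target-Selector`, and `X-Wait-For-Selector` are not given in this text; they are assumed from Reader's public documentation, and the timeout is assumed to be in seconds. `renderHeaders` is a hypothetical helper.

```javascript
// Build headers that control rendering: timeout and CSS selectors.
// Header names are assumptions, not confirmed by the text above.
function renderHeaders({ timeoutSeconds, targetSelector, waitForSelector } = {}) {
  const headers = {};
  if (timeoutSeconds !== undefined) {
    if (!Number.isFinite(timeoutSeconds) || timeoutSeconds <= 0) {
      throw new RangeError("timeoutSeconds must be a positive number");
    }
    headers["X-Timeout"] = String(timeoutSeconds);
  }
  if (targetSelector) {
    headers["X-Target-Selector"] = targetSelector; // e.g. "article.main"
  }
  if (waitForSelector) {
    headers["X-Wait-For-Selector"] = waitForSelector; // e.g. "#comments"
  }
  return headers;
}
```

The returned object can be passed as the `headers` option of `fetch`, merged with any other headers the request needs.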

Gather All Links At the End

A "Buttons & Links" section will be created at the end. This helps downstream LLMs or web agents navigate the page or take further actions.

Gather All Images At the End

An "Images" section will be created at the end. This gives the downstream LLMs an overview of all visuals on the page, which may improve reasoning.

Use POST Method

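Use the POST method instead of GET, with the URL passed in the request body. Useful for SPAs with hash-based routing: the part of a URL after # is not sent to the server in a GET request, so the route would otherwise be lost.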
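A minimal sketch of the POST variant follows. The text only says the URL is passed in the body; the JSON shape `{ "url": ... }` and the `Content-Type` header are assumptions, and `postRequest` is a hypothetical helper.

```javascript
// Build the endpoint and fetch() options for a POST-based read.
// The body format is an assumption; the text does not specify it.
function postRequest(targetUrl) {
  return {
    endpoint: "https://r.jina.ai/",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // The full URL, including the "#/route" fragment, travels in the body.
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}

// Usage (performs a network request):
//   const { endpoint, init } = postRequest("https://example.com/#/route");
//   const text = await (await fetch(endpoint, init)).text();
```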

JSON Response

The response will be in JSON format, containing the URL, title, content, and timestamp (if available). In Search mode, it returns a list of five entries, each following the described JSON structure.
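To illustrate handling both shapes (one entry in Read mode, a list in Search mode), the sketch below normalizes a parsed JSON response into an array. It assumes JSON is requested with `Accept: application/json`, that entries use the field names `url`, `title`, and `content`, and that the payload may be wrapped in a `data` envelope; none of these details are confirmed by the text above. The timestamp field name is not specified here, so it is left out.

```javascript
// Normalize a parsed Reader JSON payload into an array of entries.
// Field names and the optional "data" envelope are assumptions.
function extractEntries(payload) {
  const body = payload.data ?? payload;
  const entries = Array.isArray(body) ? body : [body];
  return entries.map((entry) => ({
    url: entry.url,
    title: entry.title,
    content: entry.content,
  }));
}

// Usage (performs a network request):
//   const res = await fetch("https://r.jina.ai/https://example.com", {
//     headers: { Accept: "application/json" },
//   });
//   const entries = extractEntries(await res.json());
```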

Forward Cookie

Our API server can forward your custom cookie settings when accessing the URL, which is useful for pages requiring extra authentication. Note that requests with cookies will not be cached.

Image Caption

Captions all images at the specified URL, adding 'Image [idx]: [caption]' as alt text for those without any. This allows downstream LLMs to use the images in tasks such as reasoning and summarizing.

Use a Proxy Server

Our API server can utilize your proxy to access URLs, which is helpful for pages accessible only through specific proxies.

Bypass the Cache

Our API server caches both Read and Search mode contents for a certain amount of time. To bypass this cache, set this header to true.
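Several of the options in this section are on/off switches. The sketch below maps them to headers set to `true`, as the cache option describes. The header names are not given in this text and are assumed from Reader's public documentation; `flagHeaders` and the option keys are hypothetical.

```javascript
// Map on/off options to header names (names are assumptions).
const FLAG_HEADERS = {
  linksSummary: "X-With-Links-Summary",   // "Buttons & Links" section
  imagesSummary: "X-With-Images-Summary", // "Images" section
  imageCaptions: "X-With-Generated-Alt",  // caption images lacking alt text
  bypassCache: "X-No-Cache",              // skip cached results
};

// Emit a header only for options explicitly set to true.
function flagHeaders(options = {}) {
  const headers = {};
  for (const [option, headerName] of Object.entries(FLAG_HEADERS)) {
    if (options[option] === true) {
      headers[headerName] = "true";
    }
  }
  return headers;
}
```

For example, `flagHeaders({ bypassCache: true })` yields a single header, leaving every other option at the service default.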

Stream Mode

Stream mode is beneficial for large target pages, allowing more time for the page to fully render. If standard mode results in incomplete content, consider using Stream mode.
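A sketch of consuming Stream mode follows, assuming it is requested with `Accept: text/event-stream` and delivers server-sent events whose later events carry more complete content than earlier ones. Both points are assumptions rather than statements from the text above. For simplicity the helper reads the whole body before parsing instead of processing events incrementally; `parseSseEvents` and `readStreamed` are hypothetical names.

```javascript
// Split raw server-sent-event text into the "data:" payload of each event.
function parseSseEvents(raw) {
  return raw
    .split(/\r?\n\r?\n/) // events are separated by a blank line
    .map((block) =>
      block
        .split(/\r?\n/)
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one space
        .join("\n")
    )
    .filter((data) => data !== "");
}

// Return the last event's payload, assumed to be the most complete one.
async function readStreamed(targetUrl) {
  const response = await fetch("https://r.jina.ai/" + targetUrl, {
    headers: { Accept: "text/event-stream" },
  });
  const events = parseSseEvents(await response.text());
  return events.length > 0 ? events[events.length - 1] : null;
}
```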

Request Examples

Bash

curl 'https://r.jina.ai/https://example.com' \
    -H "Authorization: Bearer jina_138dd69b77644a9c9c4043efdde5f37attcHYkCNdn-b17rZ-uCOzU32ZS8C"

JavaScript

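fetch('https://r.jina.ai/https://example.com', {
  method: 'GET',
  headers: {
    "Authorization": "Bearer jina_138dd69b77644a9c9c4043efdde5f37attcHYkCNdn-b17rZ-uCOzU32ZS8C"
  },
})
  .then((response) => response.text()) // read the body as text, as in the Response shown below
  .then((text) => console.log(text))
  .catch((error) => console.error(error));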

Response

Title: Example Domain

URL Source: https://example.com/

Markdown Content: This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

More information...

Suggested labels

None

ShellLM commented 1 month ago

Related content

774 similarity score: 0.88

762 similarity score: 0.88

386 similarity score: 0.87

678 similarity score: 0.87

778 similarity score: 0.87

396 similarity score: 0.87