jina-ai / reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
https://jina.ai/reader
Apache License 2.0
7.05k stars 555 forks

Got "Robot Challenge Screen" #153

Closed: matthew-aic closed this issue 2 weeks ago

matthew-aic commented 2 weeks ago

Tried the following URL:

https://www.thethaitalay.com/TheThaitalayMenu.pdf

The result was:

Title: Robot Challenge Screen

URL Source: https://www.thethaitalay.com/TheThaitalayMenu.pdf

Markdown Content:
thethaitalay.com
----------------

Checking the site connection security

![Image 1: CDN icon](https://d1rozh26tys225.cloudfront.net/loader.svg)

However, if I try to access the same URL from Chrome, the document (a menu of a Thai restaurant) appears as expected.

Thanks for your help with this!
Matthew Clegg
matthew@aideacatalyst.com

nomagick commented 2 weeks ago

https://r.jina.ai/https://www.thethaitalay.com/TheThaitalayMenu.pdf

Seems OK now.
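One way to check whether a result is the real document or the bot-challenge page is to look at the `Title:` line of the Reader output. Below is a minimal Python sketch that assumes the output layout shown in this issue (`Title:`, `URL Source:`, `Markdown Content:`). The helper names and the challenge markers are hypothetical. The markers are copied only from the blocked output reported above, and other bot walls will likely use different wording.

```python
import urllib.request
from typing import Optional

# Reader is used by prefixing the target URL with this endpoint.
READER_PREFIX = "https://r.jina.ai/"

# Hypothetical markers, copied from the blocked output in this issue.
CHALLENGE_TITLES = {"Robot Challenge Screen"}
CHALLENGE_PHRASES = ("Checking the site connection security",)


def reader_url(target: str) -> str:
    """Build the Reader URL for a target page or document."""
    return READER_PREFIX + target


def parse_title(reader_text: str) -> Optional[str]:
    """Return the value of the 'Title:' header line, or None if absent."""
    for line in reader_text.splitlines():
        if line.startswith("Markdown Content:"):
            break  # stop before the page body so body text is not misread
        if line.startswith("Title:"):
            return line[len("Title:"):].strip()
    return None


def looks_like_bot_challenge(reader_text: str) -> bool:
    """Heuristic: True if the output matches the challenge page seen here."""
    if parse_title(reader_text) in CHALLENGE_TITLES:
        return True
    return any(phrase in reader_text for phrase in CHALLENGE_PHRASES)


def fetch_via_reader(target: str, timeout: float = 60.0) -> str:
    """Fetch a target through Reader and return the response body as text."""
    with urllib.request.urlopen(reader_url(target), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `looks_like_bot_challenge(fetch_via_reader("https://www.thethaitalay.com/TheThaitalayMenu.pdf"))` returns `True` when the response is the "Robot Challenge Screen" page from the original report.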

To mitigate similar issues:

Include the `x-no-cache: true` header and retry several times, or try again after a while.

If it never succeeds, the website is probably deliberately blocking bot access, and Reader cannot get around that.
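The retry advice above can be sketched in Python as follows. This is a minimal illustration and is not part of Reader. The `x-no-cache: true` header comes from the reply above. The number of attempts, the exponential backoff delays, the challenge-page markers, and the helper names (`http_get`, `read_with_retries`) are hypothetical choices. The `fetch` and `sleep` parameters can be swapped out, so the loop can be tested without network access.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

READER_PREFIX = "https://r.jina.ai/"
# Hypothetical markers, copied from the blocked output reported in this issue.
CHALLENGE_MARKERS = ("Title: Robot Challenge Screen",
                     "Checking the site connection security")


def http_get(url: str, headers: Dict[str, str]) -> str:
    """Plain GET with the given headers; returns the body as text."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def read_with_retries(
    target: str,
    attempts: int = 3,
    base_delay: float = 5.0,
    fetch: Callable[[str, Dict[str, str]], str] = http_get,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Fetch `target` through Reader, retrying while the bot challenge appears.

    Returns the Reader output, or None if every attempt was blocked or failed.
    """
    url = READER_PREFIX + target
    # Header recommended in the thread: ask Reader not to return a cached result.
    headers = {"x-no-cache": "true"}
    for attempt in range(attempts):
        try:
            text = fetch(url, headers)
        except urllib.error.URLError:
            text = None  # network or HTTP error: treat like a blocked attempt
        if text is not None and not any(m in text for m in CHALLENGE_MARKERS):
            return text
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

If `read_with_retries(...)` still returns `None` after the attempts are spaced out, that matches the last case in the reply. The site is probably blocking bots deliberately, and retrying more aggressively will not help.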