jina-ai / reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
https://jina.ai/reader
Apache License 2.0

Jina Reader responding with HTML content instead of text #147

Closed. bitlazy closed this issue 1 month ago.

bitlazy commented 1 month ago

Hi team!

I just tried the Jina Reader demo online and it works really nicely! Kudos for the awesome API.

One issue I noticed: when I copy the Python code from the demo and run it locally, I only get the whole HTML DOM instead of the expected text output. Below are the headers I am passing in my request. Any help is highly appreciated!

        downloadUrl = 'https://r.jina.ai/' + url
        headers = {
          'X-Target-Selector': target_key,
          'Authorization': 'Bearer {api_key}',
          'X-Timeout': '10',
          'X-With-Images-Summary': 'true',
          'X-With-Links-Summary': 'true'
        }

        response = requests.get(url, headers=headers)

Screenshot of the output on my local machine:

[Screenshot 2024-10-17 at 2:50:07 PM]

nomagick commented 1 month ago

Hi. The issue in your snippet is straightforward. The Reader URL for extracting from url is downloadUrl, but the request is later sent to url (the original website) instead of to Reader (downloadUrl). That is why you get the site's raw HTML back.
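
A minimal sketch of the corrected request, assuming the `requests` library and the same headers as the snippet in the question. The function names `build_reader_request` and `fetch_via_reader` are hypothetical, chosen for illustration. Besides sending the request to the Reader URL, it also makes the `Authorization` value an f-string: in the original, `'Bearer {api_key}'` is a plain string, so the literal text `{api_key}` would be sent instead of the key.

```python
from __future__ import annotations

READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(url: str, api_key: str, target_selector: str) -> tuple[str, dict[str, str]]:
    """Return the Reader URL and headers for extracting `url` (illustrative sketch)."""
    # Prefix the target page with the Reader endpoint; THIS is the URL to request.
    download_url = READER_PREFIX + url
    headers = {
        "X-Target-Selector": target_selector,
        # f-string so the real key is interpolated, not the literal "{api_key}".
        "Authorization": f"Bearer {api_key}",
        "X-Timeout": "10",
        "X-With-Images-Summary": "true",
        "X-With-Links-Summary": "true",
    }
    return download_url, headers


def fetch_via_reader(url: str, api_key: str, target_selector: str, timeout: float = 30.0) -> str:
    """Fetch `url` through Reader and return the response body (hypothetical helper)."""
    import requests  # third-party; imported here so building the request needs only the stdlib

    download_url, headers = build_reader_request(url, api_key, target_selector)
    # Send the request to Reader (download_url), NOT to the original `url`.
    response = requests.get(download_url, headers=headers, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of silently parsing an error page
    return response.text
```

Usage would look like `fetch_via_reader("https://example.com", "YOUR_API_KEY", "main")`, where the key and selector are placeholders. Keeping the URL construction in one function means the URL that is built is the same one that is requested, which is exactly the mismatch that caused the raw HTML here.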

This may have happened because our code snippets are generated by LLMs, which sometimes make mistakes. You can try regenerating the snippet with the button next to the language selector; there is a good chance that fixes the issue.

bitlazy commented 1 month ago

Ahh, my bad! I didn't notice my rookie mistake. Thanks for checking, @nomagick!