oscar-o-oneill opened this issue 3 months ago
Okay, this is weird: I get the same empty result. However, if I use pageshot mode, it does return the full webpage.
Could you look at it, @nomagick?
Thanks, @hanxiao. Just wanted to bring this to your attention! I will keep following the thread and help out if I can.
Hi @oscar-o-oneill, did you have the same issue on other pages?
It seems there is some trick in this specific webpage that makes the browser treat the page as not fully loaded until it hits the timeout, which is 30 s by default in this case. I'm still trying to identify what in the page causes this.
It would help if you have more failing cases, so that I can find a common pattern.
Hi @mapleeit, no, I have not found this issue on many other pages. Reader usually works really well!
I will definitely report any issues I may find with other web pages in the future.
Thank you for making Jina AI Reader.
It looks like some kind of bot-prevention mechanism from "edgesuite". It seems to replace the DOM contents for a fraction of a second, which makes Reader capture its warning message instead of the page content.
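A rough way to picture the two explanations in this thread (a page that never signals it has finished loading before the 30 s timeout, and a bot-check that swaps out the DOM) is a capture step that races a load signal against a timeout and then snapshots whatever DOM exists when the timer fires. The sketch below is a hypothetical simulation under those assumptions: `FakePage`, `capture`, and the warning text are made up for illustration and are not Reader's actual code or API.

```python
import asyncio
from typing import Optional, Tuple

DEFAULT_TIMEOUT_S = 30.0  # the 30 s default mentioned in the thread


class FakePage:
    """Hypothetical stand-in for a headless-browser tab (not Reader's real API)."""

    def __init__(self, html: str, load_delay_s: Optional[float]) -> None:
        self.html = html                   # current DOM snapshot
        self._load_delay_s = load_delay_s  # None means "load" never fires

    async def wait_for_load(self) -> None:
        if self._load_delay_s is None:
            # A page that keeps the browser busy never signals completion.
            await asyncio.Event().wait()
        else:
            await asyncio.sleep(self._load_delay_s)


async def capture(page: FakePage, timeout_s: float = DEFAULT_TIMEOUT_S) -> Tuple[str, bool]:
    """Wait for the page to load, but snapshot the DOM anyway on timeout.

    Returns (html, loaded). If a bot-check script has swapped the DOM for a
    warning, the timeout path returns that warning rather than the article.
    """
    try:
        await asyncio.wait_for(page.wait_for_load(), timeout=timeout_s)
        return page.html, True
    except asyncio.TimeoutError:
        return page.html, False


if __name__ == "__main__":
    # Hypothetical blocked page: the DOM already holds a bot-check warning.
    blocked = FakePage("<p>Please verify you are human</p>", load_delay_s=None)
    html, loaded = asyncio.run(capture(blocked, timeout_s=0.1))
    print(loaded, html)  # False <p>Please verify you are human</p>
```

The timeout is shortened to 0.1 s in the usage example only so it finishes quickly; the point is that the "not loaded" path still returns content, just not the content the user wanted.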
Hi, I love Reader! It's so useful. I've been playing around with it, and I noticed it can't extract any content from this URL:
https://www.canada.ca/en/women-gender-equality/gender-based-violence/gender-based-violence-glossary.html
When I open the Reader page for it, I just get this response:
What's going on? It's a fairly simple page.