Closed by phunterlau 2 months ago.
The same happens with free IEEE PDFs such as https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9275593. Thanks.
Hi @phunterlau
The URL from ResearchGate works for me. What result do you get on your side?
For https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9275593, you can use POST mode. Screenshot shown below.
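Assuming "POST mode" means sending the target URL in a JSON body to https://r.jina.ai, as in the referer example in this thread, a minimal sketch for the IEEE link could look like this. One plausible benefit is that the target's own query string (?arnumber=9275593) no longer has to travel inside the Reader URL path. The RUN variable is a hypothetical dry-run switch added for illustration only. By default the script only prints the JSON body; set RUN=1 to actually send the request.

```shell
#!/bin/sh
# Sketch: fetch the IEEE PDF through Reader in POST mode, i.e. with the target
# URL in the JSON body instead of appended to the r.jina.ai path.
# RUN is a hypothetical dry-run switch: RUN=1 sends the request, otherwise the
# body is only printed.
target='https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9275593'
body="{\"url\": \"$target\"}"

if [ "${RUN:-0}" = 1 ]; then
  # Needs network access.
  curl --location 'https://r.jina.ai' \
    --header 'Content-Type: application/json' \
    --data "$body"
else
  # Dry run: show the JSON body that would be posted.
  printf '%s\n' "$body"
fi
```

Usage: `RUN=1 sh fetch_ieee.sh` (the file name is only an example).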
One suggestion: for this webpage https://www.researchgate.net/publication/382994225_Model-Based_Reinforcement_Learning_Approaches_in_the_Low-Data-Regime
if you only want the PDF content, you can enable the Target Selector option and set its value to #pdf-html-reader (this value differs between websites and only applies to this one). This skips the ads and other irrelevant content.
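The same suggestion can be expressed as a single GET request. This is a sketch under an assumption not stated in the thread: that the Target Selector option corresponds to an X-Target-Selector header, named by analogy with the X-Referer and X-No-Cache headers used elsewhere in this thread. As the maintainer notes, ResearchGate's anti-crawler checks may still block the page regardless of the selector. RUN is a hypothetical dry-run switch.

```shell
#!/bin/sh
# Sketch: keep only the element matched by a CSS selector on the ResearchGate page.
# ASSUMPTION: the "Target Selector" option maps to an X-Target-Selector header,
# named by analogy with X-Referer / X-No-Cache; verify against the Reader docs.
page='https://www.researchgate.net/publication/382994225_Model-Based_Reinforcement_Learning_Approaches_in_the_Low-Data-Regime'
selector='#pdf-html-reader'   # site-specific value, only meaningful for this page
reader_url="https://r.jina.ai/$page"

if [ "${RUN:-0}" = 1 ]; then
  # Send the request (needs network access).
  curl --location "$reader_url" --header "X-Target-Selector: $selector"
else
  # Dry run: show the Reader URL and the selector header that would be sent.
  printf '%s\n%s\n' "$reader_url" "X-Target-Selector: $selector"
fi
```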
Big thanks. The ResearchGate link returns either empty content or a "verify you are a human" page.
If the website has an anti-crawler mechanism, then unfortunately there is nothing we can do. :(
Thanks. Is there a parameter for passing a referer to Reader? Some URLs check the referring source, like the following one: if we remove -e, the site returns an error page.
curl -L -e "https://scholar.google.com/" \
"https://www.academia.edu/download/51627580/A_Generalized_Reinforcement-Learning_Mod20170203-31871-44ae37.pdf?hl=en&sa=T&oi=ggp&ct=res&cd=13&d=7556059458401941712&ei=IcXCZpzRGPCz6rQP5oSLmQM" \
-H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" \
-o "A_Generalized_Reinforcement-Learning_Model.pdf"
It's supported now. You can either pass referer in the POST request body:
curl --location 'https://r.jina.ai' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://www.academia.edu/download/51627580/A_Generalized_Reinforcement-Learning_Mod20170203-31871-44ae37.pdf?hl=en&sa=T&oi=ggp&ct=res&cd=13&d=7556059458401941712&ei=IcXCZpzRGPCz6rQP5oSLmQM",
"referer": "https://scholar.google.com/",
"noCache": true
}'
or pass it as a header on a GET request:
curl --location 'https://r.jina.ai/https://www.academia.edu/download/51627580/A_Generalized_Reinforcement-Learning_Mod20170203-31871-44ae37.pdf?hl=en&sa=T&oi=ggp&ct=res&cd=13&d=7556059458401941712&ei=IcXCZpzRGPCz6rQP5oSLmQM' \
--header 'X-Referer: https://scholar.google.com' \
--header 'X-No-Cache: true'
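For scripting the POST variant with arbitrary URLs, the body can be built with a small escaping step so that any double quotes or backslashes in a URL do not break the JSON. This is a minimal sketch: json_escape and reader_body are hypothetical helper names, and only the referer and noCache fields shown in the reply above are used. It does not escape control characters, which are not expected in URLs. RUN is a hypothetical dry-run switch.

```shell
#!/bin/sh
# Hypothetical helper: escape backslashes first, then double quotes, for use
# inside a JSON string. Control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the Reader POST body.
# $1 = target URL, $2 = referer. The URL is passed as a printf argument, not
# as the format string, so any '%' in it is left untouched.
reader_body() {
  printf '{"url": "%s", "referer": "%s", "noCache": true}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

target='https://www.academia.edu/download/51627580/A_Generalized_Reinforcement-Learning_Mod20170203-31871-44ae37.pdf?hl=en&sa=T&oi=ggp&ct=res&cd=13&d=7556059458401941712&ei=IcXCZpzRGPCz6rQP5oSLmQM'
body=$(reader_body "$target" 'https://scholar.google.com/')

if [ "${RUN:-0}" = 1 ]; then
  # Needs network access.
  curl --location 'https://r.jina.ai' \
    --header 'Content-Type: application/json' \
    --data "$body"
else
  # Dry run: print the JSON body.
  printf '%s\n' "$body"
fi
```

The POST form keeps the long query string of the target URL inside the JSON body, while the GET form shown above appends it to the r.jina.ai path.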
It would be very helpful if ResearchGate PDFs could be accessed through the Reader API. Thank you. Example link:
ResearchGate: https://www.researchgate.net/profile/Luca-Mertens-2/publication/382994225_Model-Based_Reinforcement_Learning_Approaches_in_the_Low-Data-Regime/links/66b6284e51aa0775f2779ac0/Model-Based-Reinforcement-Learning-Approaches-in-the-Low-Data-Regime.pdf