petermr / pygetpapers

a Python version of getpapers
Apache License 2.0

PDF cannot download but error message is saved as *.pdf #19

Closed petermr closed 3 years ago

petermr commented 3 years ago

Describe the bug

The command to download XML and PDF works for the XML, but the PDF request gets a server error, and pygetpapers writes the following error page

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/backend/ptpmcrender.fcgi">GET&nbsp;/backend/ptpmcrender.fcgi</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
<hr>
<address>Apache/2.2.15 (Red Hat) Server at europepmc.org Port 80</address>
</body></html>

in the PDF file
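One way to check whether an existing output directory is affected is to look for `.pdf` files that do not begin with the `%PDF-` signature. This is a minimal sketch, assuming only that the downloads sit somewhere under one output directory; `find_non_pdf_files` is a hypothetical helper and not part of pygetpapers.

```python
from pathlib import Path

PDF_SIGNATURE = b"%PDF-"  # a PDF file starts with these bytes


def find_non_pdf_files(root):
    """Return the *.pdf paths under root whose content does not start with %PDF-."""
    suspicious = []
    for path in sorted(Path(root).rglob("*.pdf")):
        with path.open("rb") as handle:
            # Read only the first few bytes; an HTML error page starts with "<!DOCTYPE".
            if handle.read(len(PDF_SIGNATURE)) != PDF_SIGNATURE:
                suspicious.append(path)
    return suspicious
```

Calling `find_non_pdf_files("parkinsons")` after a run would list any files that carry a `.pdf` name but hold something else, such as the proxy error page.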

To Reproduce commandline:

pygetpapers -q '("gwas" OR "genome wide association" OR "risk allele" OR "risk loci") AND ("parkinsons")' -o parkinsons -p -x

This depends on generating an error (see #20) that shows the effect.

Expected behavior The user should get an error message (e.g. on the console or stderr); the error page should not be saved as a PDF.
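A minimal sketch of that behavior, not pygetpapers' actual code: check the HTTP status and the first bytes of the body before writing anything, and report failures on stderr. The names `is_pdf_response` and `save_pdf_or_report` are hypothetical, and the status and body are assumed to come from whatever HTTP client performs the download.

```python
import sys
from pathlib import Path

PDF_SIGNATURE = b"%PDF-"  # a PDF file starts with these bytes


def is_pdf_response(status, body):
    """True only for a 2xx response whose body starts with the PDF signature."""
    return 200 <= status < 300 and body.startswith(PDF_SIGNATURE)


def save_pdf_or_report(status, body, destination, url="<unknown url>"):
    """Write body to destination if it is a PDF; otherwise warn on stderr.

    Returns True if the file was written, False if the download was rejected.
    """
    if not is_pdf_response(status, body):
        print(
            f"PDF download failed for {url}: HTTP {status}; "
            f"nothing written to {destination}",
            file=sys.stderr,
        )
        return False
    Path(destination).write_bytes(body)
    return True
```

Checking the signature as well as the status covers the case where a server returns an HTML page with a 200 status.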

Additional context See also #20 which reports an error in the PDF instead of to the user

ayush4921 commented 3 years ago

It is important to discuss this. Even when you open the URL in a browser, it gives an error. I think it's from EUPMC's side. For example: https://europepmc.org/articles/PMC7358442?pdf=render

petermr commented 3 years ago

Hmmm. Can you reproduce it with curl? I'll try your URL
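The same check can also be made from Python rather than curl. This is a sketch using only the standard library; `probe` is a hypothetical helper, and its `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=30):
    """Fetch url and return (status, content_type, first_bytes).

    HTTP error responses (such as 502) are returned, not raised, so the
    caller can inspect what the server actually sent.
    """
    try:
        with opener(url, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type")
            return response.status, content_type, response.read(64)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the response.
        return err.code, err.headers.get("Content-Type"), err.read(64)
```

For a working PDF link, `probe("https://europepmc.org/articles/PMC7358442?pdf=render")` should return a 2xx status and bytes starting with `%PDF-`; a 502 status with an HTML body would point to the server side.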

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr commented 3 years ago

It fails for me as well:

Is this their documented API? (We need to make sure we are doing it properly.) If so, I will post it.

Proxy Error

The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /backend/ptpmcrender.fcgi https://europepmc.org/backend/ptpmcrender.fcgi.

Reason: Error reading from remote server


Apache/2.2.15 (Red Hat) Server at europepmc.org Port 80
