wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

HTTP Error 416: Wrong Request #62

Closed Daniel-Mietchen closed 11 years ago

Daniel-Mietchen commented 11 years ago

Skipping 13 records …
Checking MIME types …
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514334/bin/1744-8603-8-11-S1.doc, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3506441/bin/1744-9081-8-50-S1.jpeg, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3506441/bin/1744-9081-8-50-S2.jpeg, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517534/bin/1745-6215-13-191-S1.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3508803/bin/1745-6673-7-19-S1.docx, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3508803/bin/1745-6673-7-19-S2.docx, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3508803/bin/1745-6673-7-19-S3.xlsx, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511297/bin/1746-6148-8-179-S1.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511297/bin/1746-6148-8-179-S2.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3508898/bin/1756-0500-5-608-S1.pdf, the following error occured: “HTTP Error 416: Wrong Request”.

…and so on. It looks to me as if something changed on the PMC end.

erlehmann commented 11 years ago

Please also give the command that created the error.

erlehmann commented 11 years ago
echo '3514334' | ./oami_pmc_pmcid_import

gives

Input PMCIDs, delimited by whitespace: Removing “/home/erlehmann/.cache/open-access-media-importer/metadata/raw/pmc_pmcid/efetch.fcgi0” … done.
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=3514334”, saving into directory “/home/erlehmann/.cache/open-access-media-importer/metadata/raw/pmc_pmcid” …
100% |########################################################################|
Globalization and Health 2012
    What are the barriers to scaling up health interventions in low and middle income countries? A qualitative study of academic leaders in implementation science
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
“What are the barriers to scaling up health interventions in low and middle income countries? A qualitative study of academic leaders in implementation science”:
    1 × application/msword
Checking MIME types …
When trying to download <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514334/bin/1744-8603-8-11-S1.doc>, the following error occured: “HTTP Error 416: Wrong Request”.

erlehmann commented 11 years ago

The document does exist; a HEAD request for it returns 200 OK:

curl -I http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514334/bin/1744-8603-8-11-S1.doc
HTTP/1.1 200 OK
Date: Tue, 11 Dec 2012 20:03:23 GMT
Server: Apache
Cache-Control: private, max-age=86400
Content-Disposition: attachment; filename=1744-8603-8-11-S1.doc
Etag: 3514334-69915703-20121204223901
NCBI-SID: 015F485A0C7918B1_0284SID
X-Backend-Host: ipmc2
Content-Length: 23552
Content-Type: application/msword
Set-Cookie: ncbi_sid=015F485A0C7918B1_0284SID; domain=.nih.gov; path=/; expires=Wed, 11 Dec 2013 20:03:23 GMT
Vary: Accept-Encoding
Connection: close
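
The same check can be scripted. The following is a minimal sketch in Python 3 using only the standard library (the importer itself ran under Python 2.7, judging by the paths in the warning above); `head` is a hypothetical helper, not part of the importer. Like `curl -I`, it sends a HEAD request and returns the status code and the response headers.

```python
import urllib.request


def head(url, timeout=10):
    """Send an HTTP HEAD request, like `curl -I URL` (hypothetical helper).

    Returns (status, headers); headers is an http.client.HTTPMessage,
    whose .get() matches header names case-insensitively.
    """
    request = urllib.request.Request(url, method="HEAD")
    # urlopen follows redirects and raises urllib.error.HTTPError for
    # other non-2xx responses, so a normal return means a 2xx status.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.headers
```

For example, calling `head` on the `…/PMC3514334/bin/1744-8603-8-11-S1.doc` URL would be expected to return 200 together with the `Content-Type` and `Content-Length` values shown above, as long as the file is still served there.
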

erlehmann commented 11 years ago

416 is the HTTP status code for “Requested Range Not Satisfiable”. Quoting RFC 2616, Section 10.4.17:

   A server SHOULD return a response with this status code if a request
   included a Range request-header field (section 14.35), and none of
   the range-specifier values in this field overlap the current extent
   of the selected resource, and the request did not include an If-Range
   request-header field. (For byte-ranges, this means that the first-
   byte-pos of all of the byte-range-spec values were greater than the
   current length of the selected resource.)
   When this status code is returned for a byte-range request, the
   response SHOULD include a Content-Range entity-header field
   specifying the current length of the selected resource (see section
   14.16). This response MUST NOT use the multipart/byteranges content-
   type.

http://tools.ietf.org/html/rfc2616#section-10.4.17
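
To make the rule concrete, here is a sketch in Python of the decision a conforming server makes. It follows the satisfiability rule of RFC 2616, Section 14.35.1: a byte-range-spec is satisfiable if its zero-based first-byte-pos is less than the resource length, and a suffix range ("-N") if N is non-zero. The If-Range case is not modeled, and `expected_status` is a hypothetical helper, not importer code.

```python
import re

# One byte-range-spec ("500-999", "9500-") or suffix-byte-range-spec ("-500").
_SPEC = re.compile(r"^(\d*)-(\d*)$")


def expected_status(range_header, length):
    """Status a server should send for `Range: <range_header>` on an
    existing resource of `length` bytes (RFC 2616, 10.4.17 and 14.35.1).

    Returns 206 (partial content), 416 (unsatisfiable), or 200 when the
    header must be ignored (unknown unit or syntactically invalid).
    """
    unit, _, range_set = range_header.partition("=")
    specs = [s.strip() for s in range_set.split(",") if s.strip()]
    if unit.strip() != "bytes" or not specs:
        return 200  # unknown unit or empty range set: ignore the header
    satisfiable = False
    for spec in specs:
        match = _SPEC.match(spec)
        if not match or match.groups() == ("", ""):
            return 200  # syntactically invalid: ignore the whole header
        first, last = match.groups()
        if first == "":
            # Suffix range "-N" (last N bytes): satisfiable if N > 0.
            satisfiable = satisfiable or int(last) > 0
            continue
        if last and int(last) < int(first):
            return 200  # last-byte-pos below first-byte-pos is invalid
        # Offsets are zero-based, so the first byte must lie below `length`.
        satisfiable = satisfiable or int(first) < length
    return 206 if satisfiable else 416
```

With the `Content-Length: 23552` reported by curl above, `bytes=0-0` should get 206, and only ranges that start at byte 23552 or later should get 416. The thread does not show which Range header the importer sent; if it asked for bytes near the start of the file, a 416 would not be what the RFC prescribes.
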

erlehmann commented 11 years ago

Daniel, can you find out why they disabled range requests?

Daniel-Mietchen commented 11 years ago

The error occurs with both oami_pmc_pmcid_import and oami_pmc_doi_import, so I did not name a specific command. Sorry.

I have asked NCBI whether they changed anything on their end.

erlehmann commented 11 years ago

Any update from NCBI?

Daniel-Mietchen commented 11 years ago

Nothing yet.

Daniel-Mietchen commented 11 years ago

I got no further feedback, but range requests seem to work fine again.

erlehmann commented 11 years ago

Further testing confirms that range requests work again. Closing.
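
A check like this can also be scripted. The sketch below, in Python 3 with only the standard library, asks for just the first byte (`Range: bytes=0-0`) and returns the status: 206 means the range was honored, 200 means the server ignored it and sent the whole file, and 416 is the failure seen in this issue. `probe_range_support` is a hypothetical helper; the thread does not say how the further testing was actually done.

```python
import urllib.error
import urllib.request


def probe_range_support(url, timeout=10):
    """Request only the first byte of `url` and return the HTTP status.

    206 Partial Content: the Range header was honored.
    200 OK: the server ignored the Range header and sent the whole file.
    416 Range Not Satisfiable: the failure reported in this issue.
    Any other non-2xx status is re-raised as urllib.error.HTTPError.
    """
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # The body is not read; closing the response discards it.
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for non-2xx statuses, including 416.
        if error.code == 416:
            return error.code
        raise
```

Run against one of the URLs from the original report, it should return 206 while range requests are honored, and 416 if the problem described here comes back.
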