ropensci-archive / crminer

:no_entry: ARCHIVED :no_entry: Fetch 'Scholary' Full Text from 'Crossref'

elsevier first page thing #43

Closed sckott closed 4 years ago

sckott commented 4 years ago

When you don't have access to a paper, Elsevier (at least for some papers) returns a 200 response containing only the first page, along with this header:

< X-ELS-Status: WARNING - Response limited to first page because requestor not entitled to resource

any thoughts @mark-fangzhou-xie ?

sckott commented 4 years ago

e.g., when I'm not on my VPN:

x = crm_links("10.1053/j.jvca.2013.06.022")
crm_text(x, "pdf", verbose = TRUE)
#> < HTTP/1.1 200 OK
#> < allow: GET
#> < Content-Encoding: gzip
#> < Content-Type: application/pdf
#> < X-ELS-ResourceVersion: default
#> < X-ELS-Status: WARNING - Response limited to first page because requestor not entitled to resource

fangzhou-xie commented 4 years ago

I can reproduce this problem on my non-VPN machine as well.

  1. Sure. I think users should be warned about this.

  2. My opinion leans towards the latter, i.e. failing the request and removing the file that contains only the first page. I believe most people, if not all, who want to use crminer to get the full text of articles already have access (either via VPN or by being physically on campus or at their institution). Otherwise, they wouldn't have started in the first place. I also doubt they would be satisfied with just the first page of the article. Most likely they simply forgot to connect to their VPN before running their code.

But my opinion is, of course, biased towards those who have access, and may go against those who don't have access but still wish to get the first page anyway. In that case, do you think it is worth adding a parameter to fetch the first page anyway while warning them about it? If such an option is made available, it ought to default to FALSE.
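
To make the proposed option concrete, here is a minimal base R sketch of the opt-in behaviour. The function `handle_first_page()` and its arguments `first_page_only` and `allow_first_page` are hypothetical names chosen for illustration; they are not existing crminer parameters.

```r
# Hypothetical helper (not part of crminer): decide what to do with a
# downloaded file once we know whether it holds only the first page.
handle_first_page <- function(path, first_page_only, allow_first_page = FALSE) {
  # Full text was returned: nothing to do
  if (!first_page_only) {
    return(invisible(path))
  }
  if (allow_first_page) {
    # Opt-in: keep the partial file, but tell the user about it
    warning("Only the first page was returned; keeping partial file: ",
            path, call. = FALSE)
    return(invisible(path))
  }
  # Default: remove the partial file and fail loudly
  unlink(path)
  stop("Only the first page was returned (no access to full text); ",
       "partial file removed: ", path, call. = FALSE)
}
```

With the default `allow_first_page = FALSE`, a forgotten VPN connection surfaces as an error rather than as a silently truncated PDF.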

sckott commented 4 years ago

Thanks for your quick feedback. I can pull out that header and throw a warning. I agree that it makes the most sense to remove the file and fail with an error message. I haven't checked, but I imagine that even when using a VPN you could request an article you don't have access to, and so still hit this first-page problem.
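
A rough sketch of that header check in base R, assuming the response headers are available as a named list. The helper names `els_first_page_only()` and `check_els_response()` are hypothetical, and this is not necessarily how crminer ended up implementing it.

```r
# Hypothetical helper: TRUE if the headers carry Elsevier's
# "limited to first page" warning in X-ELS-Status.
els_first_page_only <- function(headers) {
  # HTTP header names are case-insensitive, so normalise before lookup
  names(headers) <- tolower(names(headers))
  status <- headers[["x-els-status"]]
  !is.null(status) && any(grepl("limited to first page", status, fixed = TRUE))
}

# Hypothetical helper: delete the partial download and raise an error
# when the warning header is present; otherwise return the path.
check_els_response <- function(headers, path) {
  if (els_first_page_only(headers)) {
    if (file.exists(path)) unlink(path)
    stop("Elsevier returned only the first page (requestor not entitled ",
         "to resource); removed partial file: ", path, call. = FALSE)
  }
  invisible(path)
}
```

Matching on the message text rather than on the 200 status code is the key point, since the status alone does not distinguish a full PDF from a first-page-only one.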

fangzhou-xie commented 4 years ago

Thank you. Maybe this behavior is also publisher-specific. I noticed that requests fail outright for Cambridge links that I don't have access to. Possibly Elsevier is being lenient and offers the first page for some articles in those cases.

sckott commented 4 years ago

This is definitely specific to Elsevier, at least as far as I know. They are the biggest publisher, so it makes sense to make sure their failure cases are handled.