sckott closed this issue 4 years ago
e.g., when I'm not on my VPN:
library(crminer)
x <- crm_links("10.1053/j.jvca.2013.06.022")
crm_text(x, "pdf", verbose = TRUE)
#> < HTTP/1.1 200 OK
#> < allow: GET
#> < Content-Encoding: gzip
#> < Content-Type: application/pdf
#> < X-ELS-ResourceVersion: default
#> < X-ELS-Status: WARNING - Response limited to first page because requestor not entitled to resource
I can reproduce this problem on my non-VPN machine as well.
Sure. I think users should at least be warned about this.
My opinion leans towards the latter, i.e. failing the request and removing the file that contains only the first page. I believe most people, if not all, who use crminer to get the full text of articles already have access (either through a VPN or by being physically on campus or at their institution). Otherwise, they wouldn't have started in the first place. And I doubt they would be satisfied with just the first page of the article either. Most likely they simply forgot to connect to their VPN before running their code.
But my opinion, of course, is biased towards those who have access, and may go against those who don't have access but still want the first page anyway. With that in mind, do you think it is worth adding a parameter to fetch the first page anyway, with a warning? If such an option is added, it ought to default to FALSE.
Thanks for your quick feedback. I can pull out that header and throw a warning. I agree that it makes the most sense to remove the file and fail with an error message. I haven't checked, but I imagine that even when you are on your VPN, you could request an article you don't have access to, and so you may still hit this first-page problem.
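To make the proposal concrete, here is a rough base-R sketch of the check discussed above. It is not crminer's actual implementation: `check_els_entitlement()` and its `first_page_ok` argument are hypothetical names, and it assumes the response headers are available as a named list and that the PDF has already been written to `path`.

```r
# Hypothetical helper, not part of crminer. Assumes `headers` is a named
# list of response headers and `path` is the PDF already written to disk.
check_els_entitlement <- function(headers, path, first_page_ok = FALSE) {
  # HTTP header names are case-insensitive, so normalise before lookup
  names(headers) <- tolower(names(headers))
  status <- headers[["x-els-status"]]
  limited <- !is.null(status) &&
    grepl("requestor not entitled", status, ignore.case = TRUE)
  if (!limited) {
    return(invisible(path))
  }
  if (first_page_ok) {
    # Opt-in: keep the first-page file, but tell the user what happened
    warning("Elsevier returned only the first page: ", status, call. = FALSE)
    return(invisible(path))
  }
  # Default: remove the truncated PDF so it is not mistaken for full text
  if (file.exists(path)) unlink(path)
  stop("Not entitled to full text (are you on your VPN?): ", status,
       call. = FALSE)
}

# Usage with the header value reported in this issue and a stub file
hdrs <- list(`X-ELS-Status` = paste(
  "WARNING - Response limited to first page",
  "because requestor not entitled to resource"
))
tmp <- tempfile(fileext = ".pdf")
writeLines("stub", tmp)
res <- tryCatch(check_els_entitlement(hdrs, tmp),
                error = function(e) conditionMessage(e))
```

With `first_page_ok = FALSE` as the default, a forgotten VPN surfaces as an error rather than a silently truncated PDF, while users who really do want the first page can opt in and get a warning instead.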
Thank you. Maybe this behaviour is also publisher-specific. I noticed that requests simply failed on Cambridge links that I don't have access to. Possibly Elsevier is being lenient and offers the first page of some articles in those cases.
This is definitely specific to Elsevier, at least as far as I know. They are the biggest publisher, so it makes sense to make sure their failure cases are handled.
When you don't have access to a paper, at least for some papers, Elsevier gives a 200 response, returns only the first page, and sends this header:
Any thoughts, @mark-fangzhou-xie?