ropensci-archive / crminer

:no_entry: ARCHIVED :no_entry: Fetch 'Scholary' Full Text from 'Crossref'

crminer generates invalid link for DOIs from Wiley #26

Closed swood-ecology closed 6 years ago

swood-ecology commented 6 years ago
DOI <- c("10.1007/S10531-017-1376-Y","10.1002/ECS2.1309","10.1614/IPSM-D-14-00048.1","10.1890/14-0922.1","10.1093/AOBPLA/PLU081","10.1007/S10530-014-0705-2","10.2111/REM-D-13-00140.1")
links <- sapply(DOI, crminer::crm_links)

Above is a list of DOIs, some of which are from Wiley. crminer will generate links from the Wiley DOIs, but the links labeled as $.pdf are invalid. In some cases crminer generates valid links labeled as unspecified, but in other cases it doesn't, and I can't figure out enough of a pattern to exploit that usefully.

sckott commented 6 years ago

Thanks for the issue @swood-ecology ! What version of crminer are you using?

Can you give examples of the URLs that are invalid? Is this one an example: https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1002%2Fecs2.1309 ? It doesn't resolve for me, though the response for that DOI does include a URL that works (for me at least):

crminer::crm_links("10.1002/ECS2.1309")[[3]]

swood-ecology commented 6 years ago

0.1.4

swood-ecology commented 6 years ago

The example you gave is a good one. When you do

crminer::crm_links("10.1002/ECS2.1309")[[3]]

you get a url that starts with 'onlinelibrary'.

Those urls from Wiley work. But, most Wiley DOIs don't return those URLs through crminer. For instance,

crminer::crm_links("10.1890/14-0922.1")

only returns a URL that begins with 'api.wiley'. When I copy and paste that URL into my browser it gives me a blank screen.

In my experience, most of the Wiley DOIs that I run through crminer are of the latter type.

sckott commented 6 years ago

Thanks for the example @swood-ecology

There hasn't been much change between that version of the package on CRAN and the development version here.

The URL for that DOI (10.1890/14-0922.1) may work, but I don't have access to it. Unfortunately, the URL https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1890%2F14-0922.1 doesn't return anything in the body of the response (i.e., nothing shows up in the browser window), but it does return a header (try curl -v 'https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1890%2F14-0922.1' in your shell) saying that either you need to give a token or, if you have given a token, that you may not have access because your institution doesn't have access. phew. 😥
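If you'd rather look at that header from R than from the shell, here is a rough sketch. It only reproduces the URL pattern visible in the links above (the DOI with its slash percent-encoded); the URL crminer actually returns comes from the Crossref metadata, and `wiley_tdm_url` and `show_headers` are hypothetical helper names, not crminer functions.

```r
# Illustration only: rebuild the Wiley TDM URL pattern seen above from a DOI.
# `wiley_tdm_url` and `show_headers` are hypothetical helpers, not part of crminer.
wiley_tdm_url <- function(doi) {
  # reserved = TRUE percent-encodes the "/" in the DOI as %2F
  paste0(
    "https://api.wiley.com/onlinelibrary/tdm/v1/articles/",
    utils::URLencode(doi, reserved = TRUE)
  )
}

show_headers <- function(url) {
  # curlGetHeaders() is base R; it returns the response headers as a
  # character vector, with the HTTP status code in the "status" attribute
  hdrs <- curlGetHeaders(url)
  list(status = attr(hdrs, "status"), headers = hdrs)
}

url <- wiley_tdm_url("10.1890/14-0922.1")
print(url)

# Needs network access, so it is left for interactive use:
# show_headers(url)
```

An empty body together with an auth-related header would match the blank browser page described above.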

The output of crm_links can be passed to crm_pdf/crm_text/crm_xml/crm_plain, which handle trying to get the article for you. See ?auth for docs on getting the Crossref TDM token that you'll need to get articles through this pkg.
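Roughly, that workflow could look like the sketch below. `fetch_pdf_for_doi` is a hypothetical wrapper, and the `CROSSREF_TDM` environment variable used for the token check is an assumption; ?auth is the authoritative description of how the token should be set.

```r
# Sketch: let crminer handle the download instead of opening the link by hand.
# `fetch_pdf_for_doi` is a hypothetical wrapper, not a crminer function.
fetch_pdf_for_doi <- function(doi) {
  # Assumption: the Crossref TDM token is stored in the CROSSREF_TDM
  # environment variable. Check ?auth in crminer for the actual setup.
  if (!nzchar(Sys.getenv("CROSSREF_TDM"))) {
    stop("No Crossref TDM token found; see ?auth in crminer", call. = FALSE)
  }
  # Ask Crossref for the PDF link(s) registered for this DOI ...
  links <- crminer::crm_links(doi, type = "pdf")
  # ... and pass them straight to crm_pdf, which tries to fetch the article
  crminer::crm_pdf(links)
}

# Needs network access, a valid token, and institutional access:
# fetch_pdf_for_doi("10.1002/ECS2.1309")
```

Even with a token set, the request can still fail for a DOI your institution doesn't have access to.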

sckott commented 6 years ago

any thoughts @swood-ecology ?

swood-ecology commented 6 years ago

I had been struggling with this because the code in my BibScan package depended on crm_links, and I just couldn't figure out how to interface with Wiley properly. I had my own function to download the PDF, which also built in functionality for Elsevier (which also has a weird API). But it looks like I can't do that without getting a token, so I'm switching to crm_pdf, which does a nice job of handling tokens.

sckott commented 6 years ago

Okay, sounds good.