ropensci-archive / crminer

ARCHIVED: Fetch 'Scholarly' Full Text from 'Crossref'

Fix http/https string substitution when crawling full text links from Elsevier #30

Closed njahn82 closed 5 years ago

njahn82 commented 5 years ago

Description

Hi @sckott, while sending some Elsevier DOIs to Crossref using crminer::crm_links(), I got back malformed full-text URLs starting with httpss instead of https:

crminer::crm_links("10.1016/j.scib.2017.04.011") 
$xml
<url> httpss://api.elsevier.com/content/article/PII:S2095927317301925?httpAccept=text/xml

$plain
<url> httpss://api.elsevier.com/content/article/PII:S2095927317301925?httpAccept=text/plain

This little fix should address the issue.

sckott commented 5 years ago

thanks very much @njahn82

it's not clear: did you test that the fix works?

njahn82 commented 5 years ago

Sorry for the coding style changes; the spaces are now removed.

I also added a test for the URL that was returned by crm_links() and that threw an error when passed to crm_plain(). My guess is that the URL string manipulation for Elsevier full-text links in crm_links() made the URL invalid by appending a second s to https.
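
To illustrate the suspected failure mode, here is a minimal sketch in base R. It is not taken from the crminer source: the helper names naive_to_https() and safe_to_https() are hypothetical, and the unanchored substitution is only an assumption about how a doubled s could arise when the link is already https.

```r
# Hypothetical helpers for illustration only; not crminer's actual code.

# Unanchored substitution: replaces the first "http" it finds,
# so a URL that already starts with "https" becomes "httpss".
naive_to_https <- function(url) {
  sub("http", "https", url, fixed = TRUE)
}

# Anchored substitution: only rewrites a leading "http://" scheme,
# leaving URLs that are already https untouched.
safe_to_https <- function(url) {
  sub("^http://", "https://", url)
}

# URL from the report above, with the scheme corrected to https
url <- "https://api.elsevier.com/content/article/PII:S2095927317301925?httpAccept=text/xml"

naive_to_https(url)  # starts with "httpss://", reproducing the malformed link
safe_to_https(url)   # returned unchanged
```

Anchoring the pattern to the start of the string and including the "://" separator keeps the substitution from touching either an existing https scheme or the later "httpAccept" query parameter.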

codecov-io commented 5 years ago

Codecov Report

Merging #30 into master will not change coverage. The diff coverage is 100%.

@@           Coverage Diff           @@
##           master      #30   +/-   ##
=======================================
  Coverage   69.09%   69.09%           
=======================================
  Files          10       10           
  Lines         288      288           
=======================================
  Hits          199      199           
  Misses         89       89

Impacted Files    Coverage Δ
R/crm_links.R     76.36% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7002f0d...e9c2cc5.

sckott commented 5 years ago

thanks! looks good to me