ropensci-archive / crminer

:no_entry: ARCHIVED :no_entry: Fetch 'Scholarly' Full Text from 'Crossref'

Chicago press full text issue #49

Closed fangzhou-xie closed 4 years ago

fangzhou-xie commented 4 years ago
Session Info
```r
crminer_0.3.5.91
```

```r
> library(crminer)
> link <- crm_links("10.1086/250113")
> link
$unspecified
<url> http://www.journals.uchicago.edu/doi/pdf/10.1086/250113

> ft <- crm_text(link, "pdf", overwrite_unspecified = T)
using cached file: /Users/xiefangzhou/Library/Caches/R/crminer/250113.pdf
date created (size, mb): 2020-06-12 22:59:56 (0)
Extracting text from pdf...
Error in poppler_pdf_info(loadfile(pdf), opw, upw) : PDF parsing failure.
```

Sorry for posting this, as it is clearly similar to #41 and other issues here, but this time it happens for U Chicago Press. The full-text link opens as a PDF file when copied and pasted into a web browser.

sckott commented 4 years ago

thanks! I'll have a look

sckott commented 4 years ago

i can't replicate the error. can you see if that pdf file is a valid pdf? or is it gibberish? non-pdf content in it?
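
One quick way to check whether a cached file is a real PDF is to look at its first bytes, since PDF files begin with `%PDF-`. The following is only a minimal sketch in base R; `is_probably_pdf` is a hypothetical helper name, not part of crminer, and passing this check does not guarantee the rest of the file parses.

```r
# Hypothetical helper (not a crminer function): returns TRUE only if the
# file exists, is non-empty, and starts with the PDF magic bytes "%PDF-".
is_probably_pdf <- function(path) {
  if (!file.exists(path) || dir.exists(path)) return(FALSE)
  if (file.size(path) == 0) return(FALSE)  # zero-byte download
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Example call, using the cached path printed in the log above:
# is_probably_pdf("/Users/xiefangzhou/Library/Caches/R/crminer/250113.pdf")
```

A `FALSE` result would point to an empty or non-PDF download (for example an HTML error page saved under a `.pdf` name) rather than a problem in the text extraction step.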

fangzhou-xie commented 4 years ago

Thanks for your reply! I found that the cached PDF file is a zero-byte object and seems to be corrupted. After installing the newest version and removing all the cached files, it works without any issue. I also succeeded with #46.

Thank you so much!
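
For anyone hitting the same error, here is a rough sketch of how the zero-byte cached files could be found and removed with base R only. `remove_empty_cached_files` is a hypothetical helper, and the cache directory is an assumption taken from the path printed in the log above; it will differ per machine.

```r
# Hypothetical helper (not part of crminer): deletes zero-byte files in a
# cache directory and invisibly returns the paths that were removed.
remove_empty_cached_files <- function(cache_dir) {
  files <- list.files(cache_dir, full.names = TRUE)
  sizes <- file.size(files)
  # Keep only regular files whose size is exactly zero bytes
  empty <- files[!is.na(sizes) & sizes == 0 & !dir.exists(files)]
  unlink(empty)
  invisible(empty)
}

# Assumed cache location, copied from the log output in this issue:
# remove_empty_cached_files("/Users/xiefangzhou/Library/Caches/R/crminer")
```

After the empty files are gone, calling `crm_text()` again should trigger a fresh download instead of reusing the corrupted cached copy.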

sckott commented 4 years ago

great, glad it works