fangzhou-xie closed this issue 4 years ago
If you go to the article page and try to get the PDF, it is somehow malformed. I don't know if it's just this article or many from this journal or publisher.
So I don't think there's much we can do there, though we should fail better and remove the bad PDF file, since it sticks around after the read failure.
Added some more error handling for this case: we now try to detect malformed PDFs. I'm not sure how robust the solution is until we run into more cases of malformed PDFs. The behavior now with the latest commit:
```r
doi <- "10.1017/s0081305200012255"
link <- crm_links(doi)
crm_text(link, type = "pdf", overwrite_unspecified = TRUE)
#> Error: malformed pdf detected; contact publisher, see if they can fix
```
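For anyone who wants the same "detect, clean up, then fail" behavior in their own scripts, here is a minimal sketch in base R. It is not crminer's actual implementation. The helper names `looks_like_pdf()` and `check_or_remove()` are hypothetical, and the check (a `%PDF-` header plus an `%%EOF` marker near the end of the file) is only a cheap heuristic that I'm assuming is good enough to catch things like HTML error pages saved with a `.pdf` extension.

```r
# Hypothetical helper (not part of crminer): cheap structural check,
# not a full PDF validation.
looks_like_pdf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  size <- file.size(path)
  if (is.na(size) || size < 5) return(FALSE)

  con <- file(path, "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = size)

  # A PDF is expected to start with the "%PDF-" magic bytes.
  head_ok <- identical(bytes[1:5], charToRaw("%PDF-"))

  # The "%%EOF" marker is expected near the end of the file.
  tail_bytes <- bytes[max(1, size - 1023):size]
  tail_txt <- rawToChar(tail_bytes[tail_bytes != as.raw(0)])
  eof_ok <- grepl("%%EOF", tail_txt, fixed = TRUE, useBytes = TRUE)

  head_ok && eof_ok
}

# Hypothetical wrapper: delete the cached file when the check fails,
# so a bad download does not stick around, then raise an error.
check_or_remove <- function(path) {
  if (!looks_like_pdf(path)) {
    unlink(path)
    stop("malformed pdf detected; contact publisher, see if they can fix",
         call. = FALSE)
  }
  invisible(path)
}

# Usage: an HTML page saved under a .pdf name is rejected and removed.
bad <- tempfile(fileext = ".pdf")
writeLines("<html>not a pdf</html>", bad)
msg <- tryCatch(check_or_remove(bad), error = function(e) conditionMessage(e))
file.exists(bad)
```

A file can pass this check and still be unreadable, so a real parser remains the final word. The sketch only shows how the cleanup step could sit next to the error.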
Session Info
```r
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] crminer_0.3.3.93

loaded via a namespace (and not attached):
 [1] hoardr_0.5.2    compiler_3.6.3  R6_2.4.1        tools_3.6.3     httpcode_0.3.0  curl_4.3
 [7] rappdirs_0.3.1  Rcpp_1.0.4.6    urltools_1.7.3  pdftools_2.3    triebeard_0.3.0 crul_0.9.0
[13] qpdf_1.1        jsonlite_1.6.1  digest_0.6.25   askpass_1.1
```
I think this is connected to #41, #40?