Open billdenney opened 5 years ago
I reproduced using the first code example from the readme.
library("tabulizer")
f <- system.file("examples", "data.pdf", package = "tabulizer")
out1 <- extract_tables(f)
(Mac 10.13, tabulizer 0.2.2, rJava 0.9-11, R 3.6.0, Java 11.0.1)
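Since most replies in this thread report versions by hand, here is a minimal sketch for collecting them in one call. `collect_versions` is a hypothetical helper (not part of tabulizer or rJava), and it assumes the Java version can be read from the `java.version` system property once rJava has started a JVM. Anything that is not installed comes back as `NA`.

```r
# Sketch: gather the versions relevant to this issue in one named vector.
# `collect_versions` is a hypothetical helper, not part of tabulizer or rJava.
collect_versions <- function(pkgs = c("tabulizer", "rJava")) {
  pkg_versions <- vapply(pkgs, function(p) {
    # packageVersion() errors for packages that are not installed
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))

  java_version <- tryCatch({
    if (!requireNamespace("rJava", quietly = TRUE)) stop("rJava not available")
    rJava::.jinit()  # start the JVM if it is not already running
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) NA_character_)

  c(R = as.character(getRversion()), pkg_versions, Java = java_version)
}

print(collect_versions())
```

Pasting this output next to `sessionInfo()` would also cover the Java version, which `sessionInfo()` does not report.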
Getting the same in Linux too
Just got the same warning. Using R version 3.6.0 (2019-04-26) on Mac OS 10.14.6
Has this caused any actual problems for others?
In Travis it causes build failure.
Was anyone able to solve it? I got the same error.
Same here
Same issue here
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8
[4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8 LC_ADDRESS=de_DE.UTF-8
[10] LC_TELEPHONE=de_DE.UTF-8 LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] janitor_1.2.0 tabulizer_0.2.2 data.table_1.12.6 tidytext_0.2.0 dplyr_0.8.3
[6] stringr_1.4.0 rvest_0.3.4 xml2_1.2.2 selectr_0.4-1 cronR_0.4.0
I have the same problem. I wonder whether it depends on the "quality" of the document. In other words, Tabulizer works with some PDFs but not with others.
For example, if you download this pdf, Tabulizer can read it. If you use this one, however, it cannot, and I don't know why. I don't think there is anything illegal about the document; I think it comes down to the "quality of the information".
If you write a document in Word or Excel, export it to PDF, and try that, it works! So it seems the Tabulizer algorithm doesn't work with all PDF documents 🧙♂️
P.S. I ran this in RStudio 1.2.5033 and R 3.6.3 (2020-02-29)
@lefcgis, there definitely could be some documents that trigger the issue and some that do not, but it is a Java coding issue and not an issue with a PDF file (as in, the pdf standard is being followed). For more information, see https://stackoverflow.com/questions/50251798/what-is-an-illegal-reflective-access
OK! So it's possible that the cause is the JDK and JRE packages, since they are prerequisites for installing rJava. Thanks for your answer, @billdenney 🧙♂️
Now it's breaking my build.
For me, this warning only occurs the first time the example code is run in a new R session. Subsequent runs do not show this warning. Is that the same behavior others here are seeing?
The test code I've been using is...
out <- tabulizer::extract_tables(system.file("examples", "data.pdf", package = "tabulizer"))
If so, I'm curious if #125 resolves this issue for you.
Same thing happened to me. Got this error the first time then just an empty list each subsequent run. I can read other pdfs but it fails on one which is a different format.
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tabulizer_0.2.2
loaded via a namespace (and not attached):
[1] tabulizerjars_1.0.1 compiler_4.0.2 tools_4.0.2 rJava_0.9-13
[5] png_0.1-7
@maahutch An error, or a warning? Those are significantly different.
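To make the distinction concrete, here is a minimal sketch that classifies what R itself sees when an expression is evaluated. `classify_outcome` is a hypothetical helper, not part of tabulizer. It assumes that a message printed directly by the JVM to stderr is not an R condition, so such a call would be classified as "ok" even though text appears on the console.

```r
# Sketch: report whether evaluating an expression raised an R warning,
# an R error, or neither. `classify_outcome` is a hypothetical helper.
classify_outcome <- function(expr) {
  tryCatch({
    force(expr)  # the lazily evaluated argument is run here, inside tryCatch
    "ok"
  },
  warning = function(w) "R warning",
  error = function(e) "R error")
}

print(classify_outcome(warning("just a warning")))
print(classify_outcome(stop("a real failure")))
print(classify_outcome(list()))

# Usage with tabulizer (requires a working Java setup):
# f <- system.file("examples", "data.pdf", package = "tabulizer")
# classify_outcome(tabulizer::extract_tables(f))
```

If the tabulizer call comes back as "ok" while a WARNING line still shows up, the message most likely came from Java rather than from R.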
FWIW I'm getting a WARNING from Java (not R), and an empty list, the first time. Subsequently I get an empty list without a warning from Java.
It's possible that this particular PDF is image-only and has no underlying text anyway .. ?
Yes, that would require OCR (e.g., tesseract or paws).
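One rough way to check the image-only theory before reaching for OCR is to see whether any text can be extracted at all. The sketch below is only a heuristic: `has_text_layer` is a hypothetical helper, and it assumes the text has already been pulled out as a character vector, for example with `tabulizer::extract_text()`.

```r
# Sketch: treat a PDF as having a text layer if any extracted string
# contains something other than whitespace.
# `has_text_layer` is a hypothetical helper, not part of tabulizer.
has_text_layer <- function(page_text) {
  any(nzchar(trimws(page_text)))
}

print(has_text_layer(c("", "Table 1: mpg by cyl")))
print(has_text_layer(c("", "  \n")))

# Usage with tabulizer (requires a working Java setup):
# f <- system.file("examples", "data.pdf", package = "tabulizer")
# txt <- tabulizer::extract_text(f)
# if (!has_text_layer(txt)) message("No text found; the PDF may need OCR.")
```

A PDF that yields no text this way would also be expected to give an empty list from `extract_tables()`, independent of the Java warning.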
When working with the current version of R and rJava, there is a warning with extract_tables() indicating:
Unfortunately, I cannot share the underlying .pdf file that caused the error.