Closed (Monduiz closed this issue 6 years ago)
Yikes, thanks for reporting. Can you please post your sessionInfo()?
Sure:
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tesseract_1.4 pdftools_1.2
loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0 curl_2.6 Rcpp_0.12.11 git2r_0.18.0 digest_0.6.12 ghit_0.2.17
I think this should be fixed in the new CRAN version 1.6.
This seems to be still the case:
text <- ocr("http://jeroen.github.io/files/inlove.png")
Warning. Invalid resolution 0 dpi. Using 70 instead.
Too few characters. Skipping this page
cat(text)
In love
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tesseract_1.6 RevoUtilsMath_10.0.0
loaded via a namespace (and not attached):
[1] compiler_3.4.1 RevoUtils_10.0.5 tools_3.4.1 curl_2.6
[5] rappdirs_0.3.1 Rcpp_0.12.12 digest_0.6.12
I'm getting this warning now but the result is correct, right?
Yes, the result seems right.
I'm having this problem as well. The output I got is just a bunch of symbols. Please help! Thanks!
cat(text)
t {'39} « i “ ‘ ’ v» ; :3 K . -
_ .gg‘ _ _ » l, “ . W k
9.; ' D c?
Qigfi —. d ,5; [N -
gage o ‘ _ ‘ ..
‘ a 4 .1 .v . s
E - i; ,7; ,
< ~ .5 ._ 2.
’ 35 f: “n” i 7‘ 5 ;~
—. V. , H ,, ,
_. . ”(,7 , g: sag a H
T = ‘~'i‘ % L195 3 ;,
= A .v = u 1 ~- «
_, u 5 ,_ ~ “ _ z.
3 a f? ??W- o' .
, 3. . , 7 .MWg ‘12.} a _
g 2 n7 ‘ z ' "7‘54 U 5* ‘
"‘ >< m: ‘ -:.:.,‘ ‘ -‘ :1 “
,— 3- z ., w :‘ 1%
,3,» 5 , a .
, °‘ 2 '1 .r‘ a c
‘ =3 fl. ~“i, i‘L‘i ' '
E5 M2“ , , l: ,a '
g1> E r m , ‘
V 21%.; ,V U 3.: , a v: ‘ ’5
I 301 $2 3 "‘12 ..
‘ 811'. 5% ‘ ~ ‘ «Y 7 ‘
, iDEfiJ w “$3; 7»
- \ M“
\ v
Image used:
Session Info
Session info ------------------------------------------
setting  value
version  R version 3.4.3 (2017-11-30)
system   x86_64, mingw32
ui       RStudio (1.1.419)
language (EN)
collate  English_United States.1252
tz       America/Los_Angeles
date     2018-01-25
Packages ----------------------------------------------
package    * version    date
base       * 3.4.3      2017-11-30
compiler     3.4.3      2017-11-30
curl         3.1        2017-12-12
datasets   * 3.4.3      2017-11-30
devtools     1.13.4     2017-11-09
digest       0.6.14     2018-01-14
graphics   * 3.4.3      2017-11-30
grDevices  * 3.4.3      2017-11-30
memoise      1.1.0      2017-04-21
methods    * 3.4.3      2017-11-30
pdftools     1.5        2017-11-05
rappdirs     0.3.1      2016-03-28
Rcpp         0.12.15    2018-01-20
rstudioapi * 0.7        2017-09-07
stats      * 3.4.3      2017-11-30
tesseract  * 1.6        2017-08-14
tools        3.4.3      2017-11-30
utils      * 3.4.3      2017-11-30
withr        2.1.1.9000 2018-01-03
yaml         2.1.16     2017-12-12
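For output like the symbols above, one thing that may be worth trying is cleaning up the image before passing it to ocr(). The following is only a sketch, assuming the magick package is installed. `ocr_preprocessed()` is a hypothetical helper name, and the resize width, trim fuzz, and density values are illustrative guesses rather than recommended settings.

```r
# Hypothetical helper (not part of tesseract): preprocess an image with
# magick, then hand the result to tesseract::ocr().
ocr_preprocessed <- function(path, width = "2000x", fuzz = 40) {
  img <- magick::image_read(path)
  img <- magick::image_resize(img, width)                # resize to the given width
  img <- magick::image_convert(img, type = "Grayscale")  # drop color information
  img <- magick::image_trim(img, fuzz = fuzz)            # crop near-uniform borders
  # Write an in-memory PNG with an explicit density, then run OCR on the raw bytes
  png <- magick::image_write(img, format = "png", density = "300x300")
  tesseract::ocr(png)
}
```

Whether this helps depends on the image. It is not a confirmed fix for the case reported here.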
I think this message can be ignored. It seems to appear sometimes when tesseract finds very little or no text in an image.
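Because ocr() still returns a string when the message appears (as in the "In love" example earlier in the thread), one way to decide whether it matters is to inspect the returned text. A minimal sketch follows. `count_visible_chars()` and `ocr_checked()` are hypothetical helpers, not part of the tesseract package, and the `min_chars` threshold is an arbitrary assumption.

```r
# Hypothetical helper: count the non-whitespace characters in OCR output.
count_visible_chars <- function(text) {
  nchar(gsub("[[:space:]]", "", paste(text, collapse = "")))
}

# Hypothetical wrapper: run OCR and raise an R warning when the result
# contains fewer than `min_chars` visible characters.
# `ocr_fun` defaults to tesseract::ocr; the default is only evaluated when
# it is actually used, so a stand-in function can be supplied instead.
ocr_checked <- function(image, min_chars = 1, ocr_fun = tesseract::ocr) {
  text <- ocr_fun(image)
  n <- count_visible_chars(text)
  if (n < min_chars) {
    warning(sprintf("OCR returned only %d visible character(s)", n))
  }
  text
}
```

With tesseract installed, `ocr_checked("http://jeroen.github.io/files/inlove.png")` would return the recognized text and stay silent as long as at least one visible character was found.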
When I try the example from the rOpenSci page, it gives an error. Is this intended? Previously, it was possible to recognize text with fewer than 50 characters.