ropensci / tesseract

Bindings to Tesseract OCR engine for R
https://docs.ropensci.org/tesseract

Too few characters #15

Closed Monduiz closed 6 years ago

Monduiz commented 7 years ago

When I try the example from the rOpenSci page, it gives an error. Is this intended? Previously, it was possible to recognize text with fewer than 50 characters.

text <- ocr("http://jeroen.github.io/files/inlove.png")
cat(text)
# Too few characters. Skipping this page

jeroen commented 7 years ago

Yikes, thanks for reporting. Can you please post your sessionInfo() ?

Monduiz commented 7 years ago

Sure:

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tesseract_1.4 pdftools_1.2

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0 curl_2.6 Rcpp_0.12.11 git2r_0.18.0 digest_0.6.12 ghit_0.2.17

jeroen commented 7 years ago

I think this should be fixed in the new CRAN version 1.6.

sandeep-n commented 7 years ago

This seems to be still the case:

text <- ocr("http://jeroen.github.io/files/inlove.png")
Warning. Invalid resolution 0 dpi. Using 70 instead.
Too few characters. Skipping this page
cat(text)
In love

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tesseract_1.6 RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
[1] compiler_3.4.1 RevoUtils_10.0.5 tools_3.4.1 curl_2.6
[5] rappdirs_0.3.1 Rcpp_0.12.12 digest_0.6.12

jeroen commented 7 years ago

I'm getting this warning now but the result is correct, right?

sandeep-n commented 7 years ago

Yes, the result seems right.

tungttnguyen commented 6 years ago

I'm having this problem as well. The output I got is just a bunch of symbols. Please help! Thanks!

cat(text)

t {'39} « i “ ‘ ’ v» ; :3 K . -
_ .gg‘ _ _ » l, “ . W k
9.; ' D c?
Qigfi —. d ,5; [N -
gage o ‘ _ ‘ ..
‘ a 4 .1 .v . s
E - i; ,7; ,
< ~ .5 ._ 2.
’ 35 f: “n” i 7‘ 5 ;~
—. V. , H ,, ,
_. . ”(,7 , g: sag a H
T = ‘~'i‘ % L195 3 ;,
= A .v = u 1 ~- «
_, u 5 ,_ ~ “ _ z.
3 a f? ??W- o' .
, 3. . , 7 .MWg ‘12.} a _
g 2 n7 ‘ z ' "7‘54 U 5* ‘
"‘ >< m: ‘ -:.:.,‘ ‘ -‘ :1 “
,— 3- z ., w :‘ 1%
,3,» 5 , a .
, °‘ 2 '1 .r‘ a c
‘ =3 fl. ~“i, i‘L‘i ' '
E5 M2“ , , l: ,a '
g1> E r m , ‘
V 21%.; ,V U 3.: , a v: ‘ ’5
I 301 $2 3 "‘12 ..
‘ 811'. 5% ‘ ~ ‘ «Y 7 ‘
, iDEfiJ w “$3; 7»
- \ M“
\ v

Image used:

20180125_145519

Session Info

Session info ------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, mingw32             
 ui       RStudio (1.1.419)           
 language (EN)                        
 collate  English_United States.1252  
 tz       America/Los_Angeles         
 date     2018-01-25                  

Packages ----------------------------------------------
 package    * version    date      
 base       * 3.4.3      2017-11-30
 compiler     3.4.3      2017-11-30
 curl         3.1        2017-12-12
 datasets   * 3.4.3      2017-11-30
 devtools     1.13.4     2017-11-09
 digest       0.6.14     2018-01-14
 graphics   * 3.4.3      2017-11-30
 grDevices  * 3.4.3      2017-11-30
 memoise      1.1.0      2017-04-21
 methods    * 3.4.3      2017-11-30
 pdftools     1.5        2017-11-05
 rappdirs     0.3.1      2016-03-28
 Rcpp         0.12.15    2018-01-20
 rstudioapi * 0.7        2017-09-07
 stats      * 3.4.3      2017-11-30
 tesseract  * 1.6        2017-08-14
 tools        3.4.3      2017-11-30
 utils      * 3.4.3      2017-11-30
 withr        2.1.1.9000 2018-01-03
 yaml         2.1.16     2017-12-12

jeroen commented 6 years ago

I think this message can be ignored. It seems to appear sometimes when tesseract finds very little or no text in an image.
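
Since the message itself can be ignored, one way to tell a harmless case ("In love") from a failed one (a page of symbols) is to inspect the returned string. The sketch below is only an illustration in base R: `alnum_share()`, `looks_like_text()`, and the 0.7 threshold are hypothetical helpers invented for this example, not part of the tesseract package.

```r
# Hypothetical helper (not part of the tesseract package): share of
# non-whitespace characters in `text` that are letters or digits.
alnum_share <- function(text) {
  chars <- strsplit(paste(text, collapse = ""), "")[[1]]
  chars <- chars[!grepl("[[:space:]]", chars)]
  if (length(chars) == 0) return(0)
  mean(grepl("[[:alnum:]]", chars))
}

# Hypothetical threshold (0.7 is an assumption); tune it on your own images.
looks_like_text <- function(text, min_share = 0.7) {
  alnum_share(text) >= min_share
}

looks_like_text("In love")       # plausible OCR result
looks_like_text("< ~ .5 ._ 2.")  # one line of the symbol output above
```

In practice, `text <- ocr(path)` would be passed to `looks_like_text(text)`. A low share suggests that the image, rather than the warning, is the problem.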