Closed PhillippOhlandt closed 7 years ago
Tesseract works best on 300 dpi black-and-white images.
Preprocess your image and you will get better results.
See attached.
Here is the OCRed result using Tesseract Open Source OCR Engine v4.00.00dev-549-g2b854e3 with Leptonica
13
14 ## Parameters
15
16 _- 'user' - A User struct.
17
18 - ## Examples
19
20 user = %User{name: "Alice Winston" }
21 User. first_name(user)
22 "Alice"
23 serts
24 - def first_name(user) do
25 user
26 [> Bplit
27 [> first
28 end
29
30 @doc nnn
31 Get the last name of a user.
32
33 ## Parameters
34
35 - 'user' - A User struct.
user. exs unix < utf-8 < elixir _ 50%
Yeah, that's a lot better. Thank you!
@Shreeshrii What did you use to preprocess the image? I tried it with ImageMagick, but the result isn't very good, and so the OCR result isn't either.
I used IrfanView interactively to resize the image to a larger size, increase the dpi to 300, convert to grayscale, reduce the color depth to 2 colors, and invert the colors.
You should be able to do a similar conversion using ImageMagick.
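As a rough illustration, the manual steps listed above (resize, 300 dpi, grayscale, two colors, invert) could be scripted along these lines. This is only a sketch under assumptions: the helper name `build_convert_args` is hypothetical, the default width of 1920 and the 50% threshold are placeholder values, and using `-threshold` to get a two-color image is my reading of "reduce the color depth to 2", not necessarily what IrfanView does internally.

```python
import os
import shutil
import subprocess


def build_convert_args(src, dst, width=1920, dpi=300, threshold="50%"):
    """Return an ImageMagick `convert` argument list mirroring the manual steps."""
    return [
        "convert", src,
        "-resize", str(width),        # scale to this width, aspect ratio preserved
        "-units", "PixelsPerInch",
        "-density", str(dpi),         # record the resolution in the output file
        "-colorspace", "Gray",        # convert to grayscale
        "-threshold", threshold,      # assumption: binarize to pure black and white
        "-negate",                    # invert the colors
        dst,
    ]


if __name__ == "__main__":
    args = build_convert_args("input.png", "output.png")
    print(" ".join(args))
    # Only run the conversion when ImageMagick and the input file are available.
    if shutil.which(args[0]) and os.path.exists("input.png"):
        subprocess.run(args, check=True)
```

The threshold value will likely need tuning per image; a value that is too high or too low can erase thin strokes or merge characters.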
@Shreeshrii The best I can get is this:
Using this command:
convert -units PixelsPerInch input.png -resize 1200 -density 300 -colorspace gray -depth 1 -negate output.png
Not sure if ImageMagick can do better.
I had resized the image to 1920 x 1080 and set it to 300 dpi.
You have to experiment with the settings and see what works best. See https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality for more tips.
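One way to organize that kind of experimentation is to generate a small grid of settings and compare the OCR output of each by eye. The sketch below only builds the command lines and does not run them. The function name `candidate_runs` and the output file naming are hypothetical; the widths 1200 and 1920 come from this thread, while 2400 and the threshold values are arbitrary illustrative choices.

```python
from itertools import product


def candidate_runs(src, widths=(1200, 1920, 2400), thresholds=("40%", "50%", "60%")):
    """Yield (preprocess_cmd, ocr_cmd) pairs, one per settings combination."""
    for width, threshold in product(widths, thresholds):
        out = f"pre_w{width}_t{threshold.rstrip('%')}.png"
        preprocess = [
            "convert", src,
            "-resize", str(width),
            "-units", "PixelsPerInch",
            "-density", "300",
            "-colorspace", "Gray",
            "-threshold", threshold,   # assumption: simple global binarization
            "-negate",
            out,
        ]
        # `tesseract <image> <outputbase>` writes the text to <outputbase>.txt
        ocr = ["tesseract", out, out[:-4]]
        yield preprocess, ocr


if __name__ == "__main__":
    for preprocess, ocr in candidate_runs("input.png"):
        print(" ".join(preprocess))
        print(" ".join(ocr))
```

Each printed pair can be run in a shell, and the resulting `.txt` files compared against the expected text to pick the settings that work best for a given kind of screenshot.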
Environment
Current Behavior:
When I give it the following image, the text detection is really bad.
It outputs the following:
Expected Behavior:
It outputs the correct text.
Suggested Fix:
I don't know how Tesseract works at its core, but the image contains clear, readable characters, and the text is even in English.