I don't know the correct steps to OCR my pecha images.

thubtenrigzin / docker-namsel-ocr

A tool for automating Namsel OCR

MIT License

Closed chiehan1 closed 6 years ago

chiehan1 commented 6 years ago

I created ~/data and ~/data/out, and put my Tibetan pecha images in ~/data/out. After that, I ran these commands:

docker run -itd --name namsel -v ~/data:/home/namsel-ocr/data thubtenrigzin/docker-namsel-ocr:latest bash

docker exec namsel ./pecha

But my ocr_output.txt contained only the string 'OCR text' on the first line. Could someone help me figure out what's wrong with my steps? Thank you!

thubtenrigzin commented 6 years ago

Hello. You don't need to create the /out directory. Just create ~/data and put your pecha scan images directly in ~/data for it to work. Also, all your images have to be in .tif format.

Note: the /out directory is used only by scantailor after the preprocess step.
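A quick pre-flight check along these lines may help catch the two problems described above (images in the wrong place, or in the wrong format) before running the OCR. This is only a sketch: `list_non_tif` is a hypothetical helper, not part of docker-namsel-ocr, and it assumes the extension has to be lowercase `.tif`, which the thread does not confirm.

```shell
# Hypothetical helper (not part of docker-namsel-ocr):
# print the name of every regular file directly inside a directory
# that does not end in .tif. Subdirectories such as out/ are skipped.
# Assumption: only the lowercase .tif extension is accepted.
list_non_tif() {
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip directories and an unmatched glob
        case "$f" in
            *.tif) ;;                      # expected format, nothing to report
            *) printf '%s\n' "${f##*/}" ;; # report the bare file name
        esac
    done
}

# Example use, followed by the commands from this thread:
#   list_non_tif ~/data
#   docker run -itd --name namsel -v ~/data:/home/namsel-ocr/data thubtenrigzin/docker-namsel-ocr:latest bash
#   docker exec namsel ./pecha
```

If the helper prints nothing, every file at the top level of ~/data has a .tif extension. It does not check whether any images were left inside ~/data/out.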

chiehan1 commented 6 years ago

Thank you so much! I can OCR my images now.

But there are still some problems. This is my image.

I used this command: docker exec namsel ./book

Is there anything I can improve? (e.g. scan quality)

thubtenrigzin commented 6 years ago

Please try these options:

improve the quality of your source scan images by cleaning up the "pixel dust"
scan your book at a high DPI, such as 600
preprocess your scan images with a threshold value between -40 and 40 (docker exec namsel ./preprocess [threshold value])
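Since the preprocess step only accepts a limited threshold range, a small wrapper can reject out-of-range values before they reach the container. The sketch below is not part of docker-namsel-ocr: `valid_threshold` and `run_preprocess` are hypothetical names, and it assumes the threshold is an integer and that both -40 and 40 are allowed.

```shell
# Hypothetical helpers (not part of docker-namsel-ocr).
# Assumptions: the threshold is an integer and the bounds -40 and 40 are inclusive.

# Succeed only if the argument is an integer in the range -40..40.
valid_threshold() {
    case "$1" in
        ''|-|*[!0-9-]*|?*-*) return 1 ;;   # empty, lone '-', non-digit, or misplaced '-'
    esac
    [ "$1" -ge -40 ] && [ "$1" -le 40 ]
}

# Run the preprocess command from this thread only for a valid threshold.
# Expects a running container named "namsel", as created earlier in the thread.
run_preprocess() {
    if valid_threshold "$1"; then
        docker exec namsel ./preprocess "$1"
    else
        echo "threshold must be an integer between -40 and 40, got: $1" >&2
        return 1
    fi
}
```

For example, `run_preprocess -20` would pass the value on to the container, while `run_preprocess -55` prints an error and returns without calling docker.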

chiehan1 commented 6 years ago

Thank you for the suggestions about scanning and setting the threshold value! I have tried threshold values of -40 and -55, and the results are better. I will also re-scan my images to get better OCR results. Thanks a lot!

thubtenrigzin commented 6 years ago

Please note that the threshold value range is from -40 to 40, so -55 is outside it. I hope everything is clear now, and I think this issue can be closed. All the best to you!