wanghaisheng / awesome-ocr

A curated list of promising OCR resources
http://wanghaisheng.github.io/ocr-arxiv-daily/
MIT License
1.66k stars 351 forks

ocropus binarization test #56

Closed (wanghaisheng closed this issue 6 years ago)

wanghaisheng commented 7 years ago

Reference: GitHub code for "Optimizing Binarization for OCRopus"

wanghaisheng commented 7 years ago

http://scantailor.org/ https://github.com/scantailor/scantailor

wanghaisheng commented 7 years ago

ocropus-gpageseg -n --csminheight 100000 --usegauss --gray left.bin.png right.bin.png

wanghaisheng commented 7 years ago

root@fa990931144e:/ocropy# ocropus-nlbin -e -10.0 -g -z 5.0 -n pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf.jpg -o pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01
INFO:  # pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf.jpg
INFO:  === pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf.jpg 1  
INFO:  flattening
INFO:  estimating skew angle
INFO:  estimating thresholds
INFO:  rescaling
INFO:  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf.jpg lo-hi (0.86 1.11) angle  1.0 
INFO:  writing
root@fa990931144e:/ocropy# ocropus-gpageseg --csminheight 100000 --usegauss --gray  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png 
INFO:  
INFO:  ########## /usr/local/bin/ocropus-gpageseg --csminheight 100000 --usega
INFO:  
INFO:  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png
ERROR:  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png SKIPPED too many connnected components for a page image (4132 > 833) (use -n to disable this check)
root@fa990931144e:/ocropy# ocropus-gpageseg --csminheight 100000 --usegauss   pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png 
INFO:  
INFO:  ########## /usr/local/bin/ocropus-gpageseg --csminheight 100000 --usega
INFO:  
INFO:  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png
ERROR:  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png SKIPPED too many connnected components for a page image (4132 > 833) (use -n to disable this check)
root@fa990931144e:/ocropy# ocropus-gpageseg --csminheight 100000 --usegauss -n  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png 
INFO:  
INFO:  ########## /usr/local/bin/ocropus-gpageseg --csminheight 100000 --usega
INFO:  
INFO:  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png
INFO:  scale 16.492423
INFO:  computing segmentation
INFO:  computing column separators
INFO:  considering at most 3 whitespace column separators
INFO:  computing lines
INFO:  propagating labels
INFO:  spreading labels
INFO:  number of lines 21
INFO:  finding reading order
INFO:  writing lines
INFO:      20  pic/lab/e2044f07jw1f7c7gx4fuoj20rs0kutgf-01/0001.bin.png 16.5 21
root@fa990931144e:/ocropy# 
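
The session above amounts to two steps: binarize with ocropus-nlbin, then segment with ocropus-gpageseg. Segmentation only succeeds once `-n` disables the connected-component check (4132 > 833). A minimal sketch of that sequence as a script, using only the flags that appear in the log. The names `run`, `binarize_and_segment`, and `DRY_RUN` are hypothetical helpers for illustration, not part of ocropy.

```shell
#!/bin/sh
# Sketch only: wraps the two commands from the session above.
# run, binarize_and_segment and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = input image, $2 = output directory passed to ocropus-nlbin -o
binarize_and_segment() {
    img=$1
    out=$2
    # Binarize; flags copied verbatim from the log.
    run ocropus-nlbin -e -10.0 -g -z 5.0 -n "$img" -o "$out" || return 1
    # Segment the first binarized page; -n disables the input check
    # that skipped the page in the first two attempts above.
    run ocropus-gpageseg --csminheight 100000 --usegauss -n "$out/0001.bin.png"
}
```

With `DRY_RUN=1` set, `binarize_and_segment page.jpg page-01` only prints the two commands, which is a cheap way to check paths before running on a real image.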

wanghaisheng commented 7 years ago

vips pdfload pdf/1.pdf --dpi 300 pdf/1.default.jpg

➜  OCRopus git:(master) ✗ curl -F "image=@01.jpg" -F "threshold=0.5" -o 01.bin.png   http://localhost:8001/binarizationapi                                                         

➜  OCRopus git:(master) ✗ curl -F "image=@01.bin.png" -F "threshold=0.5" -o 01.zip   http://localhost:8002/segmentationapi

➜  OCRopus git:(master) ✗ curl -F "image=@01/01.bin_31.png" -F "probabilities=True" -o 01ocr.zip  http://localhost:8003/recognitionapi
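
The three curl calls above chain a binarization, a segmentation, and a recognition service. A rough sketch of running them in sequence for one page follows. Ports and form fields are copied from the commands above. Everything else is an assumption: that the segmentation zip holds one PNG per text line (the recognition call reads `01/01.bin_31.png`, so the archive was presumably unpacked into `01/`), that `unzip` is available, and the output naming `*.ocr.zip`. `run`, `ocr_page`, and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: chains the three HTTP calls shown above for one page image.
# run, ocr_page and DRY_RUN are hypothetical; the zip layout is assumed.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = page image such as 01.jpg
ocr_page() {
    page=$1
    base=${page%.*}    # 01.jpg -> 01
    run curl -F "image=@$page" -F "threshold=0.5" \
        -o "$base.bin.png" http://localhost:8001/binarizationapi || return 1
    run curl -F "image=@$base.bin.png" -F "threshold=0.5" \
        -o "$base.zip" http://localhost:8002/segmentationapi || return 1
    # Assumption: the zip contains one PNG per text line; unpack into $base/.
    run unzip -o -q "$base.zip" -d "$base" || return 1
    for line in "$base"/*.png; do
        [ -e "$line" ] || continue    # glob matched nothing
        name=${line##*/}
        run curl -F "image=@$line" -F "probabilities=True" \
            -o "$base/${name%.png}.ocr.zip" http://localhost:8003/recognitionapi
    done
}
```

Calling `ocr_page 01.jpg` would leave `01.bin.png`, `01.zip`, and one result archive per line image under `01/`, provided the services respond as in the commands above.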

wanghaisheng commented 7 years ago

https://github.com/danvk/oldnyc/tree/master/ocr

wanghaisheng commented 7 years ago

https://github.com/acislab/HuMaIN_Microservices

wanghaisheng commented 7 years ago

https://comsys.informatik.uni-kiel.de/lang/de/res/ocropus/

wanghaisheng commented 7 years ago

https://hdw.artsci.wustl.edu/articles/154

wanghaisheng commented 7 years ago

http://graal.hypotheses.org/786

wanghaisheng commented 7 years ago

https://arxiv.org/ftp/arxiv/papers/1701/1701.07395.pdf

wanghaisheng commented 7 years ago

http://www.digitalhumanities.org/dhq/vol/11/2/000288/000288.html

wanghaisheng commented 7 years ago

http://sirwasi.bplaced.net/texterkennung-ocr-von-fruhneuzeitlichen-fraktur-drucken/

wanghaisheng commented 7 years ago

docker@966fe1e78237:~/pic/xuetang$ ocropus-nlbin -e -10000.0 -g -z 5.0 -n IMG_20170922_142015.jpg