ocropus-archive / DUP-ocropy

Python-based tools for document analysis and OCR
Apache License 2.0

Can't find documentation #153

PedroBarcha closed this issue 7 years ago

PedroBarcha commented 7 years ago

Hi there, I've been studying ocropus for a while now, but I couldn't find documentation for its thresholding, denoising, and character recognition (not CLSTM). Does anyone know where to find it? Thanks a lot.

zuphilip commented 7 years ago

This overlaps heavily with issue https://github.com/tmbdev/ocropy/issues/136.

PedroBarcha commented 7 years ago

@zuphilip I had already seen the wiki, as suggested in #136, but it didn't help me at all. I was hoping to find at least references to the algorithms, like the one you gave in #118 when you mentioned the paper on layout analysis; that was extremely helpful. Do you know of any articles on the topics I'm interested in? Or at least any reference at all to the thresholding/denoising/classifier algorithms?

zuphilip commented 7 years ago

You can try to look at the publications wiki page: https://github.com/tmbdev/ocropy/wiki/Publications

amitdo commented 7 years ago

Thresholding - See Binarization

Denoising - I think it just removes small connected components.

Character recognition - Bidi-LSTM + CTC alignment. See the ICDAR 2013 LSTM Tutorial.
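
To make the denoising idea above a bit more concrete, here is a minimal sketch of removing small connected components from an already-binarized image. This is not ocropy's actual code: the function name `remove_small_components`, the `min_size` parameter, and the choice of 4-connectivity are all assumptions made for illustration, and it uses only the Python standard library.

```python
from collections import deque


def remove_small_components(image, min_size):
    """Return a copy of a binary image (list of rows of 0/1) in which every
    4-connected foreground component with fewer than min_size pixels is
    set to 0. Hypothetical helper, not part of ocropy."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one component, collecting its pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                neighbours = ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1))
                for ny, nx in neighbours:
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the size threshold are treated as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result


# Two isolated specks and one 2x2 blob; only the blob should survive.
page = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
cleaned = remove_small_components(page, min_size=3)
```

On a real page the threshold would presumably be tied to the expected character size rather than a fixed pixel count, but that is a guess, not something confirmed in this thread.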