ocropus-archive / DUP-ocropy

Python-based tools for document analysis and OCR
Apache License 2.0

Bringing back ocropus-lattices #186

Open lunactic opened 7 years ago

lunactic commented 7 years ago

Is there a plan to bring back some form of implementation of the ocropus-lattices tool?

It would be great to have the possibility to extract the recognition lattices and combine them with an additional language model, which could improve recognition results.

zuphilip commented 7 years ago

I am not aware that anyone is working on this at the moment. But I think Tom outlined how this could work again in https://github.com/tmbdev/ocropy/pull/25#issuecomment-72075445 . Is this the same issue?

lunactic commented 7 years ago

The problems you face when you only search in Issues and not in PRs :-D

But yes, basically it is the same problem.

If you want to use a language model together with the recognition output, you need the recognition lattices, so that the hypotheses of the language model can be combined with the hypotheses of the recognizer.
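To make the idea concrete, here is a minimal sketch of such a combination. It is not the ocropus-lattices format or code: the lattice is simplified to one dict of candidate characters per position, the bigram table stands in for a real language model, and the names `decode_lattice`, `lm_weight` and `unseen_logp` are made up for this illustration.

```python
import math


def decode_lattice(lattice, bigram_logp, lm_weight=1.0, unseen_logp=math.log(1e-4)):
    """Viterbi search over a simplified lattice (hypothetical format).

    lattice     -- list of dicts, one per position: {char: recognizer log-prob}
    bigram_logp -- dict {(prev_char, char): language-model log-prob}
    lm_weight   -- how strongly the language model counts (0 ignores it)
    unseen_logp -- assumed log-prob for bigrams missing from the table
    """
    # best[c] = (score, text) of the best path so far that ends in c
    best = {"<s>": (0.0, "")}
    for candidates in lattice:
        new_best = {}
        for ch, rec_logp in candidates.items():
            options = []
            for prev, (score, text) in best.items():
                lm = bigram_logp.get((prev, ch), unseen_logp)
                options.append((score + rec_logp + lm_weight * lm, text + ch))
            new_best[ch] = max(options)
        best = new_best
    return max(best.values())[1]


# Toy data: the recognizer slightly prefers "b" over "h" in the middle.
lattice = [
    {"t": math.log(0.9), "f": math.log(0.1)},
    {"b": math.log(0.6), "h": math.log(0.4)},
    {"e": math.log(0.8), "c": math.log(0.2)},
]
bigrams = {
    ("<s>", "t"): math.log(0.5),
    ("t", "h"): math.log(0.5),
    ("h", "e"): math.log(0.5),
    ("t", "b"): math.log(0.01),
    ("b", "e"): math.log(0.1),
}

recognizer_only = decode_lattice(lattice, bigrams, lm_weight=0.0)  # "tbe"
with_language_model = decode_lattice(lattice, bigrams, lm_weight=1.0)  # "the"
```

With only the best path from the recognizer ("tbe") there is nothing left for the language model to correct, which is why the alternatives in the lattice are needed.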

It seems this would require quite some work, though, and I don't know how high up on your todo list something like this would be.

zuphilip commented 7 years ago

The old code is still there: https://github.com/tmbdev/ocropy/blob/master/OLD/ocropus-lattices , but I have no idea what would be needed to adapt it, or whether one should rewrite it from scratch. It sounds nice to have such a lattice recognizer in combination with language modeling, but this is, I am afraid, not on my todo list. I don't understand enough about these neural networks to even dare to start something here ;-) Maybe this could be suitable for some student work?