ocropus-archive / DUP-ocropy

Python-based tools for document analysis and OCR
Apache License 2.0
3.41k stars, 590 forks

Support for Japanese Language #229

Open jatingarg opened 7 years ago

jatingarg commented 7 years ago

Hello, I really like the OCR features of ocropy. I want to use it for document layout analysis on Japanese web pages. Is there support for the Japanese language? Please note that I don't need to recognize the exact Japanese characters; I only need to segment the image into textual and non-textual areas.

Can you also tell me how I should proceed, or point me to a reference I should look at?

Thank you so much.

zuphilip commented 7 years ago

There is a model for Japanese characters by @isaomatsunami which you can try: https://github.com/isaomatsunami/clstm-Japanese

A previous discussion about image/text classification, with some links to papers, can be found here: https://github.com/tmbdev/ocropy/issues/38

jatingarg commented 7 years ago

Thanks zuphilip!