Natural-scene text detection implemented with Keras and TensorFlow; variable-length Chinese OCR recognition implemented with CTC
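
For readers unfamiliar with how CTC yields variable-length output, here is a minimal sketch of greedy (best-path) CTC decoding. It is not taken from the chinese-ocr repository; the function name `ctc_greedy_decode`, the toy character table, and the choice of index 0 as the blank label are all assumptions made for illustration.

```python
# Illustrative sketch only: greedy (best-path) CTC decoding in plain Python.
# Assumption: index 0 is the CTC blank; real models may place it elsewhere.

def ctc_greedy_decode(frame_scores, index_to_char, blank=0):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    decoded = []
    previous = blank
    for scores in frame_scores:
        # Index of the highest-scoring label for this time step.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the label is not blank and differs from the last frame.
        if best != blank and best != previous:
            decoded.append(index_to_char[best])
        previous = best
    return "".join(decoded)


# Toy character table (hypothetical): 0 = blank, 1 = "中", 2 = "文".
chars = ["", "中", "文"]

# Seven time steps of per-label scores; the output length is not fixed in advance.
frames = [
    [0.10, 0.80, 0.10],  # 中
    [0.20, 0.70, 0.10],  # 中 (repeat, merged)
    [0.90, 0.05, 0.05],  # blank separates two genuine occurrences
    [0.10, 0.60, 0.30],  # 中
    [0.10, 0.20, 0.70],  # 文
    [0.20, 0.10, 0.70],  # 文 (repeat, merged)
    [0.80, 0.10, 0.10],  # blank
]

result = ctc_greedy_decode(frames, chars)
```

The blank label is what lets the same character appear twice in a row in the output: repeated frames are merged unless a blank sits between them. Production decoders often use beam search instead of this best-path shortcut.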

https://github.com/tesseract-ocr/tessdata/ These language data files only work with Tesseract 4. They are based on the sources in tesseract-ocr/langdata on GitHub.

Get language data files for Tesseract 3.04 or 3.05 from the 3.04 tree.

More information and a complete list of all languages is available in the Tesseract wiki.

xiaomaxiao commented 6 years ago

Langdata has not been updated for 4.0

You can use current files for finetuning, not for training from scratch.

wanghaisheng commented 6 years ago

@xiaomaxiao How well does chinese-ocr perform?

xiaomaxiao commented 6 years ago

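@wanghaisheng CTPN generalizes well and detects most text, but retraining it on scanned documents would give better results. The CRNN part is relatively time-consuming.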

Have you tested EAST or TextBoxes?

wanghaisheng commented 6 years ago

@xiaomaxiao For scanned documents we now do our own line segmentation. What I want to know is how the recognition quality compares with Tesseract.

xiaomaxiao commented 6 years ago

@wanghaisheng CRNN is better than Tesseract.

How do you do the line segmentation? Could you share it?

wanghaisheng commented 6 years ago

@xiaomaxiao We can't share the line segmentation for now~~