wanghaisheng closed this issue 6 years ago
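I gave it a run. Environment: Windows 10, TITAN X. CTPN takes 125 ms; CRNN on a 32×800 input takes 160 ms.
@xiaomaxiao Did you ever find the training data for Tesseract's LSTM models?
@wanghaisheng Tesseract has not published the LSTM training data.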
https://github.com/tesseract-ocr/tessdata/issues/72
https://github.com/tesseract-ocr/tessdata/ These language data files only work with Tesseract 4. They are based on the sources in tesseract-ocr/langdata on GitHub.
Get language data files for Tesseract 3.04 or 3.05 from the 3.04 tree.
More information and a complete list of all languages is available in the Tesseract wiki.
@wanghaisheng https://github.com/tesseract-ocr/langdata/issues/94
Langdata has not been updated for 4.0
You can use current files for finetuning, not for training from scratch.
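@xiaomaxiao How well does chinese-ocr perform?
@wanghaisheng CTPN generalizes well and detects most text, but retraining it on scanned documents would give better results. The CRNN stage is the relatively slow part.
Have you tested EAST, TextBoxes, or similar detectors?
@xiaomaxiao For scanned documents we now do our own line segmentation. What I wanted to ask is how good the recognition is. How does it compare with Tesseract?
@wanghaisheng CRNN is better than Tesseract.
How do you do the line segmentation? Could you share it?
@xiaomaxiao I'm afraid I can't share the line segmentation for now.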
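For readers unfamiliar with the term: line segmentation splits a scanned page into individual text-line images before they are passed to a recognizer such as CRNN. The method used above was not shared, so the following is only a generic sketch of one common approach, a horizontal projection profile. It is not the commenter's implementation. The function name `segment_lines` and the parameters `min_ink` and `min_height` are hypothetical, and the page is assumed to be already binarized (text pixels nonzero, background 0) and roughly deskewed.

```python
def segment_lines(binary, min_ink=1, min_height=2):
    """Split a binarized page into text lines with a horizontal projection profile.

    binary: list of rows, each a list of pixels (nonzero = text, 0 = background).
    Returns a list of (top, bottom) row ranges, with bottom exclusive.
    All names and thresholds here are illustrative assumptions.
    """
    # Count the text pixels in every row of the page.
    profile = [sum(1 for px in row if px) for row in binary]
    is_text = [count >= min_ink for count in profile]

    lines = []
    start = None
    for row, has_text in enumerate(is_text):
        if has_text and start is None:
            start = row  # a run of text rows begins
        elif not has_text and start is not None:
            # The run ended; keep it only if it is tall enough to be a line.
            if row - start >= min_height:
                lines.append((start, row))
            start = None
    # Close a run that reaches the bottom edge of the page.
    if start is not None and len(is_text) - start >= min_height:
        lines.append((start, len(is_text)))
    return lines


# Toy page: two bands of text pixels separated by blank rows.
page = [[0] * 20 for _ in range(12)]
for r in range(2, 5):
    page[r][3:17] = [1] * 14
for r in range(7, 10):
    page[r][2:18] = [1] * 16

print(segment_lines(page))  # [(2, 5), (7, 10)]
```

A profile like this tends to work only when lines are horizontal and separated by clear gaps; skewed scans or touching lines would likely need deskewing or a different method first.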
https://github.com/chineseocr/chinese-ocr looks really good.