Open IgorMunizS opened 6 years ago
What do your confusions look like currently, i.e. the output of ocropus-econf? In general, it is hard to say what would improve the accuracy. Can you share 2-3 of your documents here?
The confusion in test data is:
errors 233
missing 0
total 4894
err 4.761 %
errnomiss 4.761 %
28 ÇÆ çã
15 8 S
14 Ä á
13 Æ ã
12 Ë í
11 Ï ó
7 È é
7 0A ÇÃ
5 ÇÔ çõ
5 , .
0.0476093175317
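As a quick sanity check on the summary above, the last value is simply the error count divided by the total character count. A minimal sketch in Python; the helper name `char_error_rate` is made up for illustration and is not part of ocropy:

```python
def char_error_rate(errors, total):
    """Return the fraction of erroneous characters (errors / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / float(total)


# Values copied from the ocropus-econf summary above.
rate = char_error_rate(233, 4894)
print("err %.3f %%" % (100.0 * rate))  # prints "err 4.761 %"
```

This matches both the `err 4.761 %` line and the final 0.0476... figure, so the reported test error is consistent with the confusion counts.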
I left my model training all night with Portuguese texts and images generated by ocropus-linegen. The training error is decreasing, but the test error is worse than that of the default model (en version). Last 4 test errors:
0.04298535663675012
0.050070854983467174
0.0547945205479452
0.05550307038261691
It's currently at 19000 iterations. I'll see if I can share some files and come back here. Thanks for your reply!
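To make the trend in those four test errors explicit, here is a small sketch that finds the evaluation with the lowest test error and checks whether the error has been rising ever since. It assumes the values are listed oldest first; the post does not say which iteration each evaluation belongs to, so only list positions are used. The helper names are hypothetical and not part of ocropy.

```python
# Last four test errors reported above, assumed to be oldest first.
test_errors = [
    0.04298535663675012,
    0.050070854983467174,
    0.0547945205479452,
    0.05550307038261691,
]


def best_checkpoint(errors):
    """Return (position, error) of the lowest test error in the list."""
    index = min(range(len(errors)), key=lambda i: errors[i])
    return index, errors[index]


def is_strictly_rising(errors):
    """True if every test error is larger than the one before it."""
    return all(a < b for a, b in zip(errors, errors[1:]))


index, error = best_checkpoint(test_errors)
print("best evaluation: #%d, test error %.4f" % (index, error))
print("test error rising: %s" % is_strictly_rising(test_errors))
```

A falling training error combined with a steadily rising test error is the usual sign of overfitting, so under these assumptions the earliest of the four saved models would be the one to keep.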
Edit: Files: These are good files. For now, I'm trying to get better results with Portuguese characters and am not worrying about the quality.
Hi, I'm facing 2 problems:
1 - I need to use Ocropy to extract text from documents in Portuguese. So far, I have added the required characters in and I am training (with a previously trained model) the network based on this:
2 - I know about the document quality restrictions (300 dpi), but some of the images I have are bad scans. I've tried the same images with other APIs (like Google Vision) and got better results, but I like ocropy. I'm wondering if there are any preprocessing techniques that can improve the results.
So, what can I do? What is the best way to generate data for training an ocropy network? Edit: Does ocropy training support multithreading? Thanks!
Python version: Python 2.7.14 :: Anaconda, Inc.
Git revision of ocropy:
commit e9b6121de2637e54495125c6a97a4ef75d872a2e
Merge: 43381c4 289a58f
Author: Konstantin Baierer
Date: Mon Feb 19 19:24:12 2018 +0100
Merge pull request #236 from lehzwo/master
ocropus-gpageseg: Enable usage of masks to specify column separators/ ignore areas of scan
Operating System and version: Linux ubuntu-virtual 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux