Closed: NikhilKhuje2797 closed this issue 5 years ago
Thanks for your interest. To disable auto-corrections, remove the file CPair. Auto-corrections (and CPair) depend on the OCR system and the domain. I have no idea about CLA; if you have any issues related to Qt and its GUI, I can try to help.
Sir, if we load one text file in the GUI and then try to load the next page by clicking "+", it doesn't work.
In your test cases, loading a text file automatically loads the corresponding image, but this doesn't happen with my data.
The text file and the image must have the same base name and follow the pattern "page-i.txt" and "page-i.jpeg", where i runs from 1 to no_of_pages.
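A minimal sketch for checking that a folder follows this naming pattern before opening it in the GUI. The function name `check_pages` is hypothetical and not part of OpenOCRCorrect; it only assumes the "page-i.txt" / "page-i.jpeg" convention described above.

```python
import re
from pathlib import Path

# Matches the convention from this thread: page-1.txt, page-1.jpeg, page-2.txt, ...
PAGE_NAME = re.compile(r"^page-(\d+)\.(txt|jpeg)$")


def check_pages(folder):
    """Return (highest page number seen, list of missing file names)."""
    found = {"txt": set(), "jpeg": set()}
    for path in Path(folder).iterdir():
        match = PAGE_NAME.match(path.name)
        if match:
            found[match.group(2)].add(int(match.group(1)))

    last = max(found["txt"] | found["jpeg"], default=0)
    missing = []
    # Every page from 1 to the highest number needs both a text file and an image.
    for i in range(1, last + 1):
        for ext in ("txt", "jpeg"):
            if i not in found[ext]:
                missing.append(f"page-{i}.{ext}")
    return last, missing
```

Calling `check_pages("path/to/your/book")` on a folder returns the highest page number and the names of any text or image files that are missing for pages 1 to that number. Files with other names (for example "Page_1.txt" or "page-1.jpg") are ignored by the check.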
You can move from page-1.txt to page-2.txt by clicking "Page(CtrlShftR)>>". This also changes the image from page-1.jpeg to page-2.jpeg. Use "Open" (to the right of "+") to load only the first file. Do not use "+"; it loads only the text file.
Hello sir, I have a doubt about the system's colour marking. According to the documentation and your test cases, only misspelled words are marked in colour and correct words appear in the normal colour. But with my data even correct words are colour-marked, so I cannot distinguish wrong words from correct ones by colour. The documentation says the "Correct" folder contains correct pages, so does this folder contain manually corrected samples? What exactly is needed to run the same process on my data and get the same results as shown with yours?
Please guide. Thank you.
Which two OCR systems do you use? The quality of the colour coding depends on how different the models and training data of the two OCR systems are. The more different they are, the better the colour coding.
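To illustrate why two dissimilar OCR systems help, here is a rough sketch that lists the words on which two OCR outputs disagree. This is not how OpenOCRCorrect computes its colour coding; the function name `disagreeing_words` is hypothetical, and the sketch uses Python's standard `difflib` on whitespace-separated words as a stand-in.

```python
from difflib import SequenceMatcher


def disagreeing_words(ocr_a, ocr_b):
    """Return pairs (words_from_a, words_from_b) where the two outputs differ."""
    a, b = ocr_a.split(), ocr_b.split()
    # autojunk=False so that frequent words are not treated as junk on long pages.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((a[i1:i2], b[j1:j2]))
    return diffs
```

If both engines make the same mistake on a word, that word does not show up in the list at all, which is one way to see why two closely related OCR systems may give poorer marking than two unrelated ones.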
The samples were corrected using our software. For the demo, we cannot keep them in the folder "Corrected", so we moved them from "Corrected" to "Correct".
Everything is explained in the Readme. Please read it carefully. I agree that it's tedious, but once understood it saves a lot of time.
So sir, you used the Indsenz and Google Doc OCR outputs as the two differing systems?
Yes, for Sanskrit. Which OCR systems are you using, and what language are you working on?
And for Hindi?
I am using Tesseract OCR.
So for better spell checking, is it mandatory for me to use two different OCRs? I am working only with Hindi. Does the folder Book3hindi contain outputs from two different OCRs?
I am using Tesseract OCR and Google Doc OCR.
Sir, I combined my dictionary with yours, took sample pages converted by Google Doc and Tesseract OCR, and loaded them in the system. The same issue occurs: wrong words are not shown in colour. I have also created IEOCR and GEOCR folders for the data. Please guide. Thank you.
Yes, you should try Indsenz and Tesseract, or Indsenz and Google Doc. Tesseract and Google Doc are both from Google; that is probably why you are not getting good results.
Or send me your folder structure by mail, and I can check whether something else is wrong.
Indsenz offers only a premium version, which I cannot afford. Can you suggest another OCR to use in combination with Tesseract? Thank you.
Sir, I have correct words in my file such as पडे़ , लडे़ , पडे,लडे, but when I click the spellcheck button they automatically become पड़ए,लड़ए, even though my dictionary doesn't contain these words (पड़ए,लड़ए). What should I do to correct this? Thank you.
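When words that look alike on screen behave differently, it can help to list their Unicode code points. The sketch below only inspects a string; it does not reproduce or fix the spellcheck behaviour, and the idea that the order of the nukta and the vowel sign matters here is an assumption, not something confirmed in this thread. The helper name `codepoints` is hypothetical.

```python
import unicodedata


def codepoints(word):
    """Return one 'U+XXXX NAME' entry per character of the word."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in word]


# Example input written with escapes so the character order is unambiguous:
# letter PA, letter DDA, sign NUKTA, vowel sign E.
for entry in codepoints("\u092A\u0921\u093C\u0947"):
    print(entry)
```

Running the same function on a word copied from the OCR text file and on the same word after spellcheck shows exactly which characters changed, for example whether a vowel sign was replaced by an independent vowel letter.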
Sir, the tool is working well now. I have set up my data according to the standard names. Thank you.
Cool, all the best. Please reply with which OCR engines you are using and then close the issue.
Tesseract OCR and Google Doc OCR. Thank you.
Hello sir, with your test data (image and OCR text), the system performs well after loading the text for spell check (e.g. misspelled words are shown in colour).
But with my test case, the text gets auto-corrected without wrong words being suggested in colour, and in that auto-correction some right words are also made wrong. Sir, could we operate the system with a CLA instead of the GUI? Thank you.