Open hoangtocdo90 opened 7 years ago
I have the same problem. I am using jpn.traineddata. I tried RIL_SYMBOL and RIL_WORD; RIL_SYMBOL is better. The critical problem is that the recognized characters are very good, but the positions are very bad. I need the pair of image and character; don't you too? If you have new information, please tell me.
thanks
I temporarily fixed this in the following way. I am using RIL_SYMBOL. In my case the wrong coordinates usually appear at the end of a line, paragraph, or block, which can be detected with:

res_it->IsAtFinalElement(RIL_TEXTLINE, RIL_WORD)
res_it->IsAtFinalElement(RIL_PARA, RIL_WORD)
res_it->IsAtFinalElement(RIL_BLOCK, RIL_WORD)

When you get a wrong coordinate, you can predict a new one from the preceding ResultIterator coordinates.
Char value = こ left = 15 top = 14 right = 51 bottom = 51 conf = 99
Char value = ん left = 64 top = 9 right = 112 bottom = 54 conf = 99
Char value = ば left = 122 top = 5 right = 171 bottom = 54 conf = 99
Char value = ん left = 176 top = 9 right = 224 bottom = 54 conf = 99
Char value = は left = 234 top = 9 right = 281 bottom = 54 conf = 99
Char value = こ left = 295 top = 14 right = 331 bottom = 51 conf = 99
Char value = ん left = 344 top = 9 right = 392 bottom = 54 conf = 99
Char value = ば left = 402 top = 5 right = 445 bottom = 54 conf = 99
Char value = ん left = 456 top = 9 right = 497 bottom = 54 conf = 99
Char value = は left = 514 top = 9 right = 561 bottom = 54 conf = 99
Char value = ご left = 15 top = 79 right = 58 bottom = 126 conf = 99
Char value = 飯 left = 62 top = 80 right = 113 bottom = 130 conf = 99
Char value = 大 left = 120 top = 80 right = 225 bottom = 130 conf = 99
Char value = 盛 left = 242 top = 83 right = 260 bottom = 130 conf = 99
Char value = り left = 2328 top = 1616 right = 2328 bottom = 1616 conf = 99
Char value = 。 left = 289 top = 116 right = 305 bottom = 131 conf = 99
Strange. It looks like a bug.
Please check if this is fixed by the latest set of commits by Ray.
level page_num block_num par_num line_num word_num left top width height conf text
1 1 0 0 0 0 0 0 2550 470 -1
2 1 1 0 0 0 104 96 2286 348 -1
3 1 1 1 0 0 111 96 546 49 -1
4 1 1 1 1 0 111 96 546 49 -1
5 1 1 1 1 1 111 105 36 37 96 こ
5 1 1 1 1 2 160 100 48 45 96 ん
5 1 1 1 1 3 218 96 49 49 96 ば
5 1 1 1 1 4 272 100 48 45 96 ん
5 1 1 1 1 5 330 100 47 45 96 は
5 1 1 1 1 6 391 105 36 37 95 こ
5 1 1 1 1 7 440 100 48 45 96 ん
5 1 1 1 1 8 498 96 49 49 96 ば
5 1 1 1 1 9 552 100 48 45 96 ん
5 1 1 1 1 10 610 100 47 45 95 は
3 1 1 2 0 0 111 170 962 52 -1
4 1 1 2 1 0 111 170 962 52 -1
5 1 1 2 1 1 111 170 43 47 96 ご
5 1 1 2 1 2 158 171 107 50 95 飯
5 1 1 2 1 3 271 171 50 50 96 大
5 1 1 2 1 4 338 174 29 47 96 盛
5 1 1 2 1 5 0 0 2550 470 96 り
5 1 1 2 1 6 385 207 16 15 96 。
5 1 1 2 1 7 439 172 123 50 93 今
5 1 1 2 1 8 0 0 2550 470 95 年
5 1 1 2 1 9 567 171 65 51 96 は
5 1 1 2 1 10 624 173 87 48 96 初
5 1 1 2 1 11 0 0 2550 470 95 め
5 1 1 2 1 12 722 178 43 41 96 て
5 1 1 2 1 13 776 171 48 50 94 恋
5 1 1 2 1 14 832 173 101 48 96 人
5 1 1 2 1 15 944 171 48 50 96 出
5 1 1 2 1 16 1001 173 26 46 96 来
5 1 1 2 1 17 1021 191 25 28 96 た
5 1 1 2 1 18 1057 207 16 15 96 。
3 1 1 3 0 0 106 245 2284 126 -1
4 1 1 3 1 0 106 245 2284 51 -1
Thank you, sir.
5 1 1 2 1 1 111 170 43 47 96 ご
5 1 1 2 1 2 158 171 107 50 95 飯
5 1 1 2 1 3 271 171 50 50 96 大
5 1 1 2 1 4 338 174 29 47 96 盛
5 1 1 2 1 5 0 0 2550 470 96 り
5 1 1 2 1 10 624 173 87 48 96 初
5 1 1 2 1 11 0 0 2550 470 95 め
5 1 1 2 1 12 722 178 43 41 96 て
Please check this. The coordinates are still wrong for the り and め characters.
We are still able to reproduce it in the Arabic language in LSTM mode. Most BBoxes are correct but there are some boxes that contain valid text and wrong coordinates (the region contained in the bbox is empty).
I'm getting the same behavior for the Thai language in LSTM mode: BoundingBox() often returns the whole image size. The image size was 400×266. Here is a small portion of the results [X1, Y1; X2, Y2]. (As a side note, I'm using RIL_WORD, but it seems to behave like RIL_SYMBOL; I'm not sure why.)
'ร' - Confidence: 94.3645 [0, 0; 400, 266]
'ม' - Confidence: 95.7061 [19, 68; 33, 77]
'า' - Confidence: 96.9703 [0, 0; 400, 266]
'ส' - Confidence: 96.976 [35, 67; 50, 77]
@amitdo Sir, could you show me where to find more information about how Tesseract analyzes the input image to get the coordinates of words/characters, recognizes them with the LSTM or legacy engine, and finally combines the recognized words with their coordinates?
@wanghaisheng
See here: https://github.com/tesseract-ocr/tesseract/blob/master/lstm/recodebeam.cpp Search for 'box', 'xcoords', 'blob'
This also happened to me in the Arabic language. Here is an example that reproduces the problem.
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

struct OcrResult
{
    std::string text;
    cv::Rect box;
};

int main(int argc, char *argv[])
{
    tesseract::TessBaseAPI tesseract;
    tesseract.Init("./data/tessdata/", "ara", tesseract::OcrEngineMode::OEM_LSTM_ONLY);
    tesseract.SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_WORD);
    PIX *patch_pix = pixRead(argv[1]);
    tesseract.SetImage(patch_pix);
    tesseract.Recognize(0);

    std::vector<OcrResult> ocrResults;
    tesseract::ResultIterator *ri = tesseract.GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
    if (ri != nullptr)
    {
        do
        {
            char *word = ri->GetUTF8Text(level);
            int left, top, right, bottom;
            if (word != nullptr && ri->BoundingBox(level, &left, &top, &right, &bottom))
            {
                OcrResult res;
                res.box = cv::Rect(left, top, right - left, bottom - top);
                res.text = std::string(word);
                ocrResults.push_back(res);
            }
            delete[] word;
        } while (ri->Next(level));
    }

    cv::Mat image = cv::imread(argv[1]);
    cv::Mat DrawingImg = image.clone();
    for (size_t i = 0; i < ocrResults.size(); i++)
    {
        cv::rectangle(DrawingImg, ocrResults[i].box, cv::Scalar(255, 0, 0), 1);
        std::cout << ocrResults[i].text << std::endl;
        cv::imshow("DrawingImg", DrawingImg);
        cv::waitKey();
    }

    pixDestroy(&patch_pix);
    tesseract.End();
    return 0;
}
I used OpenCV to draw the boxes. The image contains two Arabic words. The recognition is correct for both words, but the box position of the first word (the word on the right) is wrong: the box matches some noise at the top of the image.
And this is the original image.
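As a defensive measure on the caller's side, one could skip obviously corrupted boxes before drawing them. This is only a sketch under the assumption that a corrupted box is either degenerate or spans (nearly) the whole page, the two symptoms reported in this thread; isPlausibleBox is a hypothetical helper, not Tesseract API:

```cpp
// Hypothetical sanity check: reject a word box that has non-positive area
// or that covers ~95% or more of the page in both dimensions.
bool isPlausibleBox(int left, int top, int right, int bottom,
                    int pageW, int pageH) {
    if (right <= left || bottom <= top) return false;  // degenerate box
    int w = right - left, h = bottom - top;
    // A single word should not span essentially the entire page.
    if (w >= pageW * 95 / 100 && h >= pageH * 95 / 100) return false;
    return true;
}
```

In the iterator loop above, one would call isPlausibleBox(left, top, right, bottom, pixGetWidth(patch_pix), pixGetHeight(patch_pix)) before pushing the result.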
I'm also getting this bug for english text, though I can't provide the data files as they contain PII.
I got the same issue on beta.4 with jpn.traineddata.
In my case, the image size (width, height) and the invalid coordinate values are correlated. Even for the same letter, the result is incorrect or not depending on its position in the image.
$ tesseract -l jpn 'images/sample/test-jpn_01.jpg' stdout tsv | grep 596
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 167
1 1 0 0 0 0 0 0 596 118 -1
5 1 1 1 1 10 0 0 596 118 92 、
5 1 1 1 2 4 0 0 596 118 92 字
5 1 1 1 2 10 0 0 596 118 97 て
5 1 1 1 2 15 0 0 596 118 93 す
My test image size is 596×118. The same letters appear multiple times (e.g. '字', 'て'), but the bounding box value is wrong only once for each.
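The pattern in the TSV output above (level-5 rows whose box is exactly 0 0 pageW pageH) can be detected mechanically when post-processing `tesseract ... tsv` output. rowHasPageSizedBox is a hypothetical helper for illustration; the TSV column order (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) is taken from the header shown earlier in this thread:

```cpp
#include <sstream>
#include <string>

// Returns true when a level-5 (symbol) TSV row reports the full page
// rectangle (left = 0, top = 0, width = pageW, height = pageH), which
// marks the corrupted coordinates seen above.
bool rowHasPageSizedBox(const std::string& tsvRow, int pageW, int pageH) {
    std::istringstream in(tsvRow);
    int level, page, block, par, line, word, left, top, width, height;
    if (!(in >> level >> page >> block >> par >> line >> word
             >> left >> top >> width >> height))
        return false;  // malformed or header row
    return level == 5 && left == 0 && top == 0 &&
           width == pageW && height == pageH;
}
```

For the 596×118 test image above, this flags rows like `5 1 1 1 1 10 0 0 596 118 92 、` while leaving normal symbol rows and the level-1 page row untouched.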
FYI, in the above image, the character '日' is recognized incorrectly by jpn.traineddata (traineddata_fast).
Same issue as #1192
Hi all, I'm using Tesseract to get each character with its coordinates in an image. I'm using ResultIterator with OCR engine mode 2 (LSTM) and language = jpn.
Here are my program log and input image. You can see that I got wrong coordinates for the り character. I tested with tsv and hocr output, but they give me the same result: still wrong coordinates.
And one more question: I'm trying to add fonts to the jpn data, but maybe I must retrain from scratch. I don't know how my jpn traineddata (downloaded from the tessdata repository) was actually made. I tried downloading data from the langdata repository, generating images from jpn.traintext, and training with tesstrain.sh and jTessBoxEditor, but I got lower accuracy than the traineddata downloaded from the repository. Can somebody tell me exactly how it was made? Sorry for my bad English.