Closed: Xiangyu-CAS closed this issue 7 years ago
@398766201 Hi, I ran into trouble when compiling Caffe with cuDNN 5.0; the problem is described in https://github.com/tianzhi0549/CTPN/issues/9. Did you have the same problem when you compiled Caffe with cuDNN 5.0? Thank you!
Yes, I encountered the same problem. As the author said, cuDNN 5.0 is not compatible; this project is based on cuDNN 3.0. So I did not use cuDNN, and just commented out the USE_CUDNN line in Makefile.config. Running without cuDNN does not reduce detection performance or processing speed; the only cost is considerably more GPU memory.
@Xiangyu-CAS Thank you very much for your answer! 😁
@Xiangyu-CAS For ICDAR15, we used a simple sampling strategy that allows the network to output word-level bounding boxes directly. We increased the number of negative samples collected from the spaces between words. For example, we controlled the ratio of positive samples, negative samples from the background, and negative samples from between words as (0.5, 0.4, 0.1) in each batch. This encourages the model to directly output word-level bboxes without further post-processing. In this work, we aim to provide a fundamental solution for text detection. We believe that the performance on ICDAR15 could be improved considerably by using a more powerful approach for word splitting and by enabling our method to handle multi-oriented text. Thank you:-).
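Roughly speaking, a minimal sketch of how such a per-batch ratio could be enforced, assuming you already have index lists of positive anchors, background negatives, and between-word negatives (all of the names below are hypothetical, not from the released code):

```python
import numpy as np

def sample_batch(pos_inds, bg_inds, space_inds, batch_size=128,
                 ratios=(0.5, 0.4, 0.1), rng=np.random):
    """Compose one minibatch of anchor indices with the given ratios of
    positives, background negatives, and between-word negatives.
    A hypothetical sketch; the paper only states the (0.5, 0.4, 0.1) ratio."""
    quotas = [int(batch_size * r) for r in ratios]
    picked = [rng.choice(inds, size=min(len(inds), q), replace=False)
              for inds, q in zip((pos_inds, bg_inds, space_inds), quotas)]
    return np.concatenate(picked)
```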
@tianzhi0549 Thank you very much for your reply, it's so kind of you!
@tianzhi0549 Thanks a lot for your work and your kind reply! If I read this right, when two or more tilted lines are close to each other, the word-splitting solution may still produce bboxes containing nearby characters, which will hurt recognition accuracy. Also, for Chinese text, the lines are usually quite long and have no spaces in between, so the bbox will be full of background noise or nearby characters. Is it possible to get tightly bounded bboxes in those cases? Thank you!
@crazylyf It is still an open problem to handle these complicated cases perfectly. The method cannot produce bounding polygons, and therefore it cannot fit a text line well if the line is too inclined. If your goal is to detect multi-oriented text, I suggest trying methods that are originally designed for multi-oriented text. Thank you:-).
Got it. Thank you~
@tianzhi0549 Hello! I also faced this problem, and I am wondering how to sample the space regions between words. Do you collect them by hand-cropping or with some algorithm?
@LearnerInGithub In my implementation, I obtained the space regions with an algorithm. If two ground-truth boxes are approximately on the same line (judged by their IoU in the vertical direction) and there are no words in the region between them, that region is selected as a space region. I implemented this with two naive loops, traversing all the gt boxes.
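A minimal sketch of that two-loop procedure, assuming gt boxes are (xmin, ymin, xmax, ymax) tuples; the vertical-IoU threshold of 0.7 is just an illustrative guess:

```python
def vertical_iou(a, b):
    """Overlap of two boxes along the y axis only (intersection / union)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0.0) / union if union > 0 else 0.0

def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes intersect."""
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]

def find_space_regions(gt_boxes, v_iou_thresh=0.7):
    """Two naive loops over all GT pairs: if two boxes lie roughly on the
    same line and no third box intrudes into the gap between them, the
    gap is collected as a space region."""
    spaces = []
    n = len(gt_boxes)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = gt_boxes[i], gt_boxes[j]
            left, right = (a, b) if a[2] <= b[0] else (b, a)
            if left[2] >= right[0]:
                continue  # boxes touch or overlap horizontally: no gap
            if vertical_iou(a, b) < v_iou_thresh:
                continue  # not approximately on the same text line
            gap = (left[2], min(a[1], b[1]), right[0], max(a[3], b[3]))
            if all(k in (i, j) or not boxes_overlap(gt_boxes[k], gap)
                   for k in range(n)):
                spaces.append(gap)
    return spaces
```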
@Xiangyu-CAS Thank you for sharing the algorithm, it looks very intuitive. I will try to add it to my CTPN code.
@Xiangyu-CAS A new question comes up: how do I feed the picked space regions into the minibatches? I find that the original Faster-RCNN implementation only stores non-background bboxes in gt_boxes, so how can I add my picked space regions to the input minibatches?
@LearnerInGithub The same way as non-background bboxes. First, you get the gt boxes of the space regions. Second, anchors whose overlap with a space gt box is > 0.5 are labeled as negative anchors. Third, the ratio of space negative anchors is 10% and the ratio of background negative anchors is 40%.
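Something like this for the second step, assuming labels follow the py-faster-rcnn convention (1 positive, 0 negative, -1 ignore) and an IoU routine such as bbox_overlaps from utils.cython_bbox is available; the function itself is a hypothetical sketch:

```python
import numpy as np

def label_space_negatives(labels, anchors, space_boxes, iou_fn, thresh=0.5):
    """Label anchors that overlap any space-region gt box with IoU > thresh
    as negatives (0), without overwriting positive anchors (1).
    `iou_fn(A, B)` must return an |A| x |B| IoU matrix, e.g.
    utils.cython_bbox.bbox_overlaps from py-faster-rcnn."""
    overlaps = iou_fn(np.ascontiguousarray(anchors, dtype=np.float64),
                      np.ascontiguousarray(space_boxes, dtype=np.float64))
    hit = overlaps.max(axis=1) > thresh
    labels[hit & (labels != 1)] = 0  # turn hits into (space) negatives
    return labels
```

The 10% / 40% ratios are then enforced when the minibatch is sampled, in the same spirit as the ratio sketch earlier in the thread.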
@Xiangyu-CAS A question about testing: I tested the CTPN pre-trained model on ICDAR2013, but it only gives 0.002 AP, so I am very confused. Have you tested your model on ICDAR2013? Any advice would be appreciated.
@LearnerInGithub The test code provided by CTPN resizes the image to a fixed size, and the bbox coordinates along with it. Revise demo.py as follows and you will obtain correct bboxes at the original size:
```python
im, f = resize_im(im, cfg.SCALE, cfg.MAX_SCALE)    # f is the resize factor
write_result(RESULT_DIR, im_name, text_lines / f)  # divide by f to map boxes back to the original size
```
@Xiangyu-CAS Yes, the test result improved to about 20%, but that is still far from the 88% reported in the paper. What do I need to do to reach an 80%+ test result with the pre-trained model?
@LearnerInGithub That's the only modification I made to the test code, and I got 87.5% directly. I suppose your problem is caused by a bbox coordinate mismatch; maybe you should check that carefully.
@Xiangyu-CAS I switched to testing with the CTPN test module and now it works fine, but I still have problems with the test module of py-faster-rcnn; maybe they use different evaluation standards.
@Xiangyu-CAS Now I want to train the model on ICDAR2015. I converted the ICDAR2015 GT from (x1, y1, x2, y2, x3, y3, x4, y4) to (xmin, ymin, W, H), but the test result of the trained model on ICDAR2015 is abnormally low, only about 10%. When I visualize the detections, I find a lot of redundant space between the text and the detected bbox. How did you handle the ICDAR2015 GT so that CTPN could train on it?
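For reference, the conversion described amounts to taking the axis-aligned hull of the quadrilateral (a trivial sketch); note that for tilted text this hull is necessarily much larger than the word itself, which is exactly the kind of redundant space described:

```python
def quad_to_xywh(quad):
    """Convert an ICDAR2015 quadrilateral (x1, y1, ..., x4, y4) to an
    axis-aligned (xmin, ymin, W, H) box. For tilted text the axis-aligned
    hull is much looser than the word, so loose detections are expected."""
    xs, ys = quad[0::2], quad[1::2]
    xmin, ymin = min(xs), min(ys)
    return (xmin, ymin, max(xs) - xmin, max(ys) - ymin)
```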
@LearnerInGithub To be honest, my model failed on ICDAR2015 too, at only 50%. I am confused by your description "redundant space between the text and detected bbox". Do you mean the detected bbox covers the target text correctly but fails on accurate localization? You can try to divide the GT into tilted bbox sequences. Space sampling is going to help too. However, the performance is still far from 60%. I think tianzhi owes us a lot of details. Faster R-CNN is much more promising than CTPN on ICDAR2015. A few papers have been released that deal with ICDAR2015: "Arbitrary-Oriented Scene Text Detection via Rotation Proposals", "Detecting Oriented Text in Natural Images by Linking Segments", and "Deep Direct Regression for Multi-Oriented Scene Text Detection". I strongly recommend the last one; it achieved 83% on ICDAR2015 and 91% on ICDAR2013, which is state-of-the-art and ranked first on the competition website.
@Xiangyu-CAS Yes, that's what I meant. I inspected the detection results one by one; the detected bboxes are too large even though the text region is inside them.
@Xiangyu-CAS I have downloaded the paper and roughly looked through it; the results really do seem good! But I also noticed a team from CASIA called NLPR_CASIA that got 82.76% / 84.76% / 83.75%, which is now No. 1. I'm not sure whether the paper "Deep Direct Regression for Multi-Oriented Scene Text Detection" is their work...
@LearnerInGithub As I mentioned, you probably trained your CTPN model on horizontal bbox sequences, so you obtained detection results as horizontal bboxes. BTW, the proposal connection function should be revised to output tilted rectangles, as sketched below.
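One possible way to do that revision, not from the original code: collect the corners of each connected proposal sequence and fit a rotated rectangle with OpenCV's minAreaRect:

```python
import numpy as np
import cv2

def proposals_to_tilted_rect(proposals):
    """Fit a tilted rectangle to one connected sequence of fine-scale
    proposals, each given as (xmin, ymin, xmax, ymax), and return its
    4 corner points (the 8 coordinates expected by ICDAR2015 Challenge 4).
    A sketch of one possible revision, not the released implementation."""
    corners = []
    for x1, y1, x2, y2 in proposals:
        corners += [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    pts = np.asarray(corners, dtype=np.float32)
    rect = cv2.minAreaRect(pts)  # ((cx, cy), (w, h), angle)
    return cv2.boxPoints(rect)   # 4 x 2 array of corners
```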
That paper is a publication of NLPR_CASIA; you can check the authors' organization.
Dear Tianzhi: I tried your demo and obtained exactly the same result on ICDAR 2013 Challenge 2 as you submitted. It works perfectly! BTW, OpenCV 3 and CUDA 7.5 are compatible with this project. Now I am trying to test the performance on ICDAR 2015 Challenge 4, which consists of many tilted and perspective texts, but the bounding box returned by your method is a rectangle around the whole text line, instead of separated words represented by 8 coordinates. Did you submit the rectangle (4 coordinates) of the whole text line in Challenge 4 as you did in Challenge 2? If not, what kind of adjustment was applied? The publication does not mention anything about tilted and perspective texts, so I am a little confused.
Best Regards