open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0
4.27k stars · 743 forks

Is it possible to get the category of a predicted box when calling MMOCR() #1050

Closed CodePothunter closed 2 years ago

CodePothunter commented 2 years ago

I am looking for a way to show the predicted category of the output from the textdet mode. Currently, the parameter `details=True` only produces `boundary_result`.

Is it possible to get the category of a predicted box when calling MMOCR()?
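For reference, in MMOCR 0.x each `boundary_result` entry is a flat list `[x1, y1, ..., xk, yk, score]` — polygon coordinates followed by a confidence score, with no category field. A minimal sketch of unpacking one (the sample values are made up for illustration):

```python
# Sketch: unpacking a `boundary_result` entry from MMOCR's textdet output.
# In MMOCR 0.x each entry is a flat list [x1, y1, ..., xk, yk, score];
# there is no category field, since the only detection class is "text".

def unpack_boundary(entry):
    """Split one boundary_result entry into a polygon and a confidence score."""
    *coords, score = entry
    polygon = list(zip(coords[0::2], coords[1::2]))  # [(x1, y1), (x2, y2), ...]
    return polygon, score

# Made-up sample mimicking the shape of a textdet result dict.
det_output = {
    "filename": "demo.jpg",
    "boundary_result": [
        [10.0, 20.0, 110.0, 20.0, 110.0, 60.0, 10.0, 60.0, 0.98],
    ],
}

for entry in det_output["boundary_result"]:
    polygon, score = unpack_boundary(entry)
    print(polygon, score)
```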

Mountchicken commented 2 years ago

Hi @CodePothunter I'm a little confused by your question. There is only one output category for text detection, and that is text.

CodePothunter commented 2 years ago

> Hi @CodePothunter I'm a little confused by your question. There is only one output category for text detection, and that is text.

Thank you for your quick reply. I re-trained the detection model (i.e., FCENet) with my customized dataset of multiple categories. As I observed in the training log, there were loss terms regarding classification accuracy. Therefore, I thought that, at least in the mmdetection module, it is possible to obtain the predicted classes.

Mountchicken commented 2 years ago

It feels like this would be very complicated. Do you have any ideas? @gaotongxiao

gaotongxiao commented 2 years ago

I don't think we have any loss function concerning classification accuracy. To maintain some compatibility with MMDetection, MMOCR defines only one category, "text", which, however, does not provide any information to text detection models: these models only focus on separating text regions from the background. Therefore, text detection models don't really have a classification loss as in MMDetection, except for Mask R-CNN, which was actually adapted from MMDetection.

> As I observed in the training log, there were loss terms regarding classification accuracy.

I guess what you've seen were the recalls and precisions reported by our hmean-iou metric. These statistics come from the overlaps between predicted & ground truth polygons instead of object categories.
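To make the overlap-based matching concrete, here is a simplified sketch of the idea behind hmean-iou. The real metric intersects arbitrary polygons and enforces one-to-one matching between predictions and ground truths; this version uses axis-aligned rectangles `(x1, y1, x2, y2)` and allows many-to-one matches, purely to show where the recall/precision numbers come from:

```python
# Simplified illustration of the IoU-based matching behind hmean-iou.
# The actual metric intersects polygons and matches one-to-one; this
# sketch uses axis-aligned rectangles to keep the idea visible.

def rect_iou(a, b):
    """Intersection-over-union of two axis-aligned rectangles (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def hmean_iou(preds, gts, thr=0.5):
    """Count a prediction as a hit when it overlaps some ground truth by >= thr,
    then report precision, recall, and their harmonic mean (the "hmean")."""
    hits = sum(1 for p in preds if any(rect_iou(p, g) >= thr for g in gts))
    precision = hits / len(preds) if preds else 0.0
    recall = hits / len(gts) if gts else 0.0
    denom = precision + recall
    hmean = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, hmean
```

Note that no object category appears anywhere: the statistics depend only on geometric overlap.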

CodePothunter commented 2 years ago

> I don't think we have any loss function concerning classification accuracy. To maintain some compatibility with MMDetection, MMOCR defines only one category, "text", which, however, does not provide any information to text detection models: these models only focus on separating text regions from the background. Therefore, text detection models don't really have a classification loss as in MMDetection, except for Mask R-CNN, which was actually adapted from MMDetection.
>
> > As I observed in the training log, there were loss terms regarding classification accuracy.
>
> I guess what you've seen were the recalls and precisions reported by our hmean-iou metric. These statistics come from the overlaps between predicted & ground truth polygons instead of object categories.

I see. I came up with the idea of training a separate mmdet model per category, which might offer flexible usage under the current design of MMOCR. Would this be good practice?

Still, my next question is: what is the best practice for explicitly setting up a two-stage OCR pipeline with MMOCR? What I mean is, the recognizer processes the outputs from the detector and produces the results, while also allowing me to insert some pre-processing and post-processing operations between detection and recognition.

My current implementation is as follows: 1) parse the output JSON from the detector (an MMOCR instance), crop the boxes, and save the new images to disk; 2) call `read_text` on the generated images and get the result.

However, I think the I/O operations waste quite a bit of time.
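One way to avoid the disk round-trip is to crop each detected region in memory and hand the arrays straight to the recognizer. The cropping below is just numpy slicing of each polygon's axis-aligned bounding box; the recognizer call in the trailing comment is an assumption about MMOCR 0.x's API, not something verified from this thread:

```python
import numpy as np

def crop_boundaries(img, boundary_result):
    """Crop the axis-aligned bounding box of each predicted polygon from
    `img` (an H x W x C ndarray), keeping everything in memory.
    Each boundary_result entry is [x1, y1, ..., xk, yk, score]."""
    crops = []
    h, w = img.shape[:2]
    for entry in boundary_result:
        coords = entry[:-1]  # drop the trailing confidence score
        xs, ys = coords[0::2], coords[1::2]
        x1, x2 = max(int(min(xs)), 0), min(int(max(xs)) + 1, w)
        y1, y2 = max(int(min(ys)), 0), min(int(max(ys)) + 1, h)
        crops.append(img[y1:y2, x1:x2])
    return crops

# The cropped arrays could then go straight into the recognizer, e.g.
# (assuming the recognizer's readtext accepts ndarrays):
#   results = recog.readtext(crops)
```

Pre- and post-processing steps can then operate on the `crops` list directly, with no intermediate files.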

Mountchicken commented 2 years ago

You can refer to issue #1027, which also needs some pre-processing of the detection results before sending them to a recognizer.

CodePothunter commented 2 years ago

> You can refer to issue #1027, which also needs some pre-processing of the detection results before sending them to a recognizer.

Thank you for your reply. That's what I need.