open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

Meaning of Recognition Scores #1098

Closed Hegelim closed 2 years ago

Hegelim commented 2 years ago

I am using ABINet to do some text recognition on TorchServe, and my question is very similar to https://github.com/open-mmlab/mmocr/issues/1092. However, in my case I modified https://github.com/open-mmlab/mmocr/blob/main/tools/deployment/mmocr_handler.py#L46 to be results = model_inference(self.model, data, batch_mode=True), so the result I got has multiple scores instead of one.
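Roughly, the change inside the handler looks like this (a minimal sketch of my edit; everything other than the batch_mode argument is as in tools/deployment/mmocr_handler.py):

from mmocr.apis import model_inference

def inference(self, data, *args, **kwargs):
    # `data` is the list of preprocessed images; batch_mode=True runs
    # recognition on all of them in a single batched forward pass.
    results = model_inference(self.model, data, batch_mode=True)
    return results

The request and the response I get: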

(open-mmlab) root@d40b21f4becb:/mmocr# curl -X POST http://localhost:8080/predictions/ABINet -T "{images435/forms_f1120_0_1628795799/0.png,images435/forms_f1120_0_1628795799/1.png}"
{
  "text": "us",
  "score": [
    18.370647430419922,
    15.632901191711426
  ]
}{
  "text": "corporation",
  "score": [
    23.138269424438477,
    21.309354782104492,
    21.27399444580078,
    25.794034957885742,
    22.265600204467773,
    22.63265037536621,
    22.026500701904297,
    20.361919403076172,
    19.674522399902344,
    19.15252113342285,
    20.794090270996094
  ]
}
My questions are:

  1. What exactly do the scores mean here?
  2. What is the range of the scores?
  3. Why are there multiple scores for a single image?
  4. Why do I sometimes get 2 scores and sometimes many more? (I noticed that the number of scores is not fixed.)

I have been trying to look through the source code, but I couldn't find exactly where this behavior comes from. Any help is appreciated.

xinke-wang commented 2 years ago

The scores are not normalized; we'll fix this, or if you are interested, feel free to raise a PR. Please also refer to #1092

gaotongxiao commented 2 years ago

These are the unnormalized confidence scores, or raw logits, of each character. During inference, the model generates such a score for every character in the dictionary, and we choose the character with the highest score as the output at each time step. The range of the raw scores depends on the model's implementation, but we usually apply softmax to normalize them into [0, 1] for easier interpretation, which clearly has not been done in ABINet and needs to be fixed.
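For reference, the normalization would look roughly like this (a sketch only; logits is a hypothetical stand-in for the per-step dictionary scores produced by the decoder, not an actual mmocr variable name):

import torch
import torch.nn.functional as F

def decode_with_confidence(logits: torch.Tensor):
    # logits: [num_steps, num_chars_in_dict] raw scores from the decoder.
    probs = F.softmax(logits, dim=-1)          # normalize over the dictionary
    char_scores, char_indices = probs.max(-1)  # per-step confidence and index
    # char_scores are now in [0, 1] and can be reported instead of raw logits.
    return char_indices.tolist(), char_scores.tolist()

The key point is that softmax is applied over the dictionary dimension at each time step, before the argmax, rather than over the already-selected maxima.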

Hegelim commented 2 years ago

Thanks for the reply, this is very helpful. I just raised a PR and tested it; it should be working fine now. Also, should there be a score that describes the accuracy of the whole word instead of each letter? Averaging the per-character scores sounds like a good way to go.
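As a rough illustration of the averaging idea (a sketch; char_scores is assumed to be the per-character list from the response, after normalization):

def word_score(char_scores):
    # Collapse per-character confidences into one word-level score by
    # taking the mean; assumes the scores are already in [0, 1].
    return sum(char_scores) / len(char_scores) if char_scores else 0.0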