Not getting accuracy on MMRecognition SAR Model Training

payal211 commented 2 years ago

HI @gaotongxiao

I am training Custom dataset for Text recognition using SAR Model. I have total 7K plus images for training. Can you please help me how long should I have to wait for trained model. As of now it completed 65th epoch and the Accuracy matric at 65th epoch is as below:

2022-03-21 08:06:45,433 - mmocr - INFO - Epoch(val) [65][100] 0_word_acc: 0.0000, 0_word_acc_ignore_case: 0.0000, 0_word_acc_ignore_case_symbol: 0.0000, 0_char_recall: 0.1346, 0_char_precision: 0.1089, 0_1-N.E.D: 0.0776

As you can see precision and recall are very less.

Also can you please suggest any preprocessing technique which you are aware to achieve good accuracy with respect to text recognition task?

Here is the attached SS of training continuation: Training_66th_epoch

Mountchicken commented 2 years ago

Hi @payal211 , can you share your config file?

gaotongxiao commented 2 years ago

According to your log, it still needs 18 days to complete training. The accuracy is still very low and it probably has something to do with your hyperparameter configuration. FYI, we trained SAR on 48 GPUs, and you might scale down the learning rate accordingly. We have also provided the detailed log for reference https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json.

As for data augmentation techniques, ABINet's pipeline is empirically effective in boosting the model's final performance. But using them won't necessarily reduce the convergence time.

payal211 commented 2 years ago

Hi @payal211 , can you share your config file?

Hi @Mountchicken

Here is attached config file, sar_r31_parallel_decoder_custom_dataset.txt

Thanks

payal211 commented 2 years ago

According to your log, it still needs 18 days to complete training. The accuracy is still very low and it probably has something to do with your hyperparameter configuration. FYI, we trained SAR on 48 GPUs, and you might scale down the learning rate accordingly. We have also provided the detailed log for reference https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json.

As for data augmentation techniques, ABINet's pipeline is empirically effective in boosting the model's final performance. But using them won't necessarily reduce the convergence time.

Hi @gaotongxiao,

Thanks for the quick response and suggestion. Can you please more elaborate what should be the hyperparameters? I am training this model for 37 characters. (0-9 digits and A-Z alphabets) with 600 epochs. Is it right or should I have to train it for more time? If ABINET is good choice for this task then I can try that too. Thanks for your help.

Mountchicken commented 2 years ago

Hi @payal211 You should try a larger batch size for faster training. Using 8 samples in a batch only occupy 2g gpu memory in your case. Try 32 or 64 if possible. BTW, can you show me the config file generated when training start? It contains full information, and it should be located at ./work_dirs or somewhere.

payal211 commented 2 years ago

Hi @Mountchicken, Thanks for point out me. Here I am attaching the config file generated when training start. sar_r31_parallel_decoder_custom_dataset.txt

Mountchicken commented 2 years ago

Hi @payal211 What does your data look like? BTW I see that you are using DICT90 and if you are predicting characters from 0~9 and A~Z, remember to modify it.

Mountchicken commented 2 years ago

And there is a hiding bug in SAR. It can't recognize the number 0 and if every test image has 0 in its label, the accuracy will always be 0.00%

payal211 commented 2 years ago

Hi @Mountchicken Can you please correct me where I have to modify As per my Knowledge, I modified these 2 files:

.\mmocr-main\mmocr\models\textrecog\convertors\base.py line 22 : DICT36 = tuple('0123456789abcdefghijklmnopqrstuvwxyz') to DICT36 = tuple('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
.\mmocr-main\configs_base_\recog_models\sar.py line 2 : type='AttnConvertor', dict_type='DICT90', with_unknown=True) to type='AttnConvertor', dict_type='DICT36', with_unknown=True)

and any suggestion how to overcome this hiding bug in SAR?

Thanks.

Mountchicken commented 2 years ago

Sorry for the late reply. It's ok to modify the dictionary in this way. Here is the bug: When calculating CrossEntropy Loss in SAR, we set the ignore_index to 0 as the default. The ignore_index should point to the <PAD> token in the dictionary for attention mechanism. However, the bug is that when we are building the dictionary, the <PAD> token is not at the position of index 0, instead, the end of the dictionary. check here. So the model has to correctly predict the <PAD> token to minimize the CE loss which is unnecessary and may cause the model hard to converge in your situation. Here is a method to quickly fix the bug by moving the <PAD> token to the front of the dictionary: modify this two line in this way:

#self.idx2char.append(padding_token)
self.idx2char.insert(0, padding_token)
#self.padding_idx = len(self.idx2char) - 1 
self.padding_idx = 0

payal211 commented 2 years ago

Thank you so much @Mountchicken. I will start Training and will update you for result.

payal211 commented 2 years ago

Hi @Mountchicken & @gaotongxiao I started training on 22nd March 2022, after 15th epoch Accuracy is

[23-03-2022 09:33] mmocr - INFO - Epoch(val) [15][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9500, 0_1-N.E.D: 0.9679

Then after 21st Epoch Accuracy is: 2022-03-23 12:04:30,316 - mmocr - INFO - Epoch(val) [21][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9800, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679

Then After 45th Epoch Accuracy is: 2022-03-25 03:38:54,303 - mmocr - INFO - Epoch(val) [45][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9800, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9680

Then After 91th Epoch Accuracy is:

2022-03-28 06:06:50,483 - mmocr - INFO - Epoch(val) [91][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679

and after 99th epoch still there's no more difference in Accuracy

2022-03-28 16:55:05,712 - mmocr - INFO - Epoch(val) [99][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679

so is there Anything I am missing here? As precision and recall is pretty much good but still not able to recognize text proper on the test Dataset.

Mountchicken commented 2 years ago

@payal211 Sorry for the late reply. The training process seems stuck after the 15th epoch. And the strange thing is that the char precision is so high.

Is it possible that there are some characters in your dataset that are not in DICT90?
Could you please describe what's images in your dataset look like. The max decode sequence length is set to be 30 here, and your label length may exceed 30 and that can also cause such a phenomenon.
BTW, your training batch size is small, only 8. Try a larger one after we solve this problem.

payal211 commented 2 years ago

Hi @Mountchicken,

- Is it possible that there are some characters in your dataset that are not in DICT90? I am training on DICT36 as said earlier .\mmocr-main\mmocr\models\textrecog\convertors\base.py line 22 : DICT36 = tuple('0123456789abcdefghijklmnopqrstuvwxyz') to DICT36 = tuple('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')

Could you please describe what's images in your dataset look like. The max decode sequence length is set to be 30 here, and your label length may exceed 30 and that can also cause such a phenomenon. Sure, Can you please share your Email Id so I can share sample image with you. And decode sequence label length will increased based on character accuracy as it decode multiple characters with different probability for one character.

BTW, your training batch size is small, only 8. Try a larger one after we solve this problem. Yes, I changed it to 64, and it is taking 10 GB of RAM of GPU out of 24 GB. so sure I will try with larger one after solve this problem.

payal211 commented 2 years ago

here, is attached config for batch 64 and DICT36 classes, which I modified sar_r31_parallel_decoder_custom_dataset_batch64.txt base.txt

.

Mountchicken commented 2 years ago

@payal211 927922033@qq.com

Mountchicken commented 2 years ago

Hi @payal211 Your config file is totally fine. It seems that the image you sent to me is contextless. The random combination of numbers and characters can easily confuse SAR reviewed in image 1. Those pictures come from the paper RobustScanner. The accuracy table below is from an experiment that tests recognition algorithms on some random text which is contextless like yours. As you can see, the word accuracy of SAR is the worst. However, the character accuracy can still be high in the first picture.

This might be the reason. A practical way to check is to see the prediction on your test sets. You can try the following commands to visualize the prediction. If the prediction somehow shares a common with the first picture above, then that would be the problem of the algorithm itself.
```
python tools/recog_test_imgs.py  {PATH_TO_YOUR_TEST_IMAGES} {PATH_TO_YOUR_TXT_FORMAT_LABEL} configs/textrecog/sar/sar_r31_parallel_decoder_custom_dataset_batch64.py {PATH_TO_CHECKPOINTS}
```
BTW, your dataset seems to have a clean background. And I recommend you use CRNN instead. CRNN works pretty well on contextless datasets and training CRNN is also much faster with a less GPU cost.

payal211 commented 2 years ago

Hi @Mountchicken Really Appreciated. Thank you for all this details. So Definitely I will Try CRNN and check the accuracy.

payal211 commented 2 years ago

Hi @Mountchicken

I trained CRNN Model but after the second epoch, loss_ctc and loss both went infinite. Here you can find log file for your reference. 20220407_095658.log Can you please look into this.

Thank you

Mountchicken commented 2 years ago

Hi @payal211 You can replace loss=dict(type='CTCLoss') to loss=dict(type='CTCLoss', flatten=False, zero_infinity=True) in https://github.com/open-mmlab/mmocr/blob/main/configs/base/recog_models/crnn.py#L10. You can also take a look at this issue

payal211 commented 2 years ago

Hi @Mountchicken

I had tested Both model SAR and CRNN and I am not able to recognize the TEXT with Good accuracy.

Here I am sharing the accuracy log. Trained SAR model and checked ACCURACY at 183rd epoch, here is the attached log file for this. 20220411_043023.log

Trained CRNN Model and checked accuracy at 421th epoch and here is the attached log file for this 20220411_074324.log

As of now CRNN model recognize only digits with very less score. Can you please help me? should I have to stop training or continue?

Thank you

Mountchicken commented 2 years ago

Hi @payal211 I think we should stop training now.

I rechecked your log file and found that the number of repetitions in both your training and test sets is 100. Let's start by setting them to 1. ( This is an example to show you where to change train_repeat test_repeat) This can save you a lot of training time and may even be the problem.
I also rechecked the training set image you sent me and I found that there is a large amount of white border in the whole image and the text area only takes up a small piece. I presume this is a screenshot and not the actual dataset? If the dataset really looks like this, then you need to crop out the text area separately and feed it to the recognizer.

payal211 commented 2 years ago

@Mountchicken

Thanks. I will do needful. And yes, I sent you the raw data, I am cropping that particular portion containing text and feeding those cropped images into the training.

payal211 commented 2 years ago

Hi @Mountchicken,

I tried with your suggestion, but no luck.

After 1200th epoch the accuracy result is in attached log file 20220412_100903.log

and I continuing till 2400th epoch, and log file is here, 20220412_144955.log

Mountchicken commented 2 years ago

We are now in a bottleneck.

Is the log file above about training CRNN？Previously, your SAR was able to have about 80% accuracy, why is the accuracy of these training 0? Maybe we can try SAR again with repeat = 1.
I am also confused now. Do the test set images have the same style as the training set? For example, do the test set images have a black background and light text like the one you sent me before. If the test set is quite different from the training set, you may need to consider data augmentation. All I can think of right now is that you comment out the Normalize operation(NormalizeOCR in SAR) in the train pipeline and test pipeline first. Because your image style is not very suitable for normalization

payal211 commented 2 years ago

Yes, the above log file is about CRNN. Previously, SAR model just show the precision and recall above 80% but it wasn't work well on test data. here, is the attached log file for SAR Model 20220411_043023.log

so okay, I can give one more try again with SAR and repeat = 1 & without NormalizationOCR.

Yes, I have same style test data as the training set.

balandongiv commented 2 years ago

Hi @Mountchicken, I still unable to understand what the purpose of repetition and how it affect the training time.

I rechecked your log file and found that the number of repetitions in both your training and test sets is 100. Let's start by setting them to 1. ( This is an example to show you where to change train_repeat test_repeat) This can save you a lot of training time and may even be the problem.

The documentation define repeat as Repeated times of dataset.

Correct me if am wrong. Say for example, if we set repeat=100 for both training and test sets. Does it mean the dataset being train or evaluated for 100 times ?

Mountchicken commented 2 years ago

@balandongiv Yes. For example, if repeat is set to 10, then the number of training iterations will also be expanded by a factor of ten. However, the number of repeat for the test set should be 1.

balandongiv commented 2 years ago

Thanks for the confirmation @Mountchicken .

But, any particular reason to repeat the training on a train dataset for x repeat of times? Wont this cause over fitting to the training dateset. Also, any advice/recommendation for the maximum number for repeat. I notice, at least in the toy datesets, the value was assigned to 100.

gaotongxiao commented 2 years ago

Sometimes this feature is needed when we train a model on a set of datasets with imbalanced sizes, where tiling the small dataset several times is the most straightforward way to alleviate the bias brought by the large ones. SAR is an example.

balandongiv commented 2 years ago

Thanks for detail explanation @gaotongxiao

open-mmlab / mmocr

Not getting accuracy on MMRecognition SAR Model Training #858