Closed payal211 closed 2 years ago
Hi @payal211 , can you share your config file?
Hi @Mountchicken,
Here is the attached config file: sar_r31_parallel_decoder_custom_dataset.txt
Thanks
According to your log, it still needs 18 days to complete training. The accuracy is still very low, and it probably has something to do with your hyperparameter configuration. FYI, we trained SAR on 48 GPUs, so you might want to scale down the learning rate accordingly. We have also provided a detailed log for reference: https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json.
As for data augmentation techniques, ABINet's pipeline is empirically effective in boosting the model's final performance. But using them won't necessarily reduce the convergence time.
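To make the learning-rate advice concrete, here is a minimal sketch of the linear scaling rule. Only the 48-GPU figure comes from the comment above; the per-GPU batch size (64) and base learning rate (1e-3) are assumptions for illustration, so read the actual values from the reference config/log before applying it.

```python
# Linear scaling rule sketch: scale the learning rate with the effective batch size.
# 48 GPUs is taken from the comment above; ref_samples_per_gpu and ref_lr are
# assumed values, not read from the official SAR config.
ref_gpus, ref_samples_per_gpu, ref_lr = 48, 64, 1e-3
my_gpus, my_samples_per_gpu = 1, 8

scaled_lr = ref_lr * (my_gpus * my_samples_per_gpu) / (ref_gpus * ref_samples_per_gpu)
print(f"suggested lr: {scaled_lr:.2e}")  # ~2.60e-06 for 1 GPU with batch size 8
```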
Hi @gaotongxiao,
Thanks for the quick response and suggestion. Can you please elaborate on what the hyperparameters should be? I am training this model on 37 characters (digits 0-9 and letters A-Z) for 600 epochs. Is that right, or should I train it for longer? If ABINet is a good choice for this task, I can try that too. Thanks for your help.
Hi @payal211, you should try a larger batch size for faster training. Using 8 samples per batch only occupies about 2 GB of GPU memory in your case; try 32 or 64 if possible. BTW, can you show me the config file generated when training starts? It contains the full information and should be located under ./work_dirs or somewhere similar.
Hi @Mountchicken, thanks for pointing that out. Here is the config file generated when training starts: sar_r31_parallel_decoder_custom_dataset.txt
Hi @payal211,
What does your data look like? BTW, I see that you are using DICT90; if you are only predicting characters 0-9 and A-Z, remember to modify it.
Also, there is a hidden bug in SAR: it can't recognize the digit 0, so if every test image has a 0 in its label, the accuracy will always be 0.00%.
Hi @Mountchicken, can you please correct me if I modified the wrong places? To my knowledge, these are the 2 files I modified (see the sketch below):
.\mmocr-main\mmocr\models\textrecog\convertors\base.py, line 22: DICT36 = tuple('0123456789abcdefghijklmnopqrstuvwxyz') changed to DICT36 = tuple('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
.\mmocr-main\configs\_base_\recog_models\sar.py, line 2: type='AttnConvertor', dict_type='DICT90', with_unknown=True) changed to type='AttnConvertor', dict_type='DICT36', with_unknown=True)
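For reference, a sketch of how the relevant line of configs/_base_/recog_models/sar.py could look after the change; the fields other than dict_type are recalled from the stock MMOCR 0.x config and may differ in your checkout.

```python
# configs/_base_/recog_models/sar.py (excerpt, after switching to DICT36);
# with_unknown and max_seq_len are assumed to match the stock config.
label_convertor = dict(
    type='AttnConvertor', dict_type='DICT36', with_unknown=True, max_seq_len=30)
```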
Also, do you have any suggestions on how to work around this hidden bug in SAR?
Thanks.
Sorry for the late reply. It's fine to modify the dictionary that way. Here is the bug: when calculating the CrossEntropy loss in SAR, we set ignore_index to 0 by default. For the attention mechanism, ignore_index should point to the <PAD> token in the dictionary. However, when the dictionary is built, the <PAD> token is not placed at index 0 but at the end of the dictionary (check here). As a result, the model has to correctly predict the <PAD> token to minimize the CE loss, which is unnecessary and may make the model hard to converge in your situation.
A quick fix is to move the <PAD> token to the front of the dictionary by modifying these two lines as follows:
# self.idx2char.append(padding_token)        # old: <PAD> appended at the end
self.idx2char.insert(0, padding_token)       # new: put <PAD> at index 0
# self.padding_idx = len(self.idx2char) - 1  # old: padding index at the end
self.padding_idx = 0                         # new: matches ignore_index=0 in the CE loss
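To illustrate why this matters, here is a small self-contained PyTorch sketch (toy dictionary and random logits, not MMOCR code): once ignore_index points at the <PAD> position, padded target positions contribute nothing to the loss, so the model is no longer pushed to predict <PAD>.

```python
import torch
import torch.nn as nn

# Toy 5-symbol dictionary with <PAD> moved to index 0, mirroring the fix above.
idx2char = ['<PAD>', '0', '1', 'A', 'B']
padding_idx = 0

# ignore_index=padding_idx means padded positions are excluded from the loss.
criterion = nn.CrossEntropyLoss(ignore_index=padding_idx)

logits = torch.randn(2, 4, len(idx2char))   # (batch, seq_len, num_classes)
targets = torch.tensor([[1, 3, 0, 0],       # "0A" followed by two <PAD>s
                        [2, 4, 3, 0]])      # "1BA" followed by one <PAD>

loss = criterion(logits.view(-1, len(idx2char)), targets.view(-1))
print(loss.item())
```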
Thank you so much @Mountchicken. I will start training and update you with the results.
Hi @Mountchicken & @gaotongxiao, I started training on 22nd March 2022. After the 15th epoch the accuracy is:
[23-03-2022 09:33] mmocr - INFO - Epoch(val) [15][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9500, 0_1-N.E.D: 0.9679
Then after the 21st epoch the accuracy is: 2022-03-23 12:04:30,316 - mmocr - INFO - Epoch(val) [21][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9800, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679
Then after the 45th epoch the accuracy is: 2022-03-25 03:38:54,303 - mmocr - INFO - Epoch(val) [45][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9800, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9680
Then after the 91st epoch the accuracy is:
2022-03-28 06:06:50,483 - mmocr - INFO - Epoch(val) [91][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679
and after the 99th epoch there is still no change in accuracy:
2022-03-28 16:55:05,712 - mmocr - INFO - Epoch(val) [99][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679
So is there anything I am missing here? Precision and recall look pretty good, but the model is still not able to recognize text properly on the test dataset.
@payal211 Sorry for the late reply. The training process seems stuck after the 15th epoch, and the strange thing is that the char precision is so high.
Hi @Mountchicken, to answer your questions:
- "Is it possible that there are some characters in your dataset that are not in DICT90?" I am training on DICT36, as mentioned earlier: in .\mmocr-main\mmocr\models\textrecog\convertors\base.py, line 22, I changed DICT36 = tuple('0123456789abcdefghijklmnopqrstuvwxyz') to DICT36 = tuple('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ').
- "Could you please describe what the images in your dataset look like? The max decode sequence length is set to 30 here, and your label length may exceed 30, which can also cause such a phenomenon." Sure, can you please share your email ID so I can send you a sample image? Also, the decoded sequence can become longer than the label, since the model sometimes emits multiple candidate characters with different probabilities for a single character.
- "BTW, your training batch size is small, only 8. Try a larger one after we solve this problem." Yes, I changed it to 64, and it now takes 10 GB of the 24 GB of GPU memory, so I will try a larger one after we solve this problem.
Here is the attached config for batch size 64 and the DICT36 classes, which I modified: sar_r31_parallel_decoder_custom_dataset_batch64.txt base.txt
@payal211 927922033@qq.com
Hi @payal211, your config file is totally fine. It seems that the images you sent me are contextless. Random combinations of digits and letters can easily confuse SAR, as reviewed in image 1. Those pictures come from the RobustScanner paper. The accuracy table below is from an experiment that tests recognition algorithms on random, contextless text like yours. As you can see, SAR has the worst word accuracy there, yet its character accuracy can still be high, as in the first picture.
python tools/recog_test_imgs.py {PATH_TO_YOUR_TEST_IMAGES} {PATH_TO_YOUR_TXT_FORMAT_LABEL} configs/textrecog/sar/sar_r31_parallel_decoder_custom_dataset_batch64.py {PATH_TO_CHECKPOINTS}
Hi @Mountchicken, much appreciated. Thank you for all these details. I will definitely try CRNN and check the accuracy.
Hi @Mountchicken,
I trained the CRNN model, but after the second epoch both loss_ctc and loss became infinite. Here is the log file for your reference: 20220407_095658.log. Can you please look into this?
Thank you
Hi @payal211, you can replace loss=dict(type='CTCLoss') with loss=dict(type='CTCLoss', flatten=False, zero_infinity=True) in https://github.com/open-mmlab/mmocr/blob/main/configs/base/recog_models/crnn.py#L10. You can also take a look at this issue.
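For reference, a sketch of how that base config could look after the change; everything except the loss line is recalled from the stock MMOCR 0.x CRNN config and may differ in your version.

```python
# configs/_base_/recog_models/crnn.py (sketch); only the loss line is the intended
# change. zero_infinity=True zeroes out infinite CTC losses (which typically occur
# when a label is longer than the downsampled input sequence) instead of letting
# them blow up the training loss.
label_convertor = dict(
    type='CTCConvertor', dict_type='DICT36', with_unknown=False, lower=True)

model = dict(
    type='CRNNNet',
    preprocessor=None,
    backbone=dict(type='VeryDeepVgg', leaky_relu=False, input_channels=1),
    encoder=None,
    decoder=dict(type='CRNNDecoder', in_channels=512, rnn_flag=True),
    loss=dict(type='CTCLoss', flatten=False, zero_infinity=True),
    label_convertor=label_convertor,
    pretrained=None)
```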
Hi @Mountchicken,
I have tested both the SAR and CRNN models, and neither recognizes the text with good accuracy.
Here are the accuracy logs. I trained the SAR model and checked its accuracy at the 183rd epoch; the log file is attached: 20220411_043023.log
I trained the CRNN model and checked its accuracy at the 421st epoch; the log file is attached: 20220411_074324.log
As of now the CRNN model recognizes only digits, with very low scores. Can you please help me? Should I stop training or continue?
Thank you
Hi @payal211 I think we should stop training now.
@Mountchicken
Thanks, I will do the needful. And yes, I sent you the raw data; I am cropping the portion containing text and feeding those cropped images into training.
Hi @Mountchicken,
I tried your suggestion, but no luck.
The accuracy after the 1200th epoch is in the attached log file: 20220412_100903.log
I continued until the 2400th epoch; that log file is here: 20220412_144955.log
We are now in a bottleneck.
Yes, the above log file is for CRNN. Previously, the SAR model showed precision and recall above 80%, but it did not work well on the test data. Here is the attached log file for the SAR model: 20220411_043023.log
So okay, I can give SAR one more try with repeat = 1 and without NormalizationOCR.
Yes, my test data is in the same style as the training set.
Hi @Mountchicken, I still don't understand the purpose of repetition and how it affects the training time.
I rechecked your log file and found that the number of repetitions in both your training and test sets is 100. Let's start by setting them to 1 (this is an example showing where to change train_repeat and test_repeat; see the sketch below). This can save you a lot of training time and may even be the cause of the problem.
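For reference, a hedged sketch of where the repeat value typically lives in an MMOCR 0.x dataset config; the paths and parser settings below are placeholders, not taken from the actual config in this thread.

```python
# Hypothetical dataset excerpt: `repeat` sits inside the loader dict of each
# OCRDataset. Train and test repeats are controlled separately.
train_repeat = 1   # was 100; each epoch iterates the training set this many times
test_repeat = 1    # the test set should not be repeated

train = dict(
    type='OCRDataset',
    img_prefix='data/custom/train_imgs',        # placeholder path
    ann_file='data/custom/train_label.txt',     # placeholder path
    loader=dict(
        type='HardDiskLoader',
        repeat=train_repeat,
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=None,
    test_mode=False)

test = dict(
    type='OCRDataset',
    img_prefix='data/custom/test_imgs',         # placeholder path
    ann_file='data/custom/test_label.txt',      # placeholder path
    loader=dict(
        type='HardDiskLoader',
        repeat=test_repeat,
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=None,
    test_mode=True)
```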
The documentation defines repeat as "Repeated times of dataset". Correct me if I am wrong: say, for example, we set repeat=100 for both the training and test sets. Does that mean the dataset is trained or evaluated 100 times?
@balandongiv Yes. For example, if repeat is set to 10, then the number of training iterations is also expanded by a factor of ten. However, repeat for the test set should be 1.
Thanks for the confirmation @Mountchicken.
But is there any particular reason to repeat the training dataset x times? Won't this cause overfitting to the training dataset? Also, is there any advice or recommendation on the maximum value for repeat? I notice that, at least in the toy datasets, the value was set to 100.
Sometimes this feature is needed when we train a model on a set of datasets with imbalanced sizes, where tiling the small dataset several times is the most straightforward way to alleviate the bias brought by the large ones. SAR is an example.
Thanks for the detailed explanation @gaotongxiao
Hi @gaotongxiao,
I am training a custom dataset for text recognition using the SAR model. I have 7K+ images in total for training. Can you please help me estimate how long I will have to wait for the trained model? As of now it has completed the 65th epoch, and the accuracy metrics at the 65th epoch are as below:
2022-03-21 08:06:45,433 - mmocr - INFO - Epoch(val) [65][100] 0_word_acc: 0.0000, 0_word_acc_ignore_case: 0.0000, 0_word_acc_ignore_case_symbol: 0.0000, 0_char_recall: 0.1346, 0_char_precision: 0.1089, 0_1-N.E.D: 0.0776
As you can see, precision and recall are very low.
Also, can you please suggest any preprocessing techniques you are aware of for achieving good accuracy on a text recognition task?
Here is the attached screenshot of the training in progress: