open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0
4.29k stars 744 forks

Best practice for training text detection model #1154

Open balandongiv opened 2 years ago

balandongiv commented 2 years ago

Assume the batch of images follows the pattern below. Specifically, all images contain context-less words, and each image contains 5 levels (rows) of characters as shown below. The images differ only in scale (zoom in, zoom out) and contrast.

mmocr_sample

Given that the text orientation is around 0 degrees, my naive understanding is that this can easily be handled by any off-the-shelf detection model available in mmocr. Since the pretrained models were unable to detect some of the characters, I then tried to train dbnetpp from scratch on a custom dataset and validate the trained model on the training dataset (purposely overfitting). The training set contains about ~1500 images and the repetition is set to 3. Unfortunately, the validation scores such as hmean-iou:recall, precision, and hmean are all zero even after 600 epochs, even though the loss showed a downward trend from 5.25053 at epoch 0 to ~0.8 at epoch 600+.
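For reference, the deliberate overfit check is set up roughly as in the sketch below. The dataset type and paths are placeholders for my converted annotations, so the real config differs in detail:

# Sketch of the overfit check: point the val/test list at the training data.
# 'IcdarDataset' and the paths below are placeholders -- the actual dataset type
# depends on how the annotations were converted.
train = dict(
    type='IcdarDataset',
    ann_file='data/custom/instances_training.json',
    img_prefix='data/custom/imgs',
    pipeline=None)

train_list = [train]
test_list = [train]  # validate on the same images used for training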

One idea is to tweak the epsilon, as suggested in this OP, to accommodate different scenarios. However, since this custom dataset contains only short text, I believe the default epsilon should at least give hmean > 0.1.

Hence my questions are: should I change the epsilon value, and is this standard practice even for short text? Also, should I train together with other public datasets, at the expense of longer training time? :nauseated_face: :nauseated_face: :nauseated_face:
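If mixing datasets is the way to go, what I have in mind is roughly the following (untested; custom_train and icdar2015_train are placeholders for dataset dicts defined elsewhere in the config):

# Rough sketch of concatenating my custom dataset with a public one via
# UniformConcatDataset. Both list entries are dataset dicts; icdar2015_train is
# a placeholder for a public dataset definition from a det_datasets base config.
train_list = [custom_train, icdar2015_train]

data = dict(
    train=dict(
        type='UniformConcatDataset',
        datasets=train_list,
        pipeline=train_pipeline))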

I appreciate any advice.

Mountchicken commented 2 years ago

Hi @balandongiv. If possible, you can draw the targets on the original images and see if they look normal.

balandongiv commented 2 years ago

Hi @Mountchicken, may I know what you mean by drawing the targets on the original image? I'd appreciate it if you could clarify further.

Mountchicken commented 2 years ago

@balandongiv Sorry for being unclear: you can draw the gt boxes on the images first to see if the annotations are correct. It's abnormal to get zero accuracy after 600 epochs; it may be a problem of incorrect annotations.
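A minimal, untested sketch of what I mean (how you load the polygons depends on your annotation format):

# Untested sketch: overlay the ground-truth polygons on an image to sanity-check
# the annotations. `polygons` is assumed to be a list of flat [x1, y1, x2, y2, ...]
# coordinate lists read from your annotation file.
import cv2
import numpy as np

def draw_gt(img_path, polygons, out_path='gt_vis.jpg'):
    img = cv2.imread(img_path)
    for poly in polygons:
        pts = np.array(poly, dtype=np.int32).reshape(-1, 1, 2)
        cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
    cv2.imwrite(out_path, img)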

balandongiv commented 2 years ago

Hi @Mountchicken

I am using labelme to annotate these images. labelme allows the user to view the bboxes that have been defined, which should rule out the possibility of incorrect annotations. Additionally, the crop output from labelme_converter.py indicates that the correct regions are segmented.

IIUC, by

draw gt boxes first to see if the annotation is correct

you mean doing some cropping exercise, right?

gaotongxiao commented 2 years ago

Sometimes visualizing your model's output can be helpful.

balandongiv commented 2 years ago

Hi @gaotongxiao, thanks for the suggestion. But I am not so clear about what you are suggesting here. By model output, do you mean the ROI (identified area of interest) or the loss output?

gaotongxiao commented 2 years ago

@balandongiv You can use the model whose performance seems low to run inference on some pictures and check whether its visualized outputs (i.e. polygons) make sense. The --show-dir argument of test.py is helpful for your case.
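For a quick check on a single image, something along these lines should also work (untested sketch from memory of the 0.x API; the config and checkpoint paths are placeholders):

# Untested sketch (MMOCR 0.x-style API): run the trained detector on one image
# and write a visualization of the predicted polygons to disk.
from mmocr.apis import init_detector, model_inference

cfg_file = 'my_dbnetpp_config.py'        # placeholder: your training config
ckpt_file = 'work_dirs/epoch_1200.pth'   # placeholder: the checkpoint to inspect

model = init_detector(cfg_file, ckpt_file, device='cuda:0')
result = model_inference(model, 'sample.jpg')
model.show_result('sample.jpg', result, out_file='vis/sample_pred.jpg', show=False)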

balandongiv commented 2 years ago

Thanks for pointing out test.py, it is indeed really helpful @gaotongxiao. As I mentioned above, I am purposely using the same training dataset for validation. Unfortunately, even the model from epoch 1200 still produces tremendously bad results:

Evaluating hmean-iou...
thr 0.30, recall: 0.000, precision: 0.000, hmean: 0.000
thr 0.40, recall: 0.000, precision: 0.000, hmean: 0.000
thr 0.50, recall: 0.000, precision: 0.000, hmean: 0.000
thr 0.60, recall: 0.000, precision: 0.000, hmean: 0.000
thr 0.70, recall: 0.000, precision: 0.000, hmean: 0.000
thr 0.80, recall: 0.000, precision: 0.000, hmean: 0.000
thr 0.90, recall: 0.000, precision: 0.000, hmean: 0.000
{'0_hmean-iou:recall': 0.0, '0_hmean-iou:precision': 0.0, '0_hmean-iou:hmean': 0.0, '1_hmean-iou:recall': 0.0, '1_hmean-iou:precision': 0.0, '1_hmean-iou:hmean': 0.0, 'mean_hmean-iou:recall': 0.0, 'mean_hmean-iou:precision': 0.0, 'mean_hmean-iou:hmean': 0.0}

Surprisingly, the pre-trained dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth produces significantly better results:

Evaluating hmean-iou...
thr 0.30, recall: 0.945, precision: 0.941, hmean: 0.943
thr 0.40, recall: 0.945, precision: 0.957, hmean: 0.951
thr 0.50, recall: 0.944, precision: 0.975, hmean: 0.959
thr 0.60, recall: 0.932, precision: 0.981, hmean: 0.956
thr 0.70, recall: 0.907, precision: 0.986, hmean: 0.945
thr 0.80, recall: 0.838, precision: 0.994, hmean: 0.909
thr 0.90, recall: 0.123, precision: 1.000, hmean: 0.218
{'0_hmean-iou:recall': 0.8484848484848485, '0_hmean-iou:precision': 0.9790209790209791, '0_hmean-iou:hmean': 0.9090909090909092, '1_hmean-iou:recall': 0.9439696106362773, '1_hmean-iou:precision': 0.9754661432777233, '1_hmean-iou:hmean': 0.9594594594594595, 'mean_hmean-iou:recall': 0.8962272295605629, 'mean_hmean-iou:precision': 0.9772435611493512, 'mean_hmean-iou:hmean': 0.9342751842751844}

balandongiv commented 2 years ago

Hi @gaotongxiao and @Mountchicken, unfortunately, this issue also happens when using mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth as the starting pth and using the toy dataset as both the training and validation dataset. I notice the recall, precision and hmean decrease with increasing epochs.

I'd appreciate it if you could confirm whether this issue is unique to my machine or a bug.

The cfg is as below:


_base_ = [
    '../configs/_base_/default_runtime.py',
    '../configs/_base_/det_models/ocr_mask_rcnn_r50_fpn_ohem_poly.py',
    '../configs/_base_/schedules/schedule_sgd_160e.py',
    '../configs/_base_/det_pipelines/maskrcnn_pipeline.py',
    '../configs/_base_/det_datasets/toy_data.py'
]

work_dir='/home/train_detect/maskrcnn_toy'
train_list = {{_base_.train_list}}
test_list = {{_base_.test_list}}

train_pipeline = {{_base_.train_pipeline}}
test_pipeline_ctw1500 = {{_base_.test_pipeline_ctw1500}}

data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type='UniformConcatDataset',
        datasets=train_list,
        pipeline=train_pipeline),
    val=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline_ctw1500),
    test=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline_ctw1500))

evaluation = dict(interval=10, metric='hmean-iou')

and the result:

Config:
log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'home/train_detect/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth'
resume_from = None
2022-07-20 08:03:42,089 - mmocr - INFO - workflow: [('train', 1)], max: 1200 epochs
2022-07-20 08:03:42,089 - mmocr - INFO - Checkpoints will be saved to /home/train_detect/maskrcnn_toy by HardDiskBackend.
2022-07-20 08:03:51,177 - mmocr - INFO - Saving checkpoint at 2 epochs
2022-07-20 08:04:00,185 - mmocr - INFO - Saving checkpoint at 4 epochs
2022-07-20 08:04:09,402 - mmocr - INFO - Saving checkpoint at 6 epochs
2022-07-20 08:04:18,695 - mmocr - INFO - Saving checkpoint at 8 epochs
2022-07-20 08:04:28,055 - mmocr - INFO - Saving checkpoint at 10 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 8.4 task/s, elapsed: 1s, ETA:     0s2022-07-20 08:04:29,941 - mmocr - INFO - 
Evaluating tests/data/toy_dataset/instances_test.txt with 10 images now
2022-07-20 08:04:29,942 - mmocr - INFO - Evaluating hmean-iou...
2022-07-20 08:04:29,945 - mmocr - INFO - thr 0.30, recall: 0.048, precision: 1.000, hmean: 0.091
2022-07-20 08:04:29,947 - mmocr - INFO - thr 0.40, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:04:29,948 - mmocr - INFO - thr 0.50, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:04:29,950 - mmocr - INFO - thr 0.60, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:04:29,951 - mmocr - INFO - thr 0.70, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:04:29,953 - mmocr - INFO - thr 0.80, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:04:29,954 - mmocr - INFO - thr 0.90, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:04:29,954 - mmocr - INFO - Epoch(val) [10][10]    0_hmean-iou:recall: 0.0476, 0_hmean-iou:precision: 1.0000, 0_hmean-iou:hmean: 0.0909
2022-07-20 08:04:38,393 - mmocr - INFO - Saving checkpoint at 12 epochs
2022-07-20 08:04:47,974 - mmocr - INFO - Saving checkpoint at 14 epochs
2022-07-20 08:04:57,952 - mmocr - INFO - Saving checkpoint at 16 epochs
2022-07-20 08:05:07,720 - mmocr - INFO - Saving checkpoint at 18 epochs
2022-07-20 08:05:17,315 - mmocr - INFO - Saving checkpoint at 20 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 8.3 task/s, elapsed: 1s, ETA:     0s2022-07-20 08:05:19,206 - mmocr - INFO - 
Evaluating tests/data/toy_dataset/instances_test.txt with 10 images now
2022-07-20 08:05:19,206 - mmocr - INFO - Evaluating hmean-iou...
2022-07-20 08:05:19,208 - mmocr - INFO - thr 0.30, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:05:19,209 - mmocr - INFO - thr 0.40, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:05:19,211 - mmocr - INFO - thr 0.50, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:05:19,212 - mmocr - INFO - thr 0.60, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:05:19,214 - mmocr - INFO - thr 0.70, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:05:19,215 - mmocr - INFO - thr 0.80, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:05:19,216 - mmocr - INFO - thr 0.90, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:05:19,217 - mmocr - INFO - Epoch(val) [20][10]    0_hmean-iou:recall: 0.0000, 0_hmean-iou:precision: 0.0000, 0_hmean-iou:hmean: 0.0000
2022-07-20 08:05:28,166 - mmocr - INFO - Saving checkpoint at 22 epochs
2022-07-20 08:05:37,913 - mmocr - INFO - Saving checkpoint at 24 epochs
2022-07-20 08:05:47,650 - mmocr - INFO - Saving checkpoint at 26 epochs
2022-07-20 08:05:57,252 - mmocr - INFO - Saving checkpoint at 28 epochs
2022-07-20 08:06:06,524 - mmocr - INFO - Saving checkpoint at 30 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 8.8 task/s, elapsed: 1s, ETA:     0s2022-07-20 08:06:08,358 - mmocr - INFO - 
Evaluating tests/data/toy_dataset/instances_test.txt with 10 images now
2022-07-20 08:06:08,359 - mmocr - INFO - Evaluating hmean-iou...
2022-07-20 08:06:08,361 - mmocr - INFO - thr 0.30, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:06:08,362 - mmocr - INFO - thr 0.40, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:06:08,364 - mmocr - INFO - thr 0.50, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:06:08,366 - mmocr - INFO - thr 0.60, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:06:08,367 - mmocr - INFO - thr 0.70, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:06:08,369 - mmocr - INFO - thr 0.80, recall: 0.000, precision: 0.000, hmean: 0.000
2022-07-20 08:06:08,370 - mmocr - INFO - thr 0.90, recall: 0.000, precision: 0.000, hmean: 0.000

gaotongxiao commented 2 years ago

@balandongiv I did a quick test and found the loss hovered around 4. I believe that the toy dataset is too small (only 10 pics) for a detection model trained from scratch to even overfit it. Detection models are generally very sensitive to hyperparameters, so a configuration that works for one dataset usually does not transfer well to another.
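If you do fine-tune on a small custom dataset, lowering the learning rate is usually the first thing to try; for example (the value below is only illustrative, not a recommendation for your data):

# Illustrative only: override the optimizer from the base schedule with a smaller
# learning rate when fine-tuning from a pretrained checkpoint on a small dataset.
optimizer = dict(type='SGD', lr=1e-3, momentum=0.9, weight_decay=0.0001)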

Anyway, that was a nice catch. We'll figure out a better toy dataset for a quick run.

balandongiv commented 2 years ago

Thanks for the confirmation and detailed explanation @gaotongxiao.

May I know what happens if you use the pretrained 'pth' as a starting point for further training, with the toy dataset as both the training and validation dataset? Do the results decrease to the point where they become zero?

I wonder why this happens on my local machine? It occurs even though I am using a fresh environment.

Indeed, training the detection model, compared to the recognition model, is the hardest part of my pipeline.

balandongiv commented 2 years ago

Hi @gaotongxiao, regarding my previous post, any update on whether the issue is reproducible on your local machine when resuming from an available pth?

I appreciate your time 😃

gaotongxiao commented 2 years ago

The issue does exist, since the provided hyperparameters of DBNet are not applicable to the toy dataset.

balandongiv commented 2 years ago

Thanks for the update @gaotongxiao, appreciate it.

GSusan commented 2 years ago

> Assume the batch of images follows the pattern below. [...]

Hi, I'm new here, but I'm wondering where to set the "repetition". Thanks.
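Edit: my guess, which I haven't verified, is that it refers to the RepeatDataset wrapper used in mmdet-style configs, something like:

# Unverified sketch: wrap the training dataset so each epoch iterates over it 3 times.
# `train` is the dataset dict that would otherwise be passed directly to data['train'].
data = dict(
    train=dict(
        type='RepeatDataset',
        times=3,          # the "repetition" factor
        dataset=train))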