ying09 / TextFuseNet

A PyTorch implementation of "TextFuseNet: Scene Text Detection with Richer Fused Features".
MIT License

Cannot reproduce training result: ICDAR2015 #90

Closed JooYoungJang closed 2 years ago

JooYoungJang commented 2 years ago

Hi, first of all, thank you for the great work!

I tried to train from the pretrained synthetic weights and fine-tune on ICDAR2015 to reproduce the evaluation result.

However, after training exactly as the paper describes, I got the following result:

The evaluation protocol was from https://rrc.cvc.uab.es/?ch=4
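For reference, that protocol ranks methods by hmean, the harmonic mean of per-image-matched precision and recall. A minimal sketch of the score being compared here:

```python
def hmean(precision, recall):
    """Harmonic mean (F-measure) used for ranking on the ICDAR2015 leaderboard."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. hmean(0.5, 0.5) -> 0.5
```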

For training, I followed the steps below:

  1. Generate synthetic data for weakly-supervised learning. (Character-wise pseudo labels were derived from model_syn_r101_pretrain.pth.)
  2. Convert the gt text files to COCO JSON format.
  3. Train with the hyperparameters below:
    • batch_size: 8
    • base learning rate: 0.005 (divided by 10 after 10K iterations)
    • steps: 20K
    • optim: SGD w/ weight decay 0.0001, momentum 0.9
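For step 2, a minimal sketch of turning one ICDAR2015 gt line into a COCO-style annotation dict (the function name and the `category_id`/`ignore` conventions are my own assumptions, not the repo's converter; ICDAR2015 gt lines are `x1,y1,x2,y2,x3,y3,x4,y4,transcription`):

```python
def icdar_line_to_coco_ann(line, image_id, ann_id):
    # ICDAR2015 gt line format: x1,y1,x2,y2,x3,y3,x4,y4,transcription
    parts = line.strip().split(",")
    coords = list(map(int, parts[:8]))
    # transcription itself may contain commas, so rejoin the tail
    transcription = ",".join(parts[8:])
    xs, ys = coords[0::2], coords[1::2]
    x0, y0 = min(xs), min(ys)
    w, h = max(xs) - x0, max(ys) - y0
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,          # single word-level text class (assumed)
        "segmentation": [coords],  # one 4-point polygon, COCO flat format
        "bbox": [x0, y0, w, h],    # COCO xywh, axis-aligned hull of the quad
        "area": w * h,
        "iscrowd": 0,
        # "###" marks illegible ("don't care") regions in ICDAR2015
        "ignore": 1 if transcription == "###" else 0,
    }
```

Differences in exactly this kind of conversion (e.g. how `###` regions are handled) are one place where reproductions can diverge.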

I also attach the config file I used:

```yaml
_BASE_: "./Base-RCNN-FPN.yaml"
MODEL:
  MASK_ON: True
  TEXTFUSENET_MUTIL_PATH_FUSE_ON: True
  EXP_NAME: icdar2015_101_FPN_lr0.005_cls64_vsPaper
  WEIGHTS: "/workspace/TextFuseNet_original/weights/model_final.pth"
  PIXEL_STD: [57.375, 57.120, 58.395]
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 101
  ROI_HEADS:
    NMS_THRESH_TEST: 0.35
  TEXTFUSENET_SEG_HEAD:
    FPN_FEATURES_FUSED_LEVEL: 2
    POOLER_SCALES: (0.0625,)
DATASETS:
  TRAIN: ("icdar2015_train",)
  TEST: ("icdar2015_val",)
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.005
  STEPS: (10000,)
  MAX_ITER: 20000
  CHECKPOINT_PERIOD: 1000
INPUT:
  MIN_SIZE_TRAIN: (800,1000,1200)
  MAX_SIZE_TRAIN: 1500
  MIN_SIZE_TEST: 1000
  MAX_SIZE_TEST: 3000
TEST:
  GT: "/workspace/script_test_ch4_t1_e1-1577983151/gt.zip"
OUTPUT_DIR: "/workspace/TextFuseNet_original/out_dir_r101/icdar2015_paper/"
```
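For what it's worth, the `SOLVER.BASE_LR` / `SOLVER.STEPS` pair above corresponds to a standard detectron2-style step schedule; a sketch of the intended learning rate per iteration (warmup omitted, function name is my own):

```python
def lr_at_iter(it, base_lr=0.005, steps=(10000,), gamma=0.1):
    """Step LR schedule: multiply by gamma each time a milestone in `steps` is passed."""
    factor = gamma ** sum(1 for s in steps if it >= s)
    return base_lr * factor

# lr_at_iter(0)     -> 0.005
# lr_at_iter(15000) -> 0.0005 (after the 10K milestone, lr is divided by 10)
```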

I am not sure what I am missing. Can anyone give me advice?

Thanks in advance,

LIYHUI commented 2 years ago

Hi, have you reproduced the results?

LIYHUI commented 2 years ago

@JooYoungJang I can't reproduce it either.

ying09 commented 2 years ago

@JooYoungJang For the training config files, please refer to https://github.com/ying09/TextFuseNet/tree/master/configs/ocr. All of our config files have been uploaded. Your reproduced hmean of 0.5932896890343698 means something is definitely wrong: even plain Mask R-CNN performs much better than that.

LIYHUI commented 2 years ago

@ying09 Hi, thanks for your reply. My reproduced hmean is 0.893 vs. 0.922 (original). On the one hand, this may be due to differences in the gt generation code; on the other hand, there may be something wrong with the pretrained model, as mentioned in https://github.com/ying09/TextFuseNet/issues/100.