roatienza / deep-text-recognition-benchmark

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Apache License 2.0

about ACC #9

Open YuNaruto opened 2 years ago

YuNaruto commented 2 years ago

Hi, I tried running test.py with this command: CUDA_VISIBLE_DEVICES=0 python test.py --eval_data ../../data/data_lmdb_release/evaluation/ --benchmark_all_eval --Transformation None --FeatureExtraction None --SequenceModeling None --Prediction None --Transformer --sensitive --data_filtering_off --imgH 224 --imgW 224 --workers 0 --TransformerModel=vitstr_small_patch16_224 --saved_model ./pre_model/vitstr_small_patch16_224_aug.pth

The result I got is:

accuracy:
IIIT5k_3000: 86.233
SVT: 87.172
IC03_860: 94.186
IC03_867: 93.887
IC13_857: 92.415
IC13_1015: 91.527
IC15_1811: 78.078
IC15_2077: 71.931
SVTP: 81.550
CUTE80: 77.083
total_accuracy: 84.130
averaged_infer_time: 0.410
# parameters: 21.506

This is a little different from what you show on GitHub. Is this your best model?
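
When comparing these numbers with a published table, it may help to know how the total relates to the per-dataset values. The sketch below suggests that the reported total_accuracy matches a sample-weighted average (total correct divided by total samples) rather than a plain mean of the ten percentages. The sizes for IIIT5k, IC03, IC13 and IC15 are read from the dataset name suffixes. The sizes for SVT (647), SVTP (645) and CUTE80 (288) are the commonly cited benchmark sizes and are an assumption here, as are the variable and function names.

```python
# Per-dataset accuracies (%) copied from the test.py output in this thread.
accuracy = {
    "IIIT5k_3000": 86.233,
    "SVT": 87.172,
    "IC03_860": 94.186,
    "IC03_867": 93.887,
    "IC13_857": 92.415,
    "IC13_1015": 91.527,
    "IC15_1811": 78.078,
    "IC15_2077": 71.931,
    "SVTP": 81.550,
    "CUTE80": 77.083,
}

# Number of evaluation images per dataset.
# ASSUMPTION: suffixed names give the size directly; SVT, SVTP and CUTE80
# use the commonly cited benchmark sizes (not stated in this thread).
num_samples = {
    "IIIT5k_3000": 3000,
    "SVT": 647,
    "IC03_860": 860,
    "IC03_867": 867,
    "IC13_857": 857,
    "IC13_1015": 1015,
    "IC15_1811": 1811,
    "IC15_2077": 2077,
    "SVTP": 645,
    "CUTE80": 288,
}


def weighted_accuracy(acc, counts):
    """Sample-weighted mean accuracy (%): each dataset counts by its size."""
    total = sum(counts[name] for name in acc)
    return sum(acc[name] * counts[name] for name in acc) / total


total = weighted_accuracy(accuracy, num_samples)
plain_mean = sum(accuracy.values()) / len(accuracy)

print(f"sample-weighted: {total:.3f}")   # 84.130, matches total_accuracy
print(f"plain mean:      {plain_mean:.3f}")  # 85.406
```

Because large datasets such as IIIT5k_3000 and IC15_2077 dominate the weighted figure, it can sit noticeably below the plain mean, so it is worth checking which kind of average a table reports before comparing.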

roatienza commented 2 years ago

The results are correct. They may differ slightly from the table because the values reported there are means. The best ViTSTR model is https://github.com/roatienza/deep-text-recognition-benchmark/releases/download/v0.1.0/vitstr_base_patch16_224_aug.pth.

YuNaruto commented 2 years ago

Thank you for your reply. I am looking forward to your outstanding contribution.