roatienza / deep-text-recognition-benchmark

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Apache License 2.0

About the speed of the model in Table 4 of the paper #37

Closed lexiaoyuan closed 1 year ago

lexiaoyuan commented 1 year ago

Hello, thank you very much for open-sourcing the code; it is very rewarding work. How are the speeds of the different models in Table 4 calculated? When I benchmark the vitstr-tiny model with the weights vitstr_tiny_patch16_224.pth provided in the repository, the output is averaged_infer_time: 0.116, which is quite different from the 9.3 msec/image reported in the paper. How should I measure the model's speed accurately? I look forward to your help, thank you very much! (PS: my code runs on an NVIDIA GeForce RTX 2080 Ti.)

roatienza commented 1 year ago

Hi. When performing the benchmark, please warm up the GPU first: run 100 inferences and discard their timings. After the warm-up, run the benchmark itself: 100 inferences, each with a batch size of 1. The reported speed is total_time / 100.
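
A minimal sketch of this timing protocol, assuming `model` is a loaded ViTSTR model and `image` is a single preprocessed input tensor (batch size 1) already on the GPU; both names are illustrative, not the repository's actual variables:

```python
import time
import torch

model.eval()
with torch.no_grad():
    # Warm-up: 100 inferences, timings discarded.
    for _ in range(100):
        model(image)
    torch.cuda.synchronize()

    # Benchmark: 100 inferences, batch size 1.
    start = time.time()
    for _ in range(100):
        model(image)
    torch.cuda.synchronize()
    total_time = time.time() - start

print(f"msec/image: {1000 * total_time / 100:.2f}")
```

The `torch.cuda.synchronize()` calls matter because CUDA kernels launch asynchronously; without them the measured time would not reflect the actual GPU work.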

lexiaoyuan commented 1 year ago

Thank you very much for your help. Did you obtain the speeds of the other models in Table 4 (e.g. CRNN, GCRNN, etc.) by re-implementing them yourself?

roatienza commented 1 year ago

Yes. I had to, for a fair comparison.

lexiaoyuan commented 1 year ago

Ok, thank you for your answer and help!