ku21fan / STR-Fewer-Labels

Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)
MIT License

Result in evaluation #9

Closed: lyminhuit closed this issue 1 year ago

lyminhuit commented 1 year ago

I am training models on Vietnamese text. I modified the character set to cover Vietnamese and trained with the following command:

    !CUDA_VISIBLE_DEVICES=0 python train.py \
        --select_data / \
        --model_name TRBA \
        --exp_name CRNN_aug \
        --Aug Blur5-Crop99 \
        --train_data train \
        --valid_data val \
        --character 'aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợpqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789' \
        --batch_ratio 1

After 2000 iterations, the model runs its evaluation. Why does it predict [UNK], as shown below? Is my training setup correct?

image

ku21fan commented 1 year ago

Hi,

I think you should comment out (or remove) the lines below in test.py:

https://github.com/ku21fan/STR-Fewer-Labels/blob/9ca8c5e6707e5ed1300babd5607b1741a916ddda/test.py#L207-L213

Our repository assumes the English benchmark, and the lines above are specific to that benchmark. If you use the repository for other languages, you should comment them out.

Hope it helps. Best,

lyminhuit commented 1 year ago

Thanks for your reply! I have removed those lines from test.py, but [UNK] still appears!

ku21fan commented 1 year ago

Hmm. Does your data consist only of characters from your character set? aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợpqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789

If your data contain a character that is not included in the character set, that character is converted into [UNK] by the lines below: https://github.com/ku21fan/STR-Fewer-Labels/blob/e6aa817e2eacbf29b3fcd11390d78b1a8f96bf78/utils.py#L125-L128
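To make the fallback concrete, here is a minimal sketch of how a label converter can map out-of-set characters to [UNK]. It is an illustration only, not the code at the link above: the names `build_vocab`, `encode`, and `decode` are hypothetical, and the character set is shortened for readability.

```python
# Illustrative sketch; NOT the repository's converter.
# build_vocab / encode / decode are hypothetical helper names.
UNK = "[UNK]"
CHARSET = "abcđê0123456789"  # shortened, hypothetical character set


def build_vocab(charset):
    """Map the [UNK] token and every character in the set to an integer index."""
    tokens = [UNK] + list(charset)
    return {token: index for index, token in enumerate(tokens)}


def encode(label, vocab):
    """Encode a label; any character missing from the vocab falls back to [UNK]."""
    return [vocab.get(ch, vocab[UNK]) for ch in label]


def decode(indices, vocab):
    """Turn indices back into text so the substitution becomes visible."""
    inverse = {index: token for token, index in vocab.items()}
    return "".join(inverse[i] for i in indices)


vocab = build_vocab(CHARSET)
ids = encode("đê!", vocab)
print(decode(ids, vocab))  # đê[UNK]  ('!' is not in the character set)
```

In a setup like this, any ground-truth label containing such a character carries [UNK] as a training target, so a model trained on those labels can learn to predict it.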

Can you check if your data contain unknown characters or not?
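One way to run that check is sketched below. It assumes the ground-truth labels are already available as a list of Python strings (how they are read from the dataset is left out), and `find_unknown_chars` is a hypothetical helper, not part of the repository. The sketch also normalizes text to Unicode NFC, because a Vietnamese letter stored in decomposed form (base letter plus combining marks) would otherwise not match its precomposed form in the character set. Uppercase letters and spaces count as unknown here, since the character set in this thread contains neither.

```python
import unicodedata
from collections import Counter

# Character set copied from the training command in this thread.
CHARSET = "aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợpqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789"


def find_unknown_chars(labels, charset, normalize=True):
    """Count characters in `labels` that are absent from `charset`.

    Hypothetical helper. With normalize=True, both the character set and the
    labels are converted to NFC so that decomposed accents still match.
    """
    if normalize:
        charset = unicodedata.normalize("NFC", charset)
    known = set(charset)
    unknown = Counter()
    for label in labels:
        if normalize:
            label = unicodedata.normalize("NFC", label)
        unknown.update(ch for ch in label if ch not in known)
    return unknown


# Made-up sample labels, for illustration only.
labels = ["hà nội", "Việt Nam", "phở 24"]
unknown = find_unknown_chars(labels, CHARSET)
for ch, count in unknown.most_common():
    # repr() makes whitespace visible; here: space, 'V', and 'N'
    print(repr(ch), f"U+{ord(ch):04X}", count)
```

If the counter comes back non-empty, either add the reported characters to `--character` or clean them out of the labels before training.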

lyminhuit commented 1 year ago

Thank you, sir. It's working now.