CRNN on a custom dataset: data-dependent loss overflow (loss: 65504)

mindspore-lab / mindocr

A toolbox of ocr models and algorithms based on MindSpore

https://mindspore-lab.github.io/mindocr/

Apache License 2.0

CRNN on a custom dataset: data-dependent loss overflow (loss: 65504) #610

Closed panxua closed 6 days ago

panxua commented 8 months ago

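Symptom: the loss overflows, and the overflow is tied to specific samples in the data.
Screenshot: [loss overflow 1115]
Status: resolved.
Cause: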

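1. If a label is longer than max_text_len, the data pipeline silently replaces it with an empty label, without any warning.
2. If the label length plus the number of repeat separators (the blanks CTC must insert between identical adjacent characters) exceeds pred_seq_len, CTCLoss overflows, again without any warning. The reported value 65504 is the largest finite float16 number, i.e. the loss has saturated.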
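Cause 2 follows from how CTC decoding works: identical adjacent predictions are merged, so each pair of equal adjacent characters in a label needs a blank between them, and a label can only be aligned when pred_seq_len is at least its length plus the number of such pairs. The minimal Python sketch below checks both causes for a single label. ctc_min_steps and check_label are hypothetical helpers, not MindOCR functions, and the values max_text_len=24 and pred_seq_len=25 in the usage are only illustrative.

```python
def ctc_min_steps(label: str) -> int:
    """Minimum number of CTC time steps needed to emit `label`.

    CTC collapses adjacent duplicate predictions, so every pair of equal
    adjacent characters needs a blank between them: one extra step each.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def check_label(label: str, max_text_len: int, pred_seq_len: int) -> list:
    """Return human-readable problems for one label (empty list if OK)."""
    problems = []
    if len(label) > max_text_len:
        # Cause 1: the label would be emptied by the data pipeline.
        problems.append(
            f"label length {len(label)} > max_text_len {max_text_len}"
        )
    needed = ctc_min_steps(label)
    if needed > pred_seq_len:
        # Cause 2: no valid CTC alignment exists for this label.
        problems.append(
            f"label needs {needed} CTC steps > pred_seq_len {pred_seq_len}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    for text in ["hello", "aaaaaaaaaaaaaa"]:
        print(repr(text), check_label(text, max_text_len=24, pred_seq_len=25))
```

"aaaaaaaaaaaaaa" is only 14 characters, well under max_text_len, yet it needs 27 time steps because of its 13 repeated pairs, which is exactly the case that slips through silently.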

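Details: (link)
Solution: compute the maximum label length and set seq_max_len to it; compute the maximum of label length plus repeat separators and set pred_seq_len to it. Then change the width in img_shape for training, evaluation, and prediction so that it equals 4 × pred_seq_len.
Suggestion: warn users about this in the training_recognition_custom_dataset tutorial:
https://github.com/mindspore-lab/mindocr/blob/main/docs/en/tutorials/training_recognition_custom_dataset.md
https://github.com/mindspore-lab/mindocr/blob/main/docs/cn/tutorials/training_recognition_custom_dataset.md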
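The statistics in the solution above can be gathered with a short script. This is a sketch that assumes a label file with one "image_path<TAB>label" entry per line and a backbone that shrinks image width by a factor of 4, as the issue states. suggest_settings is a hypothetical helper, and the keys it returns simply mirror the names used in the issue rather than any MindOCR API.

```python
import sys
from pathlib import Path


def ctc_min_steps(label: str) -> int:
    # Label length plus one blank for each pair of equal adjacent characters.
    return len(label) + sum(1 for a, b in zip(label, label[1:]) if a == b)


def suggest_settings(lines, downsample: int = 4) -> dict:
    """Derive length settings from 'image_path<TAB>label' lines.

    Assumes the backbone reduces image width by `downsample`
    (4 per the issue), so width = downsample * pred_seq_len.
    """
    max_len = 0
    max_steps = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip empty or malformed lines
        _, label = line.split("\t", 1)
        max_len = max(max_len, len(label))
        max_steps = max(max_steps, ctc_min_steps(label))
    return {
        "seq_max_len": max_len,
        "pred_seq_len": max_steps,
        "img_width": downsample * max_steps,
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python label_stats.py train_label.txt
        lines = Path(sys.argv[1]).read_text(encoding="utf-8").splitlines()
    else:
        # Tiny built-in sample so the script runs without arguments.
        lines = ["img_001.jpg\thello", "img_002.jpg\tbookkeeper"]
    print(suggest_settings(lines))
```

On the built-in sample, "bookkeeper" has 10 characters but 3 repeated pairs (oo, kk, ee), so it needs 13 time steps and an image width of 52. These are the tightest values the data allows; you may want some headroom in practice.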

zhtmike commented 8 months ago

Hello, we provide two additional options to solve the problem you mentioned. For reason 1, you can add filter_max_len: True to your configuration file to filter out these problematic cases. For reason 2, you can add both filter_max_len: True and extra_count_if_repeat: True to filter out the affected cases. For details, see configs/rec/svtr/svtr_tiny.yaml. :)
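To illustrate what the two options are described as doing, here is a hypothetical Python re-implementation, not MindOCR's actual code. It assumes that filter_max_len drops samples whose effective length exceeds max_text_len, and that extra_count_if_repeat adds one to the effective length for each pair of identical adjacent characters; the exact behaviour should be confirmed against the MindOCR source and configs/rec/svtr/svtr_tiny.yaml. Unlike the silent behaviour in cause 1, this sketch reports each dropped sample.

```python
from typing import Iterable, List, Tuple

# (image_path, label) pairs; this sample shape is an assumption.
Sample = Tuple[str, str]


def effective_length(label: str, extra_count_if_repeat: bool) -> int:
    """Label length, optionally plus one per pair of equal adjacent characters."""
    length = len(label)
    if extra_count_if_repeat:
        length += sum(1 for a, b in zip(label, label[1:]) if a == b)
    return length


def filter_samples(
    samples: Iterable[Sample],
    max_text_len: int,
    extra_count_if_repeat: bool = False,
) -> List[Sample]:
    """Hypothetical stand-in for filter_max_len: drop over-long samples loudly."""
    kept: List[Sample] = []
    for path, label in samples:
        length = effective_length(label, extra_count_if_repeat)
        if length <= max_text_len:
            kept.append((path, label))
        else:
            # Report the drop instead of silently emptying the label.
            print(f"dropping {path}: effective length {length} > {max_text_len}")
    return kept


if __name__ == "__main__":
    data = [("a.jpg", "hello"), ("b.jpg", "aaaa"), ("c.jpg", "abcdef")]
    print(filter_samples(data, max_text_len=6))
    print(filter_samples(data, max_text_len=6, extra_count_if_repeat=True))
```

With repeats counted, "aaaa" grows from 4 to 7 and is dropped, while "hello" (5 + 1 = 6) still fits.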