lixiangpengcs commented 5 years ago

I am new to OCR. I run your code and the Loss and Recognition Loss are all negative and detection loss is positive. I am a little confused about this result. Is it correct?

Train Epoch: 3 [576/856 (67%)] Loss: -4.654196 Detection Loss: 0.016792 Recognition Loss:-4.670988 Train Epoch: 3 [592/856 (69%)] Loss: -3.796435 Detection Loss: 0.017061 Recognition Loss:-3.813496 Train Epoch: 3 [608/856 (71%)] Loss: -4.000570 Detection Loss: 0.014988 Recognition Loss:-4.015558

Besides, the experiment results of first few epoch are 0. It is really strage:

['img_100.jpg', 'img_213.jpg', 'img_624.jpg', 'img_362.jpg', 'img_491.jpg', 'img_469.jpg'] Expected tensor to have CPU Backend, but got tensor with CUDA Backend (while checking arguments for cudnn_ctc_loss) epoch : 3 loss : -3.987356729596575 det_loss : 0.021637324957507795 rec_loss : -4.008994055685596 precious : 0.0 recall : 0.0 hmean : 0.0 val_loss : 0.0 val_det_loss : 0.0 val_rec_loss : 0.0 val_precious : 0.0 val_recall : 0.0 val_hmean : 0.0 Saving checkpoint: ./saved_model/united_2019-09-12/checkpoint-epoch003-loss--3.9874.pth.tar ...

novioleo commented 5 years ago

17 the ctc loss comes to negative is common,you can search it on google.

^.^ As above your error info mentioned,you need covert your convert your tensor to cpu,the tensor between gt label and predict are inconsist. @lixiangpengcs

lixiangpengcs commented 5 years ago

@novioleo, Thanks for your quick reply. I have solved ctc-loss problem by using torch version 1.0.1.post2. However, the experiment results also seems strange. Val_precision and val_recall are still 0 after training 10 epoches. Is it a normal phenohenon? Also the loss is stable around -4.0, Is it correct?

Start validate epoch : 8 loss : -3.9846485162449774 det_loss : 0.01933059994583932 rec_loss : -4.003979111386236 precious : 0.0 recall : 0.0 hmean : 0.0 val_loss : 184.1101213319129 val_det_loss : 89.47653757918037 val_rec_loss : 94.63358375273253 val_precious : 0.0 val_recall : 0.0 val_hmean : 0.0 Saving checkpoint: ./saved_model/united_2019-09-12/checkpoint-epoch008-loss--3.9846.pth.tar ... Train Epoch: 9 [0/856 (0%)] Loss: -4.143412 Detection Loss: 0.016749 Recognition Loss:-4.160161 Train Epoch: 9 [16/856 (2%)] Loss: -3.958525 Detection Loss: 0.016582 Recognition Loss:-3.975107 Train Epoch: 9 [32/856 (4%)] Loss: -4.779177 Detection Loss: 0.013592 Recognition Loss:-4.792768 Train Epoch: 9 [48/856 (6%)] Loss: -4.109888 Detection Loss: 0.015186 Recognition Loss:-4.125073 Train Epoch: 9 [64/856 (7%)] Loss: -4.520103 Detection Loss: 0.013715 Recognition Loss:-4.533818 Train Epoch: 9 [80/856 (9%)] Loss: -3.883961 Detection Loss: 0.015919 Recognition Loss:-3.899881

novioleo commented 5 years ago

not correct...

---Original--- From: "Xiangpeng Li"<notifications@github.com> Date: 2019/9/12 17:08:06 To: "novioleo/FOTS"<FOTS@noreply.github.com>; Cc: "Comment"<comment@noreply.github.com>;"Tao Luo"<744351893@qq.com>; Subject: Re: [novioleo/FOTS] Confiusion about total loss is negative (#18)

Thanks for your quick reply. I have solved ctc-loss problem by using torch version 1.0.1.post2. However, the experiment results also seems strange. Val_precision and val_recall are still 0 after training 10 epoches. Is it a normal phenohenon? Also the loss is stable around -4.0, Is it correct?

Start validate epoch : 8 loss : -3.9846485162449774 det_loss : 0.01933059994583932 rec_loss : -4.003979111386236 precious : 0.0 recall : 0.0 hmean : 0.0 val_loss : 184.1101213319129 val_det_loss : 89.47653757918037 val_rec_loss : 94.63358375273253 val_precious : 0.0 val_recall : 0.0 val_hmean : 0.0 Saving checkpoint: ./saved_model/united_2019-09-12/checkpoint-epoch008-loss--3.9846.pth.tar ... Train Epoch: 9 [0/856 (0%)] Loss: -4.143412 Detection Loss: 0.016749 Recognition Loss:-4.160161 Train Epoch: 9 [16/856 (2%)] Loss: -3.958525 Detection Loss: 0.016582 Recognition Loss:-3.975107 Train Epoch: 9 [32/856 (4%)] Loss: -4.779177 Detection Loss: 0.013592 Recognition Loss:-4.792768 Train Epoch: 9 [48/856 (6%)] Loss: -4.109888 Detection Loss: 0.015186 Recognition Loss:-4.125073 Train Epoch: 9 [64/856 (7%)] Loss: -4.520103 Detection Loss: 0.013715 Recognition Loss:-4.533818 Train Epoch: 9 [80/856 (9%)] Loss: -3.883961 Detection Loss: 0.015919 Recognition Loss:-3.899881

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

lixiangpengcs commented 5 years ago

Can you tell me why my code runing like this? Or can you tell me what correct training log should be?

novioleo commented 5 years ago

@lixiangpengcs replace the original ctc_loss with torch-baidu-ctc. because of the bug of ctc loss....

novioleo commented 5 years ago

if the error still comes out ,i think you need checkout your gt

lixiangpengcs commented 5 years ago

I modify the common_str character set. The code is still wired even the training recision and recall is not 0: Start validate epoch : 38 loss : 3.4252535650663285 det_loss : 0.013250550838344009 rec_loss : 3.4120030135751884 precious : 0.014009486778199243 recall : 0.014009486778199243 hmean : 0.014009486778199243 val_loss : 14.513418368259934 val_det_loss : 5.277081434320855 val_rec_loss : 9.23633693393908 val_precious : 0.0 val_recall : 0.0 val_hmean : 0.0

novioleo commented 5 years ago

@lixiangpengcs how many pcs of your dataset?

lixiangpengcs commented 5 years ago

@lixiangpengcs how many pcs of your dataset?

I use icdar2015 as my training set.

lixiangpengcs commented 5 years ago

Detection performance looks not bad. May be the error appears in the recognition branch.

novioleo commented 5 years ago

i modified the recognition part code.you can modify for your application.

------------------ 原始邮件 ------------------ 发件人: "Xiangpeng Li"notifications@github.com; 发送时间: 2019年9月18日(星期三) 晚上8:30 收件人: "novioleo/FOTS"FOTS@noreply.github.com; 抄送: "北国枫叶。"744351893@qq.com; "Mention"mention@noreply.github.com; 主题: Re: [novioleo/FOTS] Confiusion about total loss is negative (#18)

Detection performance looks not bad. May be the error appears in the recognition branch.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

novioleo commented 5 years ago

icdar2015 usually cause some problems,you can try your own dataset

---Original--- From: "Xiangpeng Li"<notifications@github.com> Date: 2019/9/18 14:28:08 To: "novioleo/FOTS"<FOTS@noreply.github.com>; Cc: "Mention"<mention@noreply.github.com>;"Tao Luo"<744351893@qq.com>; Subject: Re: [novioleo/FOTS] Confiusion about total loss is negative (#18)

@lixiangpengcs how many pcs of your dataset?

I use icdar2015 as my training set.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.