zhang0jhon / AttentionOCR

Scene text recognition
834 stars, 259 forks

Hello, I ran into some problems during training and would appreciate your advice. Thanks a lot! #71

Open zjz5250 opened 4 years ago

zjz5250 commented 4 years ago

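@zhang0jhon Hello, and first of all thank you very much for your work; the model you open-sourced performs really well. I am trying to reproduce the training process but have run into the following three problems:

1. Training is very slow. Using only the LSVT data, one epoch takes about 6 hours.
2. I tried multi-GPU training, but it is about as fast as a single GPU. I am using 2080 Ti cards.
3. I tested the model after 30 epochs and the recognition accuracy is very poor.

I would like to ask:

1. How many epochs are appropriate for training, and what initial learning rate and batch size should be used?
2. Is training this slow for you on multiple GPUs as well? Is there any way to speed it up?
3. How should the weakly annotated data in LSVT be used? It has no text-region coordinates, so how is the mask generated?

Thanks a lot!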

etatbak commented 4 years ago

Did you select your GPU in the config file? It should take around 10-15 minutes per epoch. @zjz5250

zjz5250 commented 4 years ago

@etatbak Thank you! Yes, I use the GPU; I set gpus = [0] in config.py. How many steps are in one epoch when you train the model? I found that reading images takes a lot of time in every step. I set the batch size to 16, and reading 16 images takes about 2 seconds.
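To check whether image reading really is the bottleneck, the loader can be timed separately from the training step. A minimal sketch, assuming a hypothetical `load_batch` callable that stands in for whatever reads one batch (it is not a function from this repository):

```python
import time


def mean_batch_seconds(load_batch, num_batches=10):
    """Return the mean wall-clock seconds per call to load_batch()."""
    if num_batches <= 0:
        raise ValueError("num_batches must be positive")
    start = time.perf_counter()
    for _ in range(num_batches):
        load_batch()
    return (time.perf_counter() - start) / num_batches


def fake_load_batch():
    # Hypothetical stand-in for reading one batch of images from disk.
    time.sleep(0.01)


if __name__ == "__main__":
    print("mean seconds per batch: %.3f" % mean_batch_seconds(fake_load_batch, 5))
```

If the measured time per batch is close to the total time per training step, the input pipeline rather than the GPU is probably the limiting factor.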

zjz5250 commented 4 years ago

@etatbak Did you change the value of steps_per_epoch? The default is 1500, but I think it should depend on the dataset size. For example, if the training set has 16000 samples and the batch size is 16, then steps_per_epoch should be 1000. Am I right?

zjz5250 commented 4 years ago

I use the LSVT dataset, which has 238790 samples in total. I set the batch size to 16, so steps_per_epoch is 14924. When I train the model, one epoch takes about 6 hours. What is worse, after 11 epochs the model does not work at all.
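The arithmetic behind these numbers can be sketched as follows. The helper names are made up for illustration and are not part of the repository's config; dropping the last incomplete batch is an assumption that happens to reproduce the 14924 figure.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=True):
    """Number of training steps needed to pass over the dataset once."""
    if drop_last:
        # Assumption: the final, incomplete batch is skipped.
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def seconds_per_step(epoch_hours, steps):
    """Average wall-clock seconds spent on one step."""
    return epoch_hours * 3600.0 / steps


if __name__ == "__main__":
    steps = steps_per_epoch(238790, 16)
    print(steps)                                 # 14924
    print(round(seconds_per_step(6, steps), 2))  # 1.45
```

About 1.45 seconds per step is in the same range as the roughly 2 seconds reported above for reading 16 images, which suggests, without proving it, that data loading dominates the step time.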

etatbak commented 4 years ago

@zjz5250 I didn't change many parameters. I only used the ReCTS dataset, though, so I think it would also take longer if I used LSVT. I believe my steps_per_epoch is 500, and my batch_size is 10. I trained for 1000 epochs but the model doesn't work well, not even at an average level, and I am not sure how to improve the performance.

zjz5250 commented 4 years ago

@etatbak
Did you use a pb file converted from your newly trained model when you tested the accuracy?

"You must feed a value for placeholder tensor 'label' with dtype int32 and shape [?,33]". Did you run into this error, and do you know how to fix it?

JianYang93 commented 4 years ago

@zjz5250 @etatbak @zhang0jhon Hi, I used all of the ReCTS, ArT, LSVT and IC2017MLT data and trained for 5 epochs on a single GPU (which takes about a day). I got a training loss of around 2 but a very high validation loss. Do you have any idea what could cause this?

JianYang93 commented 4 years ago

@zhang0jhon Could you please share what training and validation loss you got with the final model? Thanks!

ustczhouyu commented 4 years ago

@zjz5250 Hello, I get an error during training because icdar_datasets.npy is missing. Could you send this file to my email, zhou19920226@126.com? I would be very grateful.

ustczhouyu commented 4 years ago

@zhang0jhon Hello, thank you for sharing the code. I cannot train the model. Could you send icdar_datasets.npy to my email, zhou19920226@126.com? Thank you very much.

JianYang93 commented 4 years ago

@ustczhouyu Hi, you will need to run dataset.py first to generate the npy file

JianYang93 commented 4 years ago

I got a validation loss of around 1.3. The model can recognize some parts of the text, but the overall accuracy is relatively poor. I checked that the pretrained recognition model has a loss of around 0.5, so that should be the goal.

whereitogo commented 3 years ago

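I think the way the data is read should be changed. As far as I can see, the author's loader reads the whole image each time, which is too slow. I plan to change it to load pre-cropped images instead.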
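One way to prepare such crops offline is to compute a clipped bounding box for each annotated text polygon once, save the cropped region, and read only the crops during training. A minimal sketch of the box computation, assuming each annotation is a list of (x, y) points; the function name and the margin parameter are hypothetical and not taken from the repository:

```python
import math


def polygon_to_crop_box(points, image_width, image_height, margin=0):
    """Return (left, top, right, bottom) around a polygon, clipped to the image.

    right and bottom are exclusive, so the box can be used directly for slicing.
    """
    if not points:
        raise ValueError("points must not be empty")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    left = max(0, math.floor(min(xs)) - margin)
    top = max(0, math.floor(min(ys)) - margin)
    right = min(image_width, math.floor(max(xs)) + 1 + margin)
    bottom = min(image_height, math.floor(max(ys)) + 1 + margin)
    return left, top, right, bottom


if __name__ == "__main__":
    quad = [(10, 20), (50, 20), (50, 40), (10, 40)]
    print(polygon_to_crop_box(quad, 100, 60, margin=2))  # (8, 18, 53, 43)
```

The (left, top, right, bottom) order matches the box convention of common image libraries such as Pillow's Image.crop, so the result could be passed to such a call when writing the crops to disk.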

xianzhe-741 commented 3 years ago

Hello, I have two questions about problems I ran into:

  1. When running test.py with the model text_recognition5435.pb from the author's Docker image, the call tf.import_graph_def(graph_def, name='') fails with:
     InvalidArgumentError (see above for traceback): The second input must be a scalar, but it has shape [1,33]
  2. Running train.py fails with:
     File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 119, in __init__
       assert_type(model, ModelDescBase, 'model')
     File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 107, in assert_type
       name, tp.__name__, v.__class__.__name__)
     AssertionError: model has to be type 'ModelDescBase', but an object of type 'AttentionOCR' found.