zjz5250 opened this issue 4 years ago
Did you select your GPU in the config file? It should be around 10-15 minutes per epoch. @zjz5250
@etatbak Thank you! Yes, I use the GPU and set gpus = [0] in config.py. How many steps per epoch did you have when you trained the model? I found that reading images takes a lot of time at every step: with a batch size of 16, it needs about 2 seconds to read 16 images.
@etatbak Did you change the steps_per_epoch value? The default is 1500, but it should actually be a much larger number. For example, if the training set has 16000 samples and the batch size is 16, then steps_per_epoch should be 1000. Am I right?
I use the LSVT dataset, which has 238790 samples in total. With a batch size of 16, steps_per_epoch is 14924. When I train the model, one epoch takes about 6 hours. What is worse, after 11 epochs the model does not work at all.
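For reference, a quick sketch of that arithmetic (variable names are illustrative, not the repo's actual config keys):

```python
# Sketch: steps_per_epoch so that one epoch covers the whole training set once.
num_samples = 238790     # e.g. the LSVT figure quoted above
batch_size = 16

steps_per_epoch = num_samples // batch_size   # -> 14924 (last partial batch dropped)
print(steps_per_epoch)
```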
@zjz5250 I didn't change many parameters, but I only used the ReCTS dataset, so I think it will take longer with LSVT as well. steps_per_epoch is 500, I think, and my batch_size is 10. I trained for 1000 epochs but it doesn't work well, not even average; I am not sure how to improve the performance.
@etatbak
Did you convert your new model to a pb file when you tested the accuracy?
I got the error "You must feed a value for placeholder tensor 'label' with dtype int32 and shape [?,33]". Did you run into this problem, and how did you fix it?
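If the exported graph still contains the training-time 'label' placeholder, one common workaround is to feed it a dummy tensor of the expected shape at inference time. A minimal sketch, assuming a TensorFlow 1.x-style frozen graph; the file path and tensor names here are assumptions, not this repo's actual names:

```python
# Sketch: run inference on a frozen graph that still has a 'label' placeholder
# by feeding a dummy int32 tensor of shape [1, 33]. Tensor names are assumed.
import numpy as np
import tensorflow as tf

with tf.io.gfile.GFile('model.pb', 'rb') as f:          # hypothetical path
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

with tf.compat.v1.Session(graph=graph) as sess:
    image_t = graph.get_tensor_by_name('image:0')       # assumed input name
    label_t = graph.get_tensor_by_name('label:0')       # the complaining placeholder
    output_t = graph.get_tensor_by_name('output:0')     # assumed output name

    dummy_image = np.zeros((1, 256, 256, 3), dtype=np.float32)  # placeholder input
    dummy_label = np.zeros((1, 33), dtype=np.int32)             # matches [?, 33] int32

    preds = sess.run(output_t, feed_dict={image_t: dummy_image,
                                          label_t: dummy_label})
```

The cleaner long-term fix is to export an inference graph that no longer depends on the label placeholder at all.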
@zjz5250 @etatbak @zhang0jhon Hi, I used all of the ReCTS, ArT, LSVT and IC2017MLT data and trained for 5 epochs on a single GPU (it takes about a day). I got a training loss around 2 but a very high validation loss. Do you have any idea about this?
@zhang0jhon Could you please share what training and validation loss you got with the final model? Thanks!
@zjz5250 Hello, I get an error during training because icdar_datasets.npy is missing. Could you please send this file to my email, zhou19920226@126.com? Much appreciated.
@zhang0jhon Hello, thank you for sharing the code. I failed to train the model; could you send icdar_datasets.npy to my email: zhou19920226@126.com? Thank you very much.
@ustczhouyu Hi, you will need to run dataset.py first to generate the npy file
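For anyone hitting the same error, a quick way to sanity-check that the file was generated is to load it back; treating `icdar_datasets.npy` in the working directory as an assumption about where dataset.py writes it:

```python
# Sketch: verify the annotation file produced by dataset.py loads correctly.
# Its exact contents depend on the repo; this only checks the file exists and parses.
import numpy as np

data = np.load('icdar_datasets.npy', allow_pickle=True)
print(type(data))
print(getattr(data, 'shape', None))
```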
I got a validation loss around 1.3. The model can recognize some of the text, but the overall accuracy is relatively poor. I checked that the pretrained recognition model has a loss around 0.5, so that should be the goal.
@zhang0jhon Hello, and first of all thank you very much for your work; the open-sourced model really does work well. I tried to reproduce the training process but ran into three problems:
1) It is very slow: I only used the LSVT data and one epoch takes about 6 hours.
2) I tried multi-GPU training, but the speed is about the same as a single GPU (I am using 2080 Ti cards).
3) I tested the model after 30 epochs and the recognition accuracy is very poor.
My questions:
1) How many epochs are appropriate for training, and what initial learning rate and batch size should I use?
2) Is training this slow for you on multiple GPUs as well? Is there any way to speed it up?
3) How should the weakly annotated data in LSVT be used? Without text region coordinates, how do you handle the mask?
Thanks a lot!
I think the way the data is read should be changed. The author's loader reads the whole image every time, which is too slow; I plan to change it to load pre-cropped text images instead, roughly as in the sketch below.
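A minimal sketch of that idea: crop each annotated text region once, offline, and train on the crops. Everything here (annotation format, file paths, axis-aligned boxes) is an assumption for illustration, not this repo's actual loader:

```python
# Sketch: pre-crop text regions so training no longer loads full images each step.
# Assumes annotations are a dict {image_path: [(x1, y1, x2, y2, text), ...]}.
import os
from PIL import Image

def crop_text_regions(annotations, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    index = []  # (crop_path, text) pairs for the recognition model
    for image_path, boxes in annotations.items():
        image = Image.open(image_path).convert('RGB')
        for i, (x1, y1, x2, y2, text) in enumerate(boxes):
            crop = image.crop((x1, y1, x2, y2))
            name = f'{os.path.splitext(os.path.basename(image_path))[0]}_{i}.jpg'
            crop_path = os.path.join(out_dir, name)
            crop.save(crop_path)
            index.append((crop_path, text))
    return index
```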
Hello, I have two questions about using this: