wcy940418 / CRNN-end-to-end

A CRNN Python implementation based on TensorFlow
MIT License

Cannot converge using this code #1

Open moonist opened 7 years ago

moonist commented 7 years ago

I only changed the dataset. After training for more than 300,000 steps (30W), the network still does not converge; the edit distances are always large (almost 1). Any suggestions?

wcy940418 commented 7 years ago

@moonist I am still debugging and testing this code, even though it looks like it is working properly. What hardware did you use for training, and for how many hours? According to the authors of the paper, they spent 50 hours training on a K40.

moonist commented 7 years ago

@wcy940418 I'm using a GTX 1070 GPU and have trained for nearly 10 hours, but the accuracy (not the printed edit distance) remains zero. When I train the same model with Torch, the accuracy is 91% after 20 hours, and with Caffe the accuracy is also above zero. Have you had a successful training run? Maybe a pretrained model would be helpful.

P.S. I wonder whether this would help the network converge: https://github.com/OlavHN/bnlstm.git, which adds batch normalization to the LSTM.
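
For readers comparing the two numbers mentioned in this comment, here is a minimal sketch of how exact-match accuracy and a normalized edit distance can be computed side by side. It is not the repository's own metric code: the function names are hypothetical, and normalizing by the label length is an assumption about how the printed value is scaled.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def evaluate(predictions, labels):
    """Return (exact-match accuracy, mean edit distance normalized by label length)."""
    exact = 0
    total = 0.0
    for pred, label in zip(predictions, labels):
        exact += int(pred == label)
        # Assumption: normalize by the ground-truth length.
        total += edit_distance(pred, label) / max(len(label), 1)
    n = len(labels)
    return exact / n, total / n
```

With this definition, a model that outputs empty strings for every image scores an accuracy of 0 and a normalized edit distance of exactly 1, which would match the "almost 1" reported at the top of the thread.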

codeVerySlow commented 7 years ago

@moonist I have the same problem. I'm using a GTX 1080 Ti GPU and have trained for nearly 40 hours; the accuracy is 0.000000, but the loss is nearly 0.001 (it was 20 at the start).
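
A loss near zero together with zero accuracy may point at the evaluation path rather than at training itself, for example a blank index or character table that differs between the loss and the decoder. That is only a guess, not a diagnosis of this repository. The sketch below shows best-path (greedy) CTC decoding in plain Python so the effect of the blank index can be checked by hand; the function name and the class ids are hypothetical.

```python
def ctc_greedy_decode(frame_class_ids, blank_index):
    """Best-path CTC decoding: collapse repeated ids, then drop blanks."""
    decoded = []
    previous = None
    for class_id in frame_class_ids:
        # Keep an id only when it starts a new run and is not the blank.
        if class_id != previous and class_id != blank_index:
            decoded.append(class_id)
        previous = class_id
    return decoded


# Hypothetical per-frame argmax output with 37 classes (ids 0..36).
frames = [1, 1, 36, 2, 2, 36, 2, 36, 36]
with_last_as_blank = ctc_greedy_decode(frames, blank_index=36)
with_zero_as_blank = ctc_greedy_decode(frames, blank_index=0)
```

The same frames decode to `[1, 2, 2]` when the last class is treated as blank and to `[1, 36, 2, 36, 2, 36]` when class 0 is. TensorFlow 1.x `tf.nn.ctc_loss` reserves the last class (`num_classes - 1`) for the blank, while other CTC implementations commonly default to 0, so it is worth confirming that the decoder and the label encoding follow the same convention as the loss.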

ghost commented 7 years ago

Hi, I was confused by this line of code:

    images = np.add(images, -128.0)

Why is 128 subtracted from the images?

wcy940418 commented 7 years ago

@songwendong I think it is because the author wanted to remap the 8-bit grayscale image to floating-point values in [-1, 1]. I just followed the original implementation, but this code does not work at all.
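
As a quick check of the arithmetic discussed here: subtracting 128 on its own only centers 8-bit pixel values on zero, giving the range [-128, 127]; a further division by 128 would be needed to land in roughly [-1, 1]. The sketch below illustrates this with NumPy on a made-up three-pixel image. Whether the original implementation applies such a scaling step elsewhere is not stated in this thread.

```python
import numpy as np

# Hypothetical 8-bit grayscale pixels covering the extremes and the midpoint.
images = np.array([[0, 128, 255]], dtype=np.uint8).astype(np.float32)

# The line quoted above: shifts the range from [0, 255] to [-128, 127].
centered = np.add(images, -128.0)

# Additional step (an assumption, not in the quoted code) to reach about [-1, 1].
scaled = centered / 128.0
```

If the network only ever sees inputs in [-128, 127], the first convolution receives values about two orders of magnitude larger than in the [-1, 1] case, which can matter for the choice of initialization and learning rate.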

liu6381810 commented 7 years ago

Hi @wcy940418
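You're Chinese, right? May I write in Chinese? I used your code: I redefined the convolutional layers myself and built the LSTM and CTC parts following yours, and found that it does not converge at all. The loss started at 17, dropped to 16, and then stopped decreasing. Do you have any suggestions? Or is something wrong somewhere?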

wcy940418 commented 7 years ago

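@liu6381810 Hi, sorry, this code does not work at all. I followed CRNN exactly, but after debugging for a long time I still have no idea why it does not work. My guess is that the following problems may be the cause:

  1. A problem with the CTC loss. The PyTorch reimplementation by another group of people on their team uses Baidu's CTC. I used it too, but, possibly because of an API issue, it does not work at all.
  2. A pretraining problem. As I recall, the original authors did not say the model was pretrained, and they said it can be trained end-to-end. But I think the convolutional layers at the front still need pretraining, otherwise features cannot be extracted. Normally, with a working CNN, you can make out some features in the outputs by eye, but when I looked at the intermediate results (from my code), I could not make out anything; they were almost entirely black.
  3. Their PyTorch version also uses the Torch weights, so I am not sure whether they ever actually trained it with PyTorch.

In theory, anything that can be implemented in Torch can also be implemented in TF; there may just be some tricks we do not know about. Good luck, and let's talk again if you run into any problems.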
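
One part of the first guess above (the CTC loss or how it is called) can be checked without a GPU. CTC can only align a label to the network output if there are enough time steps: at least one frame per character, plus one blank between each pair of identical adjacent characters. The sketch below, with hypothetical function names and a made-up frame count, flags labels in a dataset that cannot be aligned. It is a sanity check that may be relevant after swapping in a new dataset, not a confirmed cause of the problem in this thread.

```python
def min_ctc_frames(label):
    """Fewest time steps that can emit `label` under CTC.

    Repeated adjacent characters need a blank frame between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def find_infeasible(labels, num_frames):
    """Return indices of labels that cannot fit into `num_frames` time steps."""
    return [i for i, label in enumerate(labels)
            if min_ctc_frames(label) > num_frames]


# Hypothetical example: the CNN is assumed to produce 6 frames per image.
bad = find_infeasible(["hello", "ab", "aaaa"], num_frames=6)
```

Here `"hello"` needs 6 frames (5 characters plus 1 blank between the two l's) and just fits, while `"aaaa"` needs 7 and is flagged. Samples like the latter typically yield an infinite or meaningless loss, depending on the CTC implementation, so filtering them out before training removes one possible source of trouble.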

BigBorg commented 6 years ago

I'm using PyTorch, with a ResNet-pretrained CNN plus an LSTM, and with CTC the loss likewise drops to around 16 and then stops decreasing. I have no idea why.