novioleo / FOTS

Implement FOTS and apply it to real scenarios.

How fast should the network learn? #15

Closed. soldierofhell closed this issue 5 years ago.

soldierofhell commented 5 years ago

I tried to train on ICDAR 2015 and SVHN (as "mydataset"), and for both datasets the network quickly starts to predict all blanks (which is rather typical for CRNN), but the loss (especially CTC) doesn't improve, and I mostly observe blanks with some minor fluctuations from time to time. The only things I changed were running the code on PyTorch 1.1 and moving the labels and label_lengths tensors to CUDA (required by CTCLoss in my setup), but from other issues it seems that shouldn't be a problem. I've also lowered batch_size to 8 because of memory limitations (this can affect batch norm, but shouldn't break it completely).
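A minimal sketch of the change I mean, assuming PyTorch 1.1's nn.CTCLoss; the shapes and variable names here are illustrative, not the repository's exact ones:

```python
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

# log_probs: (T, N, C) = (time steps, batch, classes), log_softmax over classes
T, N, C = 32, 8, 37
log_probs = torch.randn(T, N, C).log_softmax(2).cuda()

labels = torch.randint(1, C, (N, 10), dtype=torch.long).cuda()  # moved to CUDA
preds_lengths = torch.full((N,), T, dtype=torch.long)
label_lengths = torch.full((N,), 10, dtype=torch.long).cuda()   # moved to CUDA

loss = ctc_loss(log_probs, labels, preds_lengths, label_lengths)
```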

  1. What is the "reference" training speed for ICDAR 2015? In particular, after how many epochs should we expect reasonable predictions?
  2. Has anyone faced problems with convergence?
novioleo commented 5 years ago

@soldierofhell ICDAR 2015 and SVHN both need changes to the architecture of the CRNN part. One epoch of ICDAR 2015 takes me about 5 minutes to train.

soldierofhell commented 5 years ago

Thanks @novioleo, could you provide some hints on what should be changed? In general, crnn.py looks like a pretty standard CRNN architecture.

novioleo commented 5 years ago

The height of the last feature map, the number of hidden units in the BiLSTM, and other CRNN refinement tricks.
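For context, a toy sketch (not the repo's crnn.py) of the height constraint being described: the conv stack has to collapse the feature-map height to 1 so that each column becomes one BiLSTM time step:

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    def __init__(self, n_classes=37, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),                      # H: 32 -> 16
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),                      # H: 16 -> 8
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),         # force H == 1, keep W
        )
        self.rnn = nn.LSTM(256, hidden, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                            # x: (N, 3, H, W)
        f = self.cnn(x)                              # (N, C, 1, W)
        f = f.squeeze(2).permute(2, 0, 1)            # (W, N, C): time-major
        out, _ = self.rnn(f)
        return self.fc(out)                          # (W, N, n_classes)

logits = TinyCRNN()(torch.randn(2, 3, 32, 100))
print(logits.shape)                                  # torch.Size([25, 2, 37])
```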


soldierofhell commented 5 years ago

Ok, but I'm not sure that could be the cause. For SVHN I actually decreased the hidden units to 256, which didn't help. From my previous experience with CRNN, such tuning didn't change the general behaviour. But here we have a heavier ResNet backbone, RoIRotate in the middle, and a quite new PyTorch implementation of CTC (compared to tf.ctc_loss()), so lots of degrees of freedom :) Ok, seems like I have to dig into it :)

novioleo commented 5 years ago

Good luck, @soldierofhell.

soldierofhell commented 5 years ago

@novioleo, I reduced the model to only the detection part with only the classification loss, and noticed that the dice loss doesn't improve. So I looked into shared_conv.py and noticed that:

  1. In __mean_image_substraction the means are on the [0, 255] scale, as in the tf.slim implementation of EAST (https://github.com/argman/EAST/blob/master/model.py), while PyTorch's torchvision uses [0, 1] normalization with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] (see e.g. https://pytorch.org/docs/stable/torchvision/models.html). A sketch contrasting the two conventions follows this list.

  2. Image inputs are fed through each layer in turn (layer1, ..., layer4): input = layer(input). Why? Why not just do it like the EAST TF implementation: logits, end_points = resnet_v1.resnet_v1_50(images, is_training=is_training, scope='resnet_v1_50')?
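To make point 1 concrete, here is a sketch of the two conventions; the EAST-style means are the per-channel ImageNet means on the [0, 255] scale, taken from the linked model.py:

```python
import torch
from torchvision import transforms

# EAST / tf.slim convention: subtract per-channel means on the [0, 255] scale
def mean_image_subtraction(images, means=(123.68, 116.78, 103.94)):
    # images: (N, 3, H, W) float tensor with values in [0, 255]
    for i, m in enumerate(means):
        images[:, i, :, :] -= m
    return images

# torchvision convention: RGB scaled to [0, 1], then channel-wise normalized
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

x = torch.rand(3, 224, 224)   # a single [0, 1] RGB image tensor
y = normalize(x)              # what torchvision-pretrained backbones expect
```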

novioleo commented 5 years ago

@soldierofhell

  1. You can always change that function to look like torchvision.transforms.
  2. In the TensorFlow version of EAST (https://github.com/argman/EAST/blob/dca414de39a3a4915a019c9a02c1832a31cdd0ca/nets/resnet_v1.py#L224) you can see that the block outputs are collected in end_points. There is not much difference between the two implementations.
soldierofhell commented 5 years ago

Regarding 2: You are right. I didn't understand the flow of the for loop at first. It iterates through the whole ResNet, from the input to the output of layer4, so it's fine :)
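For reference, a minimal sketch of that loop pattern, collecting per-stage outputs the way tf.slim's end_points does (names here are illustrative, not the repo's exact code):

```python
import torch
from torchvision import models

backbone = models.resnet50(pretrained=False)
stages = [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]

x = torch.randn(1, 3, 256, 256)
x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))

end_points = []
for layer in stages:        # the input flows through every stage in turn
    x = layer(x)
    end_points.append(x)    # keep each stage's output for FPN-style merging

print([f.shape[1] for f in end_points])   # [256, 512, 1024, 2048]
```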

Regarding 1, I still think there's a problem here. In addition, I don't see any conversion from BGR to RGB after cv2.imread(). It would probably be better if I prepared a PR for this to explain it better.

novioleo commented 5 years ago

Just convert with img[:,:,::-1]; there's no need for a conversion function.
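For completeness, a sketch of that slice trick ("sample.jpg" is a placeholder path):

```python
import cv2
import numpy as np

img_bgr = cv2.imread("sample.jpg")       # OpenCV decodes to BGR channel order
img_rgb = img_bgr[:, :, ::-1]            # reverse the channel axis: BGR -> RGB
img_rgb = np.ascontiguousarray(img_rgb)  # negative strides break torch.from_numpy
```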
