@soldierofhell For icdar2015 and svhn you need to change the architecture of the CRNN part. Training icdar2015 takes me about 5 minutes per epoch.
Thanks @novioleo, could you provide some hints about what should be changed? In general crnn.py looks like a pretty standard CRNN architecture.
The height of the last feature map, the number of hidden units in the BiLSTM, and other CRNN refinement tricks.
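To make that concrete, here is a minimal sketch of the kind of knobs being referred to; the class and argument names below are illustrative and not taken from this repo's crnn.py:

```python
import torch
import torch.nn as nn

class TinyCRNNHead(nn.Module):
    """Illustrative CRNN-style recognition head: the two knobs mentioned above
    are explicit, namely the pooling that sets the final feature-map height
    and the BiLSTM hidden size."""
    def __init__(self, in_channels=256, lstm_hidden=256, num_classes=37):
        super().__init__()
        # Collapsing the height to 1 here; a different pooling schedule changes
        # the height of the last feature map.
        self.pool_h = nn.AdaptiveAvgPool2d((1, None))
        self.rnn = nn.LSTM(in_channels, lstm_hidden, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, feat):                 # feat: (N, C, H, W) from RoIRotate
        x = self.pool_h(feat)                # (N, C, 1, W)
        x = x.squeeze(2).permute(2, 0, 1)    # (W, N, C): a sequence over width
        x, _ = self.rnn(x)                   # (W, N, 2 * lstm_hidden)
        return self.fc(x)                    # (W, N, num_classes), fed to CTC
```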
Ok, but I'm not sure that could be the cause. For SVHN I actually decreased the units to 256 (it didn't help). From my previous experience with CRNN, such "tuning" didn't change the general behaviour. But here we have a "heavier" ResNet, RoIRotate in the middle, and a quite new PyTorch implementation of CTC (compared to tf.ctc_loss()), so lots of degrees of freedom :) Ok, seems like I have to dig into it :)
fare well @soldierofhell
@novioleo, I reduced the model to include only the detection part and only the classification loss, and noticed that the dice loss doesn't improve. So I looked into shared_conv.py and noticed that:
1. In __mean_image_substraction the means are on a [0, 255] scale, like in the tf.slim implementation of EAST (https://github.com/argman/EAST/blob/master/model.py), while PyTorch's torchvision uses [0, 1] normalization with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] (see e.g. https://pytorch.org/docs/stable/torchvision/models.html); see the side-by-side sketch after this list.
2. Image inputs are directly injected into each layer (layer1, ..., layer4):
input = layer(input)
Why? Why not just like in the EAST tf implementation:
logits, end_points = resnet_v1.resnet_v1_50(images, is_training=is_training, scope='resnet_v1_50')
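As referenced in point 1 above, here is a side-by-side sketch of the two normalization conventions. The function names are mine; the 0-255 means are the ones used in argman/EAST's model.py, and the 0-1 mean/std are the values from the torchvision docs:

```python
import numpy as np
import torch

def east_style_mean_subtraction(img_rgb):
    """argman/EAST (tf.slim) convention: pixels stay on a 0-255 scale and only
    the per-channel ImageNet means are subtracted."""
    means = np.array([123.68, 116.78, 103.94], dtype=np.float32)  # R, G, B
    return img_rgb.astype(np.float32) - means

def torchvision_style_normalization(img_rgb):
    """torchvision pretrained-model convention: scale to [0, 1], then normalize
    with the mean/std from the torchvision docs."""
    x = torch.from_numpy(img_rgb).float().permute(2, 0, 1) / 255.0  # HWC -> CHW
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    return (x - mean) / std
```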
@soldierofhell Regarding torchvision.transform, take a look at the TensorFlow version of EAST, https://github.com/argman/EAST/blob/dca414de39a3a4915a019c9a02c1832a31cdd0ca/nets/resnet_v1.py#L224 : you can notice that the blocks' outputs are collected in end_points. There is not much difference between the two implementations.
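For comparison, a rough sketch of the same idea in PyTorch: a plain loop over the ResNet stages that also keeps each stage's output, roughly what end_points holds in the tf.slim version. The variable names here are illustrative, not copied from shared_conv.py:

```python
import torch
import torchvision

backbone = torchvision.models.resnet50(pretrained=True)
stem = torch.nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
stages = [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]

def forward_with_endpoints(images):
    """Run the stem, then each residual stage in turn, keeping every stage's
    output, i.e. the PyTorch counterpart of tf.slim's end_points."""
    x = stem(images)
    end_points = []
    for layer in stages:          # same pattern as `input = layer(input)` above
        x = layer(x)
        end_points.append(x)      # feature maps at strides 4, 8, 16, 32
    return end_points
```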
You are right, I didn't understand the flow of the for loop at first. It iterates through the whole ResNet, from the input to the output of layer4, so it's fine :)

Regarding 1., I still think there's a problem here. In addition, I don't see any conversion from BGR to RGB format after cv2.imread()? It would probably be better if I prepared a PR for this, to explain it better.
Just convert it with img[:,:,::-1]; there is no need to use a conversion function.
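In other words, something like this (a small illustration; the variable names are arbitrary):

```python
import cv2

img_bgr = cv2.imread("sample.jpg")   # OpenCV loads images in BGR channel order
img_rgb = img_bgr[:, :, ::-1]        # reverse the channel axis: BGR -> RGB
# Same result as cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB); note the slice is a
# negative-stride view, so call .copy() before handing it to torch.from_numpy().
img_rgb = img_rgb.copy()
```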
I tried to train on ICDAR 2015 and SVHN (as "mydataset"), and for both datasets the network quickly starts to predict all blanks (which is rather typical for CRNN), but the loss (especially CTC) doesn't improve, and I mostly observe blanks with some minor fluctuations from time to time. The only things I changed were running the code on PyTorch 1.1 and moving the labels and label_lengths tensors to CUDA (required by CTCLoss), but from other issues it seems like that shouldn't be a problem. I've also set a lower batch_size=8 because of memory limitations (this can affect batch_norm(), but shouldn't completely break it).
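For reference, a minimal sketch of the kind of nn.CTCLoss call described above, with the targets and lengths moved to the GPU as mentioned; the shapes and variable names are illustrative, not the repo's actual ones:

```python
import torch
import torch.nn as nn

T, N, C = 32, 8, 37                              # sequence length, batch size, alphabet size (illustrative)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)    # zero_infinity guards against inf losses on short inputs

log_probs = torch.randn(T, N, C).log_softmax(2).cuda()            # (T, N, C) network output
targets = torch.randint(1, C, (N, 10), dtype=torch.long).cuda()   # labels moved to CUDA as described
input_lengths = torch.full((N,), T, dtype=torch.long).cuda()
target_lengths = torch.full((N,), 10, dtype=torch.long).cuda()

loss = ctc(log_probs, targets, input_lengths, target_lengths)
```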