wineternity closed this issue 6 years ago.
Here are some related issues from earlier that you may want to refer to. You could also check your dataset by dumping a few of its samples. Doing so will not cause any problems.
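A minimal sketch of dumping a few samples for visual inspection. It assumes the LMDB key layout written by the repository's dataset-creation script (`num-samples`, `image-%09d`, `label-%09d`, 1-based); if your database was built differently, adjust the keys. `dump_samples` is a hypothetical helper, and it takes any `get(key)` callable so it does not depend on a particular LMDB binding.

```python
import os
from typing import Callable, List, Optional, Tuple

# Assumed key layout (1-based indices):
#   b"num-samples"     -> total sample count as ASCII digits
#   b"image-%09d" % i  -> encoded image bytes
#   b"label-%09d" % i  -> label text as UTF-8 bytes


def dump_samples(get: Callable[[bytes], Optional[bytes]],
                 out_dir: str, n: int = 10) -> List[Tuple[str, str]]:
    """Write the first n images to out_dir and return (filename, label) pairs."""
    os.makedirs(out_dir, exist_ok=True)
    total = int(get(b"num-samples"))
    dumped = []
    for i in range(1, min(n, total) + 1):
        img_bytes = get(b"image-%09d" % i)
        label = get(b"label-%09d" % i).decode("utf-8")
        # The .jpg extension is a guess; most viewers detect the real format
        # from the file contents.
        fname = os.path.join(out_dir, "sample-%03d.jpg" % i)
        with open(fname, "wb") as f:
            f.write(img_bytes)
        dumped.append((fname, label))
    return dumped


# Hypothetical usage with the `lmdb` package:
# import lmdb
# env = lmdb.open("path/to/train_lmdb", readonly=True, lock=False)
# with env.begin(write=False) as txn:
#     for fname, label in dump_samples(txn.get, "dump", n=20):
#         print(fname, repr(label))  # repr() makes stray spaces visible
```

Opening the dumped files next to their printed labels is usually enough to spot mismatched or mis-rendered samples.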
Thanks, I can now train on the dataset. My dataset contains text of variable length. In the instructions for training a new model, I found the note "please sort the image according to the text length."
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=opt.batchSize,
    shuffle=True, sampler=sampler,
    num_workers=int(opt.workers),
    collate_fn=dataset.alignCollate(imgH=opt.imgH, imgW=opt.imgW, keep_ratio=opt.keep_ratio))
But in the code, shuffle is always True. Where does the code require the dataset to be ordered by length? And if I train without ordering the variable-length samples, does the CRNN network have some limitation?
If you look at PyTorch's source code, you will see that the shuffle option only takes effect when no sampler is specified; otherwise it is ignored.
Thanks so much for your kind explanation; I think I now understand your design. If --random_sample is used, the DataLoader uses shuffle=True. Without this flag, the DataLoader uses the RandomSequentialSampler, which keeps the stored order of the data within each randomly chosen batch.
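The design described here (sort the data by text length, then draw batches of consecutive samples from random starting points) can be sketched in plain Python. This is an illustration of the idea as we read the thread, not the repository's RandomSequentialSampler; both function names are hypothetical, and wrapping the indices in a `torch.utils.data.Sampler` is left out.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sort_by_text_length(image_paths: Sequence[str],
                        labels: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Sort (image, label) pairs by label length before writing the dataset.
    sorted() is stable, so pairs with equal-length labels keep their order."""
    pairs = sorted(zip(image_paths, labels), key=lambda pair: len(pair[1]))
    return [p for p, _ in pairs], [lbl for _, lbl in pairs]


def random_sequential_indices(num_samples: int, batch_size: int,
                              rng: Optional[random.Random] = None) -> List[int]:
    """Build one epoch of indices as runs of consecutive positions.

    Each run starts at a random offset, so a batch holds neighbouring samples,
    which in a length-sorted dataset have similar label lengths. Because the
    offsets are drawn independently, some samples may appear twice and others
    not at all within one epoch."""
    if num_samples < batch_size:
        raise ValueError("num_samples must be at least batch_size")
    rng = rng or random.Random()
    indices: List[int] = []
    while len(indices) < num_samples:
        run = min(batch_size, num_samples - len(indices))  # last run may be shorter
        start = rng.randint(0, num_samples - batch_size)
        indices.extend(range(start, start + run))
    return indices


paths, labels = sort_by_text_length(["a.jpg", "b.jpg", "c.jpg"], ["12345", "1", "123"])
# labels == ["1", "123", "12345"], paths == ["b.jpg", "c.jpg", "a.jpg"]
```

With plain shuffling, by contrast, any batch can mix very short and very long labels.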
But I always have to change shuffle to False manually, because my PyTorch version does not accept a sampler together with shuffle=True. The error is shown below; it may be related to my PyTorch version, so it is not a big deal.
    collate_fn=dataset.alignCollate(imgH=opt.imgH, imgW=opt.imgW, keep_ratio=opt.keep_ratio))
  File "/home/animal/Tool/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 287, in __init__
    raise ValueError('sampler is mutually exclusive with shuffle')
ValueError: sampler is mutually exclusive with shuffle
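One way to avoid editing the flag by hand is to request shuffling only when no sampler is passed. Per the reply about PyTorch's source, older DataLoader versions ignored `shuffle` when a sampler was given, and the traceback shows newer ones raise instead; `shuffle = sampler is None` behaves the same on both. `sampler_kwargs` is a hypothetical helper, not part of the repository.

```python
from typing import Any, Dict, Optional


def sampler_kwargs(sampler: Optional[Any]) -> Dict[str, Any]:
    """DataLoader keyword arguments that never combine a sampler with shuffle=True."""
    return {"sampler": sampler, "shuffle": sampler is None}


# Hypothetical usage in crnn_main.py (names taken from the snippet above):
# train_loader = torch.utils.data.DataLoader(
#     train_dataset, batch_size=opt.batchSize,
#     num_workers=int(opt.workers),
#     collate_fn=dataset.alignCollate(imgH=opt.imgH, imgW=opt.imgW, keep_ratio=opt.keep_ratio),
#     **sampler_kwargs(sampler))
```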
Finally, may I ask whether the order affects the loss or accuracy of the network? When I train on shuffled data, the model also gives good output on the training set.
It might be a problem with your PyTorch version.
As you can see from the collate function, all images in a batch are resized to the same size. So if the images in a batch do not have similar aspect ratios, they will be distorted. You can of course still train this way, but performance may deteriorate.
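To make the distortion concrete, here is a rough model of the keep_ratio=True path as we understand alignCollate: the batch width is taken from the widest width/height ratio in the batch (floor(max_ratio * imgH), but at least imgH * min_ratio), and every image is resized to that width. The function name and the min_ratio=1 default are assumptions for illustration. The stretch factor compares each image's final width with the width it would have under an aspect-preserving resize to imgH.

```python
import math
from typing import List, Tuple


def batch_resize_plan(sizes: List[Tuple[int, int]], img_h: int = 32,
                      min_ratio: int = 1) -> Tuple[int, List[float]]:
    """sizes: (width, height) per image. Returns the shared target width and
    each image's horizontal stretch factor (1.0 means no distortion)."""
    max_ratio = max(w / h for w, h in sizes)
    target_w = max(img_h * min_ratio, math.floor(max_ratio * img_h))
    stretch = [target_w / (w / h * img_h) for w, h in sizes]
    return target_w, stretch


# A short word batched with a long line: the short one is stretched 4x.
print(batch_resize_plan([(100, 32), (400, 32)], img_h=32))
```

With keep_ratio=False, as we understand it, every image is resized to the fixed imgW x imgH instead, so images whose ratio differs from imgW/imgH are distorted in the same way.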
I am a newbie in DL, and I am trying to run the crnn_main.py script as a demo with 5000 training samples and batchSize=100. The alphabet is all digits plus a-zA-Z.
[22/25][20/50] Loss: 32.413400
Start val
-------------------------- => , gt: 354141
-------------------------- => , gt: 342029
-------------------------- => , gt: 54806C
-------------------------- => , gt: 104101056
-------------------------- => , gt: 541001214
-------------------------- => , gt: 501461
-------------------------- => , gt: 466051021
-------------------------- => , gt: 322025144
-------------------------- => , gt: 203001658
-------------------------- => , gt: 530010012
Test loss: 27.896337, accuray: 0.000000
[22/25][30/50] Loss: 33.099444
[22/25][40/50] Loss: 29.278045
Start val
3------------------------- => 3 , gt: 324401100
3------------------------- => 3 , gt: 500214
3------------------------- => 3 , gt: 530010012
3------------------------- => 3 , gt: 466453
3------------------------- => 3 , gt: 447009023
3------------------------- => 3 , gt: 403051019
3------------------------- => 3 , gt: 930001623
3------------------------- => 3 , gt: 326000050
3------------------------- => 3 , gt: 661258
3------------------------- => 3 , gt: 422051*001
Test loss: 24.278810, accuray: 0.000000
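The left column of the log is the raw per-timestep prediction and the middle column is its decoded form. Assuming '-' is the CTC blank (as the log suggests), greedy decoding merges repeated characters and then drops blanks, which is why an all-blank raw output prints as an empty string and "3-----..." prints as "3". A minimal sketch of that collapse; the function name is hypothetical:

```python
def ctc_greedy_collapse(raw: str, blank: str = "-") -> str:
    """Merge consecutive repeats, then drop blanks (greedy CTC decoding)."""
    out = []
    prev = None
    for ch in raw:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


print(repr(ctc_greedy_collapse("3-------------------------")))  # '3'
print(repr(ctc_greedy_collapse("--------------------------")))  # ''
```

Because repeats merge unless a blank separates them, a label with doubled characters such as 100100 needs a blank between the two zeros in the raw output.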
May I ask why the results are so bad? Could it be caused by the small number of training samples, or did I make a mistake when creating the training images (I use PIL to draw the string onto a background image)?
Could you give me a tip? Also, some of my image strings contain spaces: both "100 100" and "100100" are drawn on images, and both map to the label 100100. Will this cause mistakes?
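One concrete thing worth checking is whether every label uses only characters from the alphabet. For example, the ground truth 422051*001 in the validation log contains '*', which is not in digits + a-zA-Z, and a space inside a label would not be either. A small sketch, assuming the alphabet is exactly `string.digits + string.ascii_letters`; the function name is hypothetical:

```python
import string
from typing import Dict, Iterable, Set

ALPHABET = string.digits + string.ascii_letters  # digits plus a-zA-Z, as described above


def find_bad_labels(labels: Iterable[str], alphabet: str = ALPHABET) -> Dict[str, Set[str]]:
    """Map each label containing characters outside `alphabet` to those characters."""
    allowed = set(alphabet)
    bad = {}
    for label in labels:
        extra = set(label) - allowed
        if extra:
            bad[label] = extra
    return bad


print(find_bad_labels(["354141", "422051*001", "100 100", "54806C"]))
```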