sacmehta / ESPNet

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

https://sacmehta.github.io/ESPNet/

MIT License

538 stars, 111 forks

ESPNet training #60

Closed. pkuqgg closed this issue 5 years ago.

pkuqgg commented 5 years ago

Hello, thank you for your impressive work! I understand that training is done in two steps. When I set the variable 'decoder' to 'True', I get the following error:

RuntimeError: input and target batch or spatial sizes don't match: target [8 x 96 x 192], input [8 x 20 x 768 x 1536] at /opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:23

Could you please help me resolve this? Thank you very much!

sacmehta commented 5 years ago

What is the value of the scaleIn argument? For the encoder it is 8, and for the decoder it should be 1.
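The sizes in the error above are consistent with the label map still being downsampled by a factor of 8 while the decoder predicts at full resolution: 768 / 8 = 96 and 1536 / 8 = 192. A minimal sketch of that arithmetic follows, assuming scaleIn acts as an integer downsampling factor on the labels. `target_size` and `shapes_match` are hypothetical helpers written for illustration, not functions from the ESPNet repository.

```python
def target_size(height, width, scale_in):
    """Spatial size of a label map downsampled by an integer factor.

    Hypothetical helper: assumes scale_in divides the labels' height and width.
    """
    return height // scale_in, width // scale_in


def shapes_match(output_hw, target_hw):
    """True when the prediction and the label map have the same spatial size."""
    return tuple(output_hw) == tuple(target_hw)


# Decoder output resolution taken from the error message: 768 x 1536.
decoder_output_hw = (768, 1536)

# Labels prepared with the encoder setting (factor 8) are too small for the decoder.
encoder_labels_hw = target_size(768, 1536, scale_in=8)
print(encoder_labels_hw, shapes_match(decoder_output_hw, encoder_labels_hw))

# Labels prepared with a factor of 1 line up with the decoder output.
decoder_labels_hw = target_size(768, 1536, scale_in=1)
print(decoder_labels_hw, shapes_match(decoder_output_hw, decoder_labels_hw))
```

Under this assumption, the first check fails in the same way the loss function did, and the second passes.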

pkuqgg commented 5 years ago

> What is the value of the scaleIn argument? For the encoder it is 8, and for the decoder it should be 1.

Thank you for your reply. This problem has been solved. May I ask another question? Both ESPNet-C and ESPNet are trained for 300 epochs, which takes a long time. Do you have any suggestions for reducing the training time? I look forward to your reply. Thanks!

pkuqgg commented 5 years ago

Hello! I am trying to retrain the network. I followed your advice, but I ran into an error. Can you help me solve it? This is important to me. Thank you!

RuntimeError: Error(s) in loading state_dict for ESPNet:
        size mismatch for conv.conv.weight: copying a param with shape torch.Size([20, 39, 3, 3]) from checkpoint, the shape in current model is torch.Size([20, 36,3, 3]).

SimonsLiu commented 5 years ago

> Hello! I am trying to retrain the network. I followed your advice, but I ran into an error. Can you help me solve it? This is important to me. Thank you!
> RuntimeError: Error(s) in loading state_dict for ESPNet:
>         size mismatch for conv.conv.weight: copying a param with shape torch.Size([20, 39, 3, 3]) from checkpoint, the shape in current model is torch.Size([20, 36,3, 3]).

I ran into the same error. Have you solved it? If so, could you please tell me the solution?

sacmehta commented 5 years ago

The model files in the train and test folders are different. The one in the train folder is generic and can be applied to any dataset. The one in the test folder is specific to the Cityscapes dataset. Which file are you using?
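One way to confirm that a checkpoint and a model definition disagree is to compare parameter shapes by name before loading. The sketch below is a hedged illustration in plain Python: `find_shape_mismatches` is a hypothetical helper, and the two dictionaries simply reuse the shapes reported in the error message above.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ.

    Hypothetical helper: both arguments map parameter names to shape tuples.
    Names present in only one of the two mappings are ignored here.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message in this thread.
checkpoint_shapes = {"conv.conv.weight": (20, 39, 3, 3)}
model_shapes = {"conv.conv.weight": (20, 36, 3, 3)}

for name, (ckpt, model) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

With PyTorch, such a mapping can be built from a state dict as `{k: tuple(v.shape) for k, v in state_dict.items()}`. A difference in the second dimension, as here, means the layer was built with a different number of input channels, which fits the explanation that the two model files are not the same.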

SimonsLiu commented 5 years ago

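> The model files in the train and test folders are different. The one in the train folder is generic and can be applied to any dataset. The one in the test folder is specific to the Cityscapes dataset. Which file are you using?

Thanks. My training process follows what you describe in the README: first I trained ESPNet-C, then I trained ESPNet, so I ended up with two output directories. Now I want to test the model that I trained. How can I do that?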

sacmehta commented 5 years ago

Replace the model file in the test folder with the one in the train folder, and then follow the instructions for testing. It should work.

SimonsLiu commented 5 years ago

> Replace the model file in the test folder with the one in the train folder, and then follow the instructions for testing. It should work.

Do you mean that I should use results_encdec_2_8/model_300.pth as espnet_p_2_q_8.pth and results_encenc_2_8/model_300.pth as espnet_p_2_q_8.pth, and then I can test my model?

sacmehta commented 5 years ago

I mean replace the Model.py file in the test folder with the Model.py file in the train folder, and then use the weights that you trained.
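The file swap described here can be scripted. The sketch below is one possible way to do it, assuming the repository has `train` and `test` folders that each contain a `Model.py`. `replace_model_file` is a hypothetical helper, and the backup step is an added precaution rather than part of the original advice.

```python
import shutil
from pathlib import Path


def replace_model_file(train_dir, test_dir, name="Model.py"):
    """Copy train_dir/name over test_dir/name, keeping a .bak copy of the old file.

    Hypothetical helper: the folder layout is an assumption based on this thread.
    """
    src = Path(train_dir) / name
    dst = Path(test_dir) / name
    if not src.is_file():
        raise FileNotFoundError(f"missing source file: {src}")
    if dst.exists():
        # Keep the Cityscapes-specific file so the change can be undone.
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)
    return dst


# Example call, assuming it is run from the repository root:
# replace_model_file("train", "test")
```

After the swap, the test script builds the same architecture that produced the checkpoint, so the trained weights should load without the size mismatch.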

SimonsLiu commented 5 years ago

Solved it!!! Thank you very much!!!

SimonsLiu commented 5 years ago

> I mean replace the Model.py file in the test folder with the Model.py file in the train folder, and then use the weights that you trained.

I just tested it. I trained the models for 300 epochs, but I feel the result may not be the best. Could you tell me the best parameters you used for training? How accurate is it?

sacmehta commented 5 years ago

We used the default parameters for training. See the paper or the Cityscapes leaderboard for accuracy details.

P.S.: In case you are not aware, ESPNetv2 is 6-7% more accurate than ESPNet, has about half the number of FLOPs, and trains faster too. See the repo below for more details.

https://github.com/sacmehta/EdgeNets