Monibsediqi closed this issue 4 years ago
It looks like you are using an older version of this toolkit. The pretrained model for resnet50 is no longer provided. Please visit https://github.com/zhanghang1989/gluoncv-torch if you are interested in
The current download URL from encoding/models/model_store.py doesn't work.
I tried changing the URL in model_store.py to match the model_store.py from gluoncv-torch, but the model seems to be slightly different, and loading fails with a size mismatch:
RuntimeError: Error(s) in loading state_dict for ResNet:
size mismatch for bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
size mismatch for layer1.0.downsample.0.weight: copying a param with shape torch.Size([256, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
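A quick way to see the full extent of a mismatch like this is to compare parameter shapes before calling load_state_dict. The following is a minimal sketch, not part of the toolkit: the helper name `diff_shapes` is hypothetical, and it works on plain `{name: shape}` dictionaries so it runs without PyTorch. The sample shapes are copied from the error message above.

```python
def diff_shapes(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape} mappings.

    Returns a tuple (mismatched, missing, unexpected):
      mismatched: name -> (checkpoint shape, model shape) where they differ
      missing:    names the model has but the checkpoint lacks
      unexpected: names the checkpoint has but the model lacks
    """
    shared = checkpoint_shapes.keys() & model_shapes.keys()
    mismatched = {
        name: (checkpoint_shapes[name], model_shapes[name])
        for name in shared
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }
    missing = sorted(model_shapes.keys() - checkpoint_shapes.keys())
    unexpected = sorted(checkpoint_shapes.keys() - model_shapes.keys())
    return mismatched, missing, unexpected


# Shapes taken from the RuntimeError above (two of the six entries).
checkpoint = {"bn1.bias": (64,), "layer1.0.conv1.weight": (64, 64, 1, 1)}
model = {"bn1.bias": (128,), "layer1.0.conv1.weight": (64, 128, 1, 1)}

mismatched, missing, unexpected = diff_shapes(checkpoint, model)
for name in sorted(mismatched):
    ckpt_shape, model_shape = mismatched[name]
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch installed, the two dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the loaded checkpoint and from `model.state_dict()`. Here every mismatch involves a 64-channel tensor in the checkpoint where the model expects 128 channels, which is consistent with the two files describing different ResNet variants.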
I faced the same issue. I thought it was a problem that was occurring only for me. I'm still trying to get the code to run on my machine.
Changing the URL to https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnest50-fb9de5b3.zip
should work.
Let me know if you still have the issue when using the newest version.
https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnest50-fb9de5b3.zip This URL seems to be invalid
Please update the package:
pip install torch-encoding --pre --upgrade
Sorry, the link is invalid:
RuntimeError: Failed downloading url https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnet50-fb9de5b3.zip
I have updated the package.
Where did you get this URL?
All URLs should start with https://s3.us-west-1.wasabisys.com/encoding
https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/models/model_store.py#L46
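To check whether an installed copy still points at an old host, a download URL can be compared against the prefix quoted above. This is a minimal sketch using only the standard library. The helper `inspect_model_url` is hypothetical and not part of the package, and the assumption that file names follow a `<model>-<short hash>.zip` pattern is inferred from the URLs in this thread.

```python
import posixpath
from urllib.parse import urlparse

# Prefix quoted in the thread; everything after it is not assumed here.
EXPECTED_PREFIX = "https://s3.us-west-1.wasabisys.com/encoding"


def inspect_model_url(url, expected_prefix=EXPECTED_PREFIX):
    """Summarise a model download URL without contacting the network."""
    parsed = urlparse(url)
    file_name = posixpath.basename(parsed.path)
    stem, _ext = posixpath.splitext(file_name)
    # Assumed naming pattern: "<model>-<short hash>"; tolerate a missing hyphen.
    name, sep, short_hash = stem.rpartition("-")
    if not sep:
        name, short_hash = stem, ""
    return {
        "host": parsed.netloc,
        "uses_expected_prefix": url.startswith(expected_prefix),
        "model": name,
        "short_hash": short_hash,
    }


# URL copied from the error report earlier in the thread.
old_url = "https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnet50-fb9de5b3.zip"
info = inspect_model_url(old_url)
print(info)
```

If `uses_expected_prefix` is false after upgrading, the interpreter is probably still importing an older installed copy of the package rather than the upgraded one.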
Thank you so much for the updates. As per your instructions, I prepared the dataset using scripts/prepare-ade20k.py and then tried to run the test.py script to test the model, but got the following error:
Traceback (most recent call last):
  File "test.py", line 199, in <module>
    test(args)
  File "test.py", line 114, in test
    collate_fn=test_batchify_fn, **loader_kwargs)
  File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 219, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 190, in __init__
    "but got batch_size={}".format(batch_size))
ValueError: batch_size should be a positive integer value, but got batch_size=0
FYI: I checked the path ~/.encoding/data to make sure the dataset exists, and it does.
I also tried the train.py script, and got the following error:
Using poly LR scheduler with warm-up epochs of 0!
Starting Epoch: 0
Total Epoches: 180
0%| | 0/1263 [00:00<?, ?it/s]
=>Epoch 0, learning rate = 0.0040, previous best = 0.0000
0%| | 0/1263 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 271, in <module>
    trainer.training(epoch)
  File "train.py", line 202, in training
    outputs = self.model(image)
  File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/encoding/models/sseg/encnet.py", line 33, in forward
    features = self.base_forward(x)
  File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/encoding/models/sseg/base.py", line 91, in base_forward
    x = self.pretrained.bn1(x)
  File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/encoding/nn/syncbn.py", line 180, in forward
    if x.get_device() == self.devices[0]:
IndexError: list index out of range
Any comment is highly appreciated.
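Both tracebacks fail on values that look empty: a batch size of 0 in test.py and an empty `devices` list in syncbn.py. The thread does not confirm the cause. One possibility, stated here purely as an assumption, is that both values are derived from the number of visible GPUs and that none were detected. The hypothetical helper below (`preflight` is not part of the toolkit) sketches a check that reports such a configuration before the DataLoader or the model is built.

```python
def preflight(batch_size, device_ids):
    """Return a list of human-readable problems; an empty list means OK.

    batch_size: the value that would be passed to the DataLoader.
    device_ids: the device indices the model would be replicated on.
    """
    problems = []
    # Mirrors the condition behind the ValueError in the first traceback.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or batch_size <= 0:
        problems.append(
            f"batch_size should be a positive integer, got {batch_size!r}"
        )
    # An empty list makes any devices[0] lookup raise IndexError,
    # as in the second traceback.
    if len(device_ids) == 0:
        problems.append("device list is empty; devices[0] would raise IndexError")
    return problems


# Values that would reproduce both failures reported above.
for problem in preflight(batch_size=0, device_ids=[]):
    print("config problem:", problem)
```

With PyTorch installed, `device_ids` could be taken from `list(range(torch.cuda.device_count()))`. If that list is empty, checking the CUDA installation and the `CUDA_VISIBLE_DEVICES` environment variable would be a reasonable next step.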
Hello, have you solved the two problems you mentioned? Could you help me with them?
Hi HangZhang! First of all, many thanks for the awesome work. I'm trying to train the network, but at the download stage I constantly get the following error:
RuntimeError: Failed downloading url https://hangzh.s3.amazonaws.com/encoding/models/resnet50-25c4b509.zip
I tried wget https://hangzh.s3.amazonaws.com/encoding/models/resnet50-25c4b509.zip
and got this:
Resolving hangzh.s3.amazonaws.com (hangzh.s3.amazonaws.com)... 52.219.116.146
Connecting to hangzh.s3.amazonaws.com (hangzh.s3.amazonaws.com)|52.219.116.146|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2020-05-06 11:33:10 ERROR 403: Forbidden.
I dunno where's the problem. Is it on my side or on the side of amazon web service?
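The wget output shows that the connection succeeded and the server itself answered with 403, so the request reached the host and was refused there. The same check can be scripted. The following is a minimal sketch using only the Python standard library. The function name `probe` and its injectable `opener` parameter are illustrative choices, not part of the toolkit.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Send a HEAD request and return (status_code, reason).

    status_code is None when no HTTP response was received at all
    (for example a DNS failure or a refused connection).
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status, response.reason
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return err.code, err.reason
    except urllib.error.URLError as err:
        # No HTTP response: the problem is between the client and the host.
        return None, str(err.reason)
```

Calling `probe("https://hangzh.s3.amazonaws.com/encoding/models/resnet50-25c4b509.zip")` requires network access. A numeric status means the server was reached, while `None` points to a local or network problem. The HTTPError clause has to come before the URLError clause because HTTPError is a subclass of URLError.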