zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License

RuntimeError: Failed downloading #273

Closed. Monibsediqi closed this issue 4 years ago.

Monibsediqi commented 4 years ago

Hi Hang Zhang! First of all, many thanks for the awesome work. I'm trying to train the network, but at the download stage I constantly get the following error:

    RuntimeError: Failed downloading url https://hangzh.s3.amazonaws.com/encoding/models/resnet50-25c4b509.zip

I tried

    wget https://hangzh.s3.amazonaws.com/encoding/models/resnet50-25c4b509.zip

and got this:

    Resolving hangzh.s3.amazonaws.com (hangzh.s3.amazonaws.com)... 52.219.116.146
    Connecting to hangzh.s3.amazonaws.com (hangzh.s3.amazonaws.com)|52.219.116.146|:443... connected.
    HTTP request sent, awaiting response... 403 Forbidden
    2020-05-06 11:33:10 ERROR 403: Forbidden.
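For what it's worth, a 403 (rather than 404) means the server is reachable but access to the object is denied, which is consistent with the file having been removed or the bucket made private. A minimal stdlib sketch for telling the two cases apart (the `diagnose` helper is purely illustrative, not part of this toolkit):

```python
import urllib.request
import urllib.error

def diagnose(status: int) -> str:
    """Map an HTTP error status to a rough diagnosis (illustrative only)."""
    if status == 403:
        return "reachable, but access denied (file removed or bucket private)"
    if status == 404:
        return "URL path does not exist"
    return "unexpected status: {}".format(status)

def check_url(url: str) -> str:
    """Probe a download URL and report whether it is usable."""
    try:
        with urllib.request.urlopen(url) as resp:
            return "OK ({})".format(resp.status)
    except urllib.error.HTTPError as e:
        return diagnose(e.code)
```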

I don't know where the problem is. Is it on my side or on Amazon Web Services' side?

zhanghang1989 commented 4 years ago

Looks like you are using an older version of this toolkit. The pretrained model for resnet50 is no longer provided. Please visit https://github.com/zhanghang1989/gluoncv-torch if you are interested in the pretrained models.

sulc commented 4 years ago

The current download URL from encoding/models/model_store.py doesn't work.

I tried changing the URL in model_store.py according to the model_store.py from gluoncv-torch, but the model seems to be slightly different; it fails with a size mismatch:

RuntimeError: Error(s) in loading state_dict for ResNet:
        size mismatch for bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
        size mismatch for layer1.0.downsample.0.weight: copying a param with shape torch.Size([256, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
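The mismatch pattern above (64 vs. 128 channels around the stem) suggests the two ResNet-50 variants differ in their first layers, e.g. a deep-stem ResNet versus a standard one; that reading is an inference from the shapes, not confirmed by the maintainer. One way to see every incompatible parameter up front is to diff the shapes before calling `load_state_dict` (which raises on size mismatches even with `strict=False`). A sketch with hypothetical shape dictionaries standing in for the real state_dicts:

```python
# Hypothetical shapes standing in for checkpoint.state_dict() and
# model.state_dict(); the real dicts map names to tensors, compared
# here via their .shape tuples.
checkpoint_shapes = {
    "bn1.weight": (64,),
    "layer1.0.conv1.weight": (64, 64, 1, 1),
    "layer1.0.conv2.weight": (64, 64, 3, 3),
}
model_shapes = {
    "bn1.weight": (128,),
    "layer1.0.conv1.weight": (64, 128, 1, 1),
    "layer1.0.conv2.weight": (64, 64, 3, 3),
}

def shape_mismatches(ckpt, model):
    """Names present in both dicts whose shapes disagree."""
    return [name for name in ckpt if name in model and ckpt[name] != model[name]]

print(shape_mismatches(checkpoint_shapes, model_shapes))
```

If the list is non-empty, the checkpoint was trained for a different architecture and no amount of key filtering will make it load correctly.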
Monibsediqi commented 4 years ago

I faced the same issue. I thought it was a problem occurring only on my end. I'm still trying to get the code to run on my machine.

zhanghang1989 commented 4 years ago

Changing the URL to https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnest50-fb9de5b3.zip should work.
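If the download succeeds but you want to confirm the file is intact: the suffix in the filename (`fb9de5b3`) looks like a short hash. Assuming it is a SHA-1 prefix (the convention gluoncv-style model stores use; verify against model_store.py), a sketch for checking a downloaded file:

```python
import hashlib

def matches_short_hash(path: str, short_hash: str) -> bool:
    """Check whether a file's SHA-1 digest starts with the short hash
    embedded in its name (assumed naming convention)."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large model zips don't load into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha1.update(chunk)
    return sha1.hexdigest().startswith(short_hash)
```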

zhanghang1989 commented 4 years ago

Let me know if you still have the issue with the newest version.

Monibsediqi commented 4 years ago

Thank you so much for the updates. As per your instructions I prepared the dataset using scripts/prepare-ade20k.py and then tried to run the test.py script to test the model, but got the following error:

    Traceback (most recent call last):
      File "test.py", line 199, in <module>
        test(args)
      File "test.py", line 114, in test
        collate_fn=test_batchify_fn, **loader_kwargs)
      File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 219, in __init__
        batch_sampler = BatchSampler(sampler, batch_size, drop_last)
      File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 190, in __init__
        "but got batch_size={}".format(batch_size))
    ValueError: batch_size should be a positive integer value, but got batch_size=0

FYI: I checked ~/.encoding/data to make sure the dataset exists, and it does.
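Regarding `batch_size=0`: one plausible cause (an assumption; worth checking against the argument handling in test.py) is a per-device batch size computed as `total // n_devices`, which silently becomes 0 when the total is smaller than the device count. A guard along these lines makes the failure explicit before it reaches the DataLoader:

```python
def per_device_batch_size(total_batch_size: int, n_devices: int) -> int:
    """Split a global batch size across devices, failing loudly instead of
    letting a zero batch size reach the DataLoader (illustrative guard)."""
    if n_devices < 1:
        raise RuntimeError("no devices visible; cannot split the batch size")
    per_device = total_batch_size // n_devices
    if per_device < 1:
        raise ValueError(
            "batch size {} is too small for {} devices".format(
                total_batch_size, n_devices))
    return per_device
```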

I also tried the train.py script, and got the following error:

    Using poly LR scheduler with warm-up epochs of 0!
    Starting Epoch: 0
    Total Epoches: 180
    =>Epoch 0, learning rate = 0.0040, previous best = 0.0000
      0%|          | 0/1263 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "train.py", line 271, in <module>
        trainer.training(epoch)
      File "train.py", line 202, in training
        outputs = self.model(image)
      File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/encoding/models/sseg/encnet.py", line 33, in forward
        features = self.base_forward(x)
      File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/encoding/models/sseg/base.py", line 91, in base_forward
        x = self.pretrained.bn1(x)
      File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/monib/.conda/envs/encoding3/lib/python3.7/site-packages/encoding/nn/syncbn.py", line 180, in forward
        if x.get_device() == self.devices[0]:
    IndexError: list index out of range

Any comment is highly appreciated.
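For the second traceback: `self.devices[0]` failing with IndexError means the synchronized-BatchNorm layer's device list is empty, which typically happens when CUDA is not available or not visible to the process (again an assumption; the syncbn implementation expects at least one GPU). A guard of this shape, run before training, would surface the real problem with a readable message:

```python
def first_device(device_ids):
    """Return the primary device id, failing with a clear message when the
    list is empty (the situation behind the IndexError in syncbn.py)."""
    if not device_ids:
        raise RuntimeError(
            "empty device list: SyncBatchNorm needs at least one CUDA device; "
            "check that torch.cuda.is_available() returns True")
    return device_ids[0]
```

In practice the list would come from something like `list(range(torch.cuda.device_count()))`, which is empty on a CPU-only machine.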

hongjianyuan commented 4 years ago

The URL https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnest50-fb9de5b3.zip seems to be invalid.

zhanghang1989 commented 4 years ago

> The URL https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnest50-fb9de5b3.zip seems to be invalid.

Please update the package:

    pip install torch-encoding --pre --upgrade
Msquitttto commented 3 years ago

Sorry, the link is still invalid:

    RuntimeError: Failed downloading url https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnet50-fb9de5b3.zip

I have updated the package.

zhanghang1989 commented 3 years ago

Where did you get this URL?

zhanghang1989 commented 3 years ago

All URLs should start with https://s3.us-west-1.wasabisys.com/encoding

https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/models/model_store.py#L46

wlj567 commented 2 years ago

> Thank you so much for the updates. As per your instructions I prepared the dataset using scripts/prepare-ade20k.py and then tried to run the test.py script, but got: ValueError: batch_size should be a positive integer value, but got batch_size=0 [...] I also tried the train.py script, and got: IndexError: list index out of range [...]

Hello, have you solved the two problems you mentioned? Could you share the solution?