taoyang1122 / pytorch-SimSiam

A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduced the 67.8% Top1 Acc on ImageNet.
Apache License 2.0

Top1 Acc on ImageNet #1

Open Xiatian-Zhu opened 3 years ago

Xiatian-Zhu commented 3 years ago

Thanks @taoyang1122 for sharing this great repo.

For the 67.8% result you got for linear classification on the SimSiam pre-trained feature model, is it from the last (90th) epoch model or the best epoch model?

taoyang1122 commented 3 years ago

It is the last epoch model.

Xiatian-Zhu commented 3 years ago

Interesting, what I got is only 65.316% (1 trial). I do not think I changed anything. Can you repeat your results on your end, @taoyang1122? Thanks.

taoyang1122 commented 3 years ago

I can repeat the results. Can you share your settings and environment? Also, are you using my provided pre-trained model for the linear evaluation, or do you first do unsupervised pretraining and then linear evaluation? If the latter, maybe you can try doing the linear evaluation directly with my provided model to see which stage went wrong.

Xiatian-Zhu commented 3 years ago

Thanks for the response. I did both SimSiam pretraining and linear classifier training on my side. Indeed, it is a good idea to use your pretrained model; I will do that.

Below is the config I used. I used the DP version instead of the DDP version, which I failed to get working on my machine. Not sure if this is the problem.

python3 main_simsiam_DP.py \
  --aug-plus \
  --cos \
  -a resnet50 \
  --lr 0.1 \
  -p 100 \
  --epochs 100 \
  --batch-size 512 \
  # --dist-url 'tcp://localhost:10001' \
  # --multiprocessing-distributed \
  # --world-size 1 \
  # --rank 0 \

Xiatian-Zhu commented 3 years ago

For training the linear classifier, I used the same config as yours (again, I commented out the DDP flags):

python3 main_lincls.py \
  -a resnet50 \
  --lr 1.6 \
  --cos \
  --epochs 90 \
  --batch-size 4096 \
  -p 100 \
  --pretrained /gpfs-volume/train_logs/simsiam/simsiam_checkpoint_0099.pth.tar \
  # --dist-url 'tcp://localhost:10001' \
  # --multiprocessing-distributed \
  # --world-size 1 \
  # --rank 0 \

taoyang1122 commented 3 years ago

Sorry, although I implemented the DP version, I didn't really test its performance. The paper says it uses SyncBN, so DP may cause some issues.

I will remove the DP version, sorry for the confusion. You can try to get the DDP version working, and I think that should reproduce the results.
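For context, a minimal sketch of the DDP + SyncBN setup the paper describes, written against the standard PyTorch API; this is not the repo's exact code, and wrap_model_ddp and gpu are illustrative names:

import torch.nn as nn

def wrap_model_ddp(model, gpu):
    # Assumes torch.distributed.init_process_group(...) has already been
    # called in this process (one process per GPU).
    # Replace every BatchNorm with SyncBatchNorm so that batch statistics
    # are synchronized across processes, as the SimSiam paper specifies.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = model.cuda(gpu)
    # DDP all-reduces gradients across processes during backward.
    return nn.parallel.DistributedDataParallel(model, device_ids=[gpu])

Plain DP, by contrast, computes BN statistics independently on each GPU's chunk of the batch, which shrinks the effective BN batch size and can change results.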

Xiatian-Zhu commented 3 years ago

I see, thanks for letting me know. It is great to know the reason. I will check the DDP version again.

Xiatian-Zhu commented 3 years ago

The DDP version is still a problem on my machine; I often get the error below. @taoyang1122, have you had this issue before? Or are there special requirements for the CUDA driver and package versions? Thanks!

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/user/SimSiam_ImageNet/mlp_main_simsiam_ddp.py", line 300, in main_worker
    train(train_loader, model, optimizer, epoch, args)
  File "/home/user/SimSiam_ImageNet/mlp_main_simsiam_ddp.py", line 340, in train
    z1, p1 = model(images[0])
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/SimSiam_ImageNet/models/simsiam.py", line 77, in forward
    z = self.projection(x)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/SimSiam_ImageNet/models/simsiam.py", line 27, in forward
    x = self.l1(x)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 429, in forward
    self._check_input_dim(input)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 417, in _check_input_dim
    .format(input.dim()))
ValueError: expected at least 3D input (got 2D input)

taoyang1122 commented 3 years ago

No, I don't have this issue. It seems there is some problem with the input dimension; maybe you can print the input shape going into the encoder and into the projection MLP to see if it looks reasonable.

My environment is PyTorch 1.7.1, CUDA 11.1, Python 3.7, torchvision 0.8.2.
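One low-effort way to print those shapes is a forward pre-hook; a hedged sketch, assuming the model exposes encoder and projection submodules as the traceback suggests (under DDP they would live on model.module instead):

def print_input_shape(name):
    def hook(module, inputs):
        # `inputs` is the tuple of positional args about to enter module.forward
        print(name, "input shape:", tuple(inputs[0].shape))
    return hook

model.encoder.register_forward_pre_hook(print_input_shape("encoder"))
model.projection.register_forward_pre_hook(print_input_shape("projection MLP"))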

Xiatian-Zhu commented 3 years ago

Thanks for giving the config of your machine. Mine is: torch 1.3.0, CUDA 10.1.243, Python 2.7.12, torchvision 0.4.1. Quite far from your config.

As you suggested, I printed the input shape for the projection MLP and it is indeed 2D: batch x feat_dim (2048). It is the same 2D shape in the DP version. The input looks fine to me, so this may be caused by the different PyTorch versions. Thanks still!
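A plausible explanation consistent with that traceback, offered as an inference rather than something confirmed in the thread: before roughly PyTorch 1.6, nn.SyncBatchNorm rejected 2D (N, C) inputs with exactly this "expected at least 3D input" error, and the SimSiam projection MLP feeds 2D features into its BatchNorm layers. If that is the cause, upgrading PyTorch should fix it; a hypothetical workaround inside the MLP's forward would be to wrap the BN call in a dummy spatial dimension:

# `bn` stands for the offending (Sync)BatchNorm layer in the projection MLP.
# (N, C) -> (N, C, 1) so older SyncBatchNorm accepts it, then back to (N, C).
x = bn(x.unsqueeze(-1)).squeeze(-1)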

Xiatian-Zhu commented 3 years ago

Hi @taoyang1122, I tried to use the SimSiam model you shared on Google Drive, named unsupervised_pretrained.tar. Is it the last-epoch output? I cannot load it. Is any preprocessing needed?

Error is below:

  checkpoint = torch.load(args.pretrained, map_location="cpu")
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 599, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: /gpfs-volume/train_logs/simsiam/taoyang_pretrained.tar is a zip archive (did you mean to use torch.jit.load()?)
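For what it's worth, this particular RuntimeError usually means the checkpoint was saved with the zip-based serialization that became the default in PyTorch 1.6 and is being read by an older PyTorch (torch 1.3.0 here). A hedged sketch of one fix, re-saving the file in the legacy format on a machine with PyTorch >= 1.6; the file names follow this thread:

import torch

ckpt = torch.load("taoyang_pretrained.pth.tar", map_location="cpu")
# _use_new_zipfile_serialization=False writes the pre-1.6 format that
# older torch versions such as 1.3.0 can still read.
torch.save(ckpt, "taoyang_pretrained_legacy.pth.tar",
           _use_new_zipfile_serialization=False)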

taoyang1122 commented 3 years ago

I can try to rename it to unsupervised_pretrained.pth.tar.

Xiatian-Zhu commented 3 years ago

After renaming it to .pth.tar on my side, I still cannot load it; same error.

taoyang1122 commented 3 years ago

That's weird, I don't have such an issue. I don't know if it is caused by the PyTorch version.

Xiatian-Zhu commented 3 years ago

Sorry, the problem may be on my side. I can load it on another server. Thanks.

Xiatian-Zhu commented 3 years ago

@taoyang1122 With the pre-trained model you provided and the same parameters (below), I can reach a very similar result for linear evaluation: 67.778%. :-)

python3 main_lincls.py \
  -a resnet50 \
  --lr 1.6 \
  --cos \
  --epochs 90 \
  --batch-size 4096 \
  -p 100 \
  --pretrained ./taoyang_pretrained.pth.tar \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed \
  --world-size 1 \
  --rank 0 \

taoyang1122 commented 3 years ago

Great! You can try to fix the DDP issue and that should be able to reproduce the results from scratch.

fiona-lxd commented 3 years ago

How much time did it take to reproduce this result on ImageNet?