nextpyp / cet_pick

Cellular content mining and particle localization
https://nextpyp.app/milopyp/
BSD 3-Clause "New" or "Revised" License

MiLoPYP error loading model #9

Open abnerabner opened 3 weeks ago

abnerabner commented 3 weeks ago

Hi, when I followed the tutorial and ran the command

python simsiam_main.py simsiam2d3d --num_epochs 20 --exp_id test_sample --bbox 36 --dataset simsiam2d3d --arch simsiam2d3d_18 --lr 1e-3 --train_img_txt sample_train_explore_img.txt --batch_size 256 --val_intervals 20 --save_all --gauss 0.8 --dog 3,5

the following error occurred:

Traceback (most recent call last):
  File "simsiam_main.py", line 170, in <module>
    main(opt)
  File "simsiam_main.py", line 60, in main
    model = create_model(opt.arch, opt.heads, opt.head_conv)
  File "/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/models/model.py", line 69, in create_model
    model = get_model(num_layers=num_layers, heads=heads, head_conv=head_conv,last_k = last_k)
  File "/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/models/networks/simsiam_model_2d3d.py", line 855, in get_simsiam2d3d_net_small
    model.init_weights(num_layers)
  File "/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/models/networks/simsiam_model_2d3d.py", line 800, in init_weights
    pretrained_state_dict = model_zoo.load_url(url)
  File "/home/zhenglin/.conda/envs/MiLoPYP/lib/python3.8/site-packages/torch/hub.py", line 595, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/home/zhenglin/.conda/envs/MiLoPYP/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/zhenglin/.conda/envs/MiLoPYP/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

Do you have any suggestion?

Best, Abner.

huangqinwen commented 3 weeks ago

what's the torch version?

huangqinwen commented 3 weeks ago

Seems to be another pickle problem. One potential reason is that the downloaded weights are incomplete, so you might want to re-download them; see torch issue 1 and torch issue 2.
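The `invalid load key, '<'` message means the first byte pickle read was `<`, which is how an HTML page begins, so the "checkpoint" on disk may be a web page rather than weights. Below is a small stdlib-only sketch to check that. The byte signatures are an assumption based on the two formats `torch.save` has used (a zip archive since PyTorch 1.6, a raw pickle stream before), and the labels are made up for this example.

```python
def classify_checkpoint(path: str) -> str:
    """Return a rough label for what a downloaded checkpoint file contains."""
    with open(path, "rb") as f:
        head = f.read(64)  # the first bytes are enough to tell these formats apart
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # zip-based torch.save format (PyTorch 1.6 and later)
    if head.startswith(b"\x80"):
        return "legacy-pickle"  # older torch.save format: starts with a pickle PROTO opcode
    if head.lstrip().startswith(b"<"):
        return "html"  # a web page saved under a .pth name; torch.load fails on '<'
    return "unknown"
```

For example, `classify_checkpoint(os.path.expanduser("~/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth"))` returning `"html"` would point at a bad download rather than a torch or pickle version problem.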

abnerabner commented 3 weeks ago

torch==1.11.0+cu102 torchvision==0.12.0+cu102 torchaudio==0.11.0+cu102

Installed with pip from their .whl files.

huangqinwen commented 3 weeks ago

can you check if there's anything in ~/.cache/torch/checkpoints ?

abnerabner commented 3 weeks ago

There is a file resnet18-5c106cde.pth in ~/.cache/torch/hub/checkpoints
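A quick way to see what is in the cache and how big each file is, as a stdlib-only sketch. The directory lookup below is an assumption meant to mirror torch.hub's default (`$TORCH_HOME/hub/checkpoints`, else `~/.cache/torch/hub/checkpoints`); inside a working PyTorch install, `torch.hub.get_dir()` is the authoritative answer.

```python
import os
from pathlib import Path
from typing import List, Tuple


def default_checkpoint_dir() -> Path:
    """Guess torch.hub's checkpoint cache directory (assumed default lookup order)."""
    cache_home = os.getenv("XDG_CACHE_HOME", os.path.join("~", ".cache"))
    torch_home = os.path.expanduser(os.getenv("TORCH_HOME", os.path.join(cache_home, "torch")))
    return Path(torch_home) / "hub" / "checkpoints"


def list_checkpoints(directory: Path) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each file in the cache, sorted by name."""
    if not directory.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    for name, size in list_checkpoints(default_checkpoint_dir()):
        print(f"{size:>12,d}  {name}")
```

A ResNet-18 checkpoint should be tens of megabytes; a size in the hundreds of bytes means the download did not return the model.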

abnerabner commented 3 weeks ago

I removed the file resnet18-5c106cde.pth as suggested in torch issue 1, then ran the code again. The download reports 100%, but the same error occurs:

(MiLoPYP) zhenglin@guilab:/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick$ python simsiam_main.py simsiam2d3d --num_epochs 20 --exp_id test_sample --bbox 36 --dataset simsiam2d3d --arch simsiam2d3d_18 --lr 1e-3 --train_img_txt sample_train_explore_img.txt --batch_size 256 --val_intervals 20 --save_all --gauss 0.8 --dog 3,5
Using tensorboardX
Fix size testing.
No validation files but validation interval is greater than 1...using training files for validation
Training chunk_sizes: [256]
The output will be saved to  /data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/exp/simsiam2d3d/test_sample
opt.world -1
distributed False
heads {'proj': 128, 'pred': 128}
Namespace(K=200, arch='simsiam2d3d_18', batch_size=256, bbox=36, chunk_sizes=[256], cluster_head=False, compress=False, contrastive=False, cosine=False, cr_weight=0.1, curvature_cutoff=0.003, cutoff_z=10, data_dir='/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/data', dataset='simsiam2d3d', debug=4, debug_dir='/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/exp/simsiam2d3d/test_sample/debug', debugger_theme='white', dist_backend='nccl', dist_url='env://', distance_cutoff=15, distance_scale=2, distributed=False, dog=[3.0, 5.0], down_ratio=2, exp_dir='/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/exp/simsiam2d3d', exp_id='test_sample', fiber=False, fix_res=True, gauss=0.8, ge=False, gpus=[0], gpus_str='0', head_conv=128, heads={'proj': 128, 'pred': 128}, hide_data_time=False, input_h=256, input_res=256, input_w=256, keep_res=False, last_k=3, load_model='', local_rank=-1, lr=0.001, lr_decay_rate=0.1, lr_step=[200, 400, 600], master_batch_size=256, metric='loss', names=None, nclusters=3, nheads=1, nms=3, not_cuda_benchmark=False, not_prefetch_test=False, num_classes=1, num_epochs=20, num_iters=-1, num_stacks=1, num_workers=0, order='xzy', out_id='output', out_path='/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/exp/simsiam2d3d/test_sample/output', out_thresh=0.25, output_h=128, output_res=128, output_w=128, pad=31, pn=False, pretrain_model='', print_iter=0, r2_cutoff=30, rank=-1, resume=False, root_dir='/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick', save_all=True, save_dir='/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/exp/simsiam2d3d/test_sample', seed=317, spike=False, task='simsiam2d3d', tau=0.1, temp=0.07, test=False, test_coord_txt='test_coords.txt', test_img_txt='test_images.txt', thresh=0.5, train_coord_txt='train_coords.txt', train_img_txt='sample_train_explore_img.txt', trainval=False, 
translation_ratio=0.5, val_coord_txt='train_coords.txt', val_img_txt='sample_train_explore_img.txt', val_intervals=20, vis_thresh=0.3, warm=False, with_score=False, world_size=-1)
Creating model...
Downloading: "http://download.pytorch.org/models/resnet18-5c106cde.pth" to /home/zhenglin/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 812/812 [00:00<00:00, 4.60MB/s]
Traceback (most recent call last):
  File "simsiam_main.py", line 170, in <module>
    main(opt)
  File "simsiam_main.py", line 60, in main
    model = create_model(opt.arch, opt.heads, opt.head_conv)
  File "/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/models/model.py", line 69, in create_model
    model = get_model(num_layers=num_layers, heads=heads, head_conv=head_conv,last_k = last_k)
  File "/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/models/networks/simsiam_model_2d3d.py", line 855, in get_simsiam2d3d_net_small
    model.init_weights(num_layers)
  File "/data1/zhenglin/test/template_picking_test/milopyp/cet_pick-main/cet_pick/models/networks/simsiam_model_2d3d.py", line 800, in init_weights
    pretrained_state_dict = model_zoo.load_url(url)
  File "/home/zhenglin/.conda/envs/MiLoPYP/lib/python3.8/site-packages/torch/hub.py", line 595, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/home/zhenglin/.conda/envs/MiLoPYP/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/zhenglin/.conda/envs/MiLoPYP/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
huangqinwen commented 3 weeks ago

My guess is that it comes from changing https to http (see here). Instead of changing https directly, can you try adding this code to simsiam_main.py?

import ssl

ssl._create_default_https_context = ssl._create_stdlib_context
abnerabner commented 3 weeks ago

I removed the repository and cloned it again to make sure all the 'http' URLs were back to 'https'. After adding this code to simsiam_main.py, the same error as in issue #8 occurred again. I then tried adding it to every .py file that uses 'https', but they all still fail with the same error 😭

urllib.error.URLError: <urlopen error [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1149)>
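`WRONG_VERSION_NUMBER` generally means the TLS handshake received a reply that was not TLS, for example plain HTTP coming back from a proxy or other network middlebox. Whether that is the cause here is an assumption; the thread does not confirm it. The sketch below (the function name is hypothetical) collects the OpenSSL build and the proxies that `urllib`, which torch.hub uses for downloads, will route requests through.

```python
import ssl
import urllib.request
from typing import Dict


def network_diagnostics() -> Dict[str, str]:
    """Collect settings worth checking when HTTPS downloads fail with WRONG_VERSION_NUMBER."""
    info = {"openssl": ssl.OPENSSL_VERSION}
    # Proxies urllib will use; on Linux these come from the *_proxy environment variables.
    for scheme, proxy_url in sorted(urllib.request.getproxies().items()):
        info["proxy." + scheme] = proxy_url
    return info


if __name__ == "__main__":
    for key, value in network_diagnostics().items():
        print(f"{key:15} {value}")
```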

huangqinwen commented 3 weeks ago

can you try adding this line instead of the one above?

import ssl 

ssl._create_default_https_context = ssl._create_unverified_context
abnerabner commented 3 weeks ago

still the same error

huangqinwen commented 3 weeks ago

Weird. I'll work on an alternative way to download the pretrained weights today and will let you know when the code is updated!

abnerabner commented 3 weeks ago

Thank you 😄

huangqinwen commented 3 weeks ago

what's your pickle version btw?

abnerabner commented 2 weeks ago

It's 4.0.

huangqinwen commented 2 weeks ago

Circling back: I tested downloading the pretrained weights using http instead of https, and it worked fine on my end. For the pretrained weights you downloaded to ~/.cache/torch/hub/checkpoints, what's the size of the file (du -sh resnet18-5c106cde.pth)? It should be about 45MB.

abnerabner commented 2 weeks ago

only 8.0K

huangqinwen commented 2 weeks ago

It seems the model was not fully downloaded, which caused the pickle error. We will update the repo soon, and hopefully the new download link will solve the problem!
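An incomplete or wrong download can also be detected by hash. torchvision checkpoint names end in the leading hex digits of the file's SHA-256 digest (resnet18-5c106cde.pth → 5c106cde), and `torch.hub.load_state_dict_from_url` compares them when called with `check_hash=True`. A stdlib-only sketch of the same check (function names are hypothetical):

```python
import hashlib
import re
from pathlib import Path
from typing import Optional

# "<name>-<hex prefix>.<ext>", as used by torchvision checkpoint file names
HASH_IN_NAME = re.compile(r"-([a-f0-9]+)\.")


def expected_hash_prefix(path: str) -> Optional[str]:
    """Return the SHA-256 prefix embedded in the file name, or None if there is none."""
    match = HASH_IN_NAME.search(Path(path).name)
    return match.group(1) if match else None


def verify_checkpoint(path: str) -> bool:
    """True if the file's SHA-256 digest starts with the prefix in its name."""
    prefix = expected_hash_prefix(path)
    if prefix is None:
        raise ValueError(f"no hash prefix in file name: {path}")
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            sha.update(chunk)
    return sha.hexdigest().startswith(prefix)
```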

huangqinwen commented 2 weeks ago

I'm waiting for the updated code to be merged into the main repo. Meanwhile, can you try the following?

  1. install Git LFS: https://github.com/git-lfs/git-lfs/blob/main/INSTALLING.md
  2. in simsiam_model_2d3d.py, replace the model_urls with the following URLs:
    model_urls = {
        "resnet18": "http://download.pytorch.org/models/resnet18-f37072fd.pth",
        "resnet34": "http://download.pytorch.org/models/resnet34-b627a593.pth",
        "resnet50": "http://download.pytorch.org/models/resnet50-0676ba61.pth",
        "resnet101": "http://download.pytorch.org/models/resnet101-63fe2227.pth",
        "resnet152": "http://download.pytorch.org/models/resnet152-394f9c45.pth",
        # note: a github.com/<user>/<repo>/blob/... URL serves an HTML page; the raw file is under /raw/
        "resnet8": "http://github.com/tbepler/topaz/blob/master/topaz/pretrained/detector/resnet8_u64.sav",
    }
  3. rerun and see if the same error still appears
abnerabner commented 2 weeks ago

The same error still appears, and the file is again 8.0K.

huangqinwen commented 2 weeks ago

Even after installing Git LFS?

huangqinwen commented 2 weeks ago

if you do wget resnet18": "http://download.pytorch.org/models/resnet18-f37072fd.pth does it download the full model?

abnerabner commented 2 weeks ago

yes, git-lfs/3.5.1 (GitHub; linux amd64; go 1.21.8)

abnerabner commented 2 weeks ago

No, and it's even smaller this time, only 4.0K:

--2024-11-08 11:37:02--  ftp://resnet18/
           => ‘.listing’
Resolving resnet18 (resnet18)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘resnet18’
--2024-11-08 11:37:02--  http://download.pytorch.org/models/resnet18-f37072fd.pth
Resolving download.pytorch.org (download.pytorch.org)... 143.204.215.83, 143.204.215.87, 143.204.215.66, ...
Connecting to download.pytorch.org (download.pytorch.org)|143.204.215.83|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 813 [text/html]
Saving to: ‘resnet18-f37072fd.pth’

resnet18-f37072fd.pth                                                   100%[=============================================================================================================================================================================>]     813  --.-KB/s    in 0s

2024-11-08 11:37:02 (166 MB/s) - ‘resnet18-f37072fd.pth’ saved [813/813]

4.0K ./resnet18-f37072fd.pth

huangqinwen commented 2 weeks ago

Seems like when you switched to http, it is downloading a webpage... hmmm

[screenshot]

I tried this on my local Mac, and this is what I'm getting. I also realized there's a typo in the command I gave you. Are you running things locally or on a remote cluster?

abnerabner commented 2 weeks ago

Locally, on Linux version 6.8.0-48-generic (buildd@lcy02-amd64-040) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04)

abnerabner commented 2 weeks ago

I noticed that typo and ran the same command as you, but the file is still only 4K:

zhenglin@guilab:/data1/zhenglin/test/template_picking_test/pytom_script/processed/chromatin$ wget http://download.pytorch.org/models/resnet18-f37072fd.pth
--2024-11-08 15:49:38--  http://download.pytorch.org/models/resnet18-f37072fd.pth
Resolving download.pytorch.org (download.pytorch.org)... 18.65.3.71, 18.65.3.37, 18.65.3.38, ...
Connecting to download.pytorch.org (download.pytorch.org)|18.65.3.71|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 809 [text/html]
Saving to: ‘resnet18-f37072fd.pth’

resnet18-f37072fd.pth                                                   100%[=============================================================================================================================================================================>]     809  --.-KB/s    in 0s

2024-11-08 15:49:38 (169 MB/s) - ‘resnet18-f37072fd.pth’ saved [809/809]

Maybe something is wrong with my local server. I'm going to check it. Thank you for your help!

huangqinwen commented 2 weeks ago

Weird... so with wget http://download.pytorch.org/models/resnet18-f37072fd.pth it is downloading an HTML page instead of the actual model? wget should work most of the time. A correct download should look something like this:

HTTP request sent, awaiting response... 200 OK
Length: 46830571 (45M) [application/x-www-form-urlencoded]
Saving to: ‘resnet18-f37072fd.pth’

resnet18-f37072fd.pth      100%[=====================================>]  44.66M  33.8MB/s    in 1.3s

2024-11-07 23:36:46 (33.8 MB/s) - ‘resnet18-f37072fd.pth’ saved [46830571/46830571]
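The failing downloads above came back as a few hundred bytes of `text/html`, while the correct one is about 46.8 million bytes of non-HTML content. A downloader can refuse the bad case before anything is written under the checkpoint name. The sketch below is stdlib-only; the function name and the `min_bytes` threshold are assumptions, not part of cet_pick or torch.hub.

```python
import os
import shutil
import tempfile
import urllib.request


def download_checkpoint(url: str, dest: str, min_bytes: int = 1_000_000) -> int:
    """Download url to dest, refusing HTML responses and suspiciously small files.

    Data goes to a temporary file first, so a bad download never replaces dest.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as resp:
        content_type = resp.headers.get("Content-Type", "")
        if content_type.lower().startswith("text/html"):
            raise RuntimeError(f"{url} returned an HTML page, not a checkpoint")
        dest_dir = os.path.dirname(os.path.abspath(dest))
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".partial")
        try:
            with os.fdopen(fd, "wb") as out:
                shutil.copyfileobj(resp, out)  # stream the body to disk
            size = os.path.getsize(tmp_path)
            if size < min_bytes:
                raise RuntimeError(f"only {size} bytes received from {url}")
            os.replace(tmp_path, dest)  # atomic rename on the same filesystem
        finally:
            if os.path.exists(tmp_path):  # only left behind if something failed
                os.remove(tmp_path)
    return size
```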

Would it help if I emailed you the model weights, since you can't download them?

abnerabner commented 2 weeks ago

Actually, I can download them directly from the web and then transfer them to the server. I tried that before, but these URLs are used in so many places that I don't know how to modify your code to make it work properly 😂
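Without touching the code, a manually transferred file can be placed where torch.hub looks before downloading: `load_state_dict_from_url` (what `model_zoo.load_url` calls) uses the last path component of the URL as the file name in the checkpoint directory and only downloads when that file is missing. A stdlib-only sketch, with hypothetical function names; the name must match the URL the code actually requests (the log above shows resnet18-5c106cde.pth).

```python
import os
import shutil
from pathlib import Path
from urllib.parse import urlparse


def cache_path_for(url: str, checkpoint_dir: Path) -> Path:
    """Path torch.hub checks before downloading: checkpoint_dir / <last URL path component>."""
    return checkpoint_dir / os.path.basename(urlparse(url).path)


def install_checkpoint(local_file: str, url: str, checkpoint_dir: Path) -> Path:
    """Copy a manually downloaded checkpoint to where torch.hub expects the file for url."""
    target = cache_path_for(url, checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(local_file, target)  # overwrites a truncated or HTML file of the same name
    return target
```

For example: `install_checkpoint("resnet18-5c106cde.pth", "http://download.pytorch.org/models/resnet18-5c106cde.pth", Path.home() / ".cache/torch/hub/checkpoints")`.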

huangqinwen commented 2 weeks ago

Thanks! If you can download them directly, can you try this command? (We added a new argument to load the pretrained model from a user-supplied path instead of downloading it from the PyTorch website.)

python simsiam_main.py simsiam2d3d --num_epochs 20 --exp_id test_sample --bbox 36 --dataset simsiam2d3d --arch simsiam2d3d_18 --lr 1e-3 --train_img_txt sample_train_explore_img.txt --batch_size 256 --val_intervals 20 --save_all --gauss 0.8 --dog 3,5 --pretrained_model path_to_pretrained_model 
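The thread does not show how cet_pick implements `--pretrained_model`. The following is only a hypothetical sketch of the decision it describes: use a local file when one is given, otherwise fall back to the download URL. `weights_source` and `build_parser` are made-up names, and the actual loading calls are left as comments.

```python
import argparse
import os
from typing import Optional, Tuple


def weights_source(pretrained_model: Optional[str], default_url: str) -> Tuple[str, str]:
    """Pick a local checkpoint if one was given, otherwise fall back to the download URL."""
    if pretrained_model:
        if not os.path.isfile(pretrained_model):
            raise FileNotFoundError(f"--pretrained_model file not found: {pretrained_model}")
        return ("local", pretrained_model)  # would then be read with torch.load(path, map_location="cpu")
    return ("url", default_url)  # would then be fetched with torch.utils.model_zoo.load_url(url)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the simsiam_main.py options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pretrained_model", default="", help="path to a local .pth checkpoint")
    return parser
```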
abnerabner commented 2 weeks ago

It looks good now, thanks so much!