keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

how to convert sparse model to dense model? #31

Closed Mike575 closed 1 year ago

Mike575 commented 1 year ago

After finishing pretraining resnet50, we get a resnet50 weight file in sparse form. I'd like to use resnet50 (dense form) as the backbone in my other projects. How can I convert the sparse model to a dense model? Is there a convenient function for this, like SparseEncoder.dense_model_to_sparse in encoder.py?

keyu-tian commented 1 year ago

Actually, no sparse-to-dense conversion is needed (there are no "sparse"-type weights). You can use the pretrained weights directly with code like the following, just as you would an ImageNet-supervised pretrained resnet50:

import torch, timm

# Build a standard (dense) resnet50 and load the SparK-pretrained weights.
res50 = timm.create_model('resnet50')
state = torch.load('resnet50_1kpretrained_timm_style.pth', 'cpu')
# strict=False because the pretrained checkpoint contains no classification head.
missing_keys, unexpected_keys = res50.load_state_dict(state, strict=False)
assert missing_keys == ['fc.weight', 'fc.bias']
assert len(unexpected_keys) == 0

The reason is that we use PyTorch built-in operators (nn.Conv2d, nn.LayerNorm, etc.) to simulate the sparse operators (see /pretrain/encoder.py), so the weight formats and names stay the same.
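To give a rough idea of how a "sparse" convolution can be simulated with dense built-ins (this is only an illustrative sketch under my own assumptions, not the repo's actual implementation in /pretrain/encoder.py): run an ordinary nn.Conv2d and re-apply the binary visibility mask to its output, so the learned weights remain plain Conv2d weights throughout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Module):
    """Hypothetical sketch: approximate sparse convolution by masking a dense conv."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        # A plain dense Conv2d -- its weights load into a normal ResNet unchanged.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)

    def forward(self, x, mask):
        # mask: (N, 1, H, W) binary map, 1 = visible patch, 0 = masked out.
        out = self.conv(x * mask)
        # Resize the mask to the output resolution and re-apply it, so masked
        # positions stay zero after the convolution as well.
        mask_out = F.interpolate(mask, size=out.shape[-2:], mode='nearest')
        return out * mask_out, mask_out

x = torch.randn(1, 3, 32, 32)
mask = (torch.rand(1, 1, 32, 32) > 0.5).float()
layer = MaskedConv2d(3, 8, kernel_size=3, stride=2, padding=1)
out, m = layer(x, mask)
print(out.shape)  # torch.Size([1, 8, 16, 16])
```

Because only a mask multiplication is added around a standard nn.Conv2d, dropping the mask at fine-tuning time leaves an ordinary dense convolution with the same weight tensor.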

puyiwen commented 1 month ago

@keyu-tian Sorry to bother you. From your way of loading the model, does it mean that SparK only uses MAE-style masking during pretraining and does not change the original model structure (e.g. ResNet) at inference? Looking forward to your reply, thank you!