lxtGH / CAE

This is a PyTorch implementation of "Context AutoEncoder for Self-Supervised Representation Learning".

about modeling_finetune.py #4

Closed: lywang76 closed this issue 2 years ago

lywang76 commented 2 years ago

In your model registration code:

```python
@register_model
def cae_large_patch16_384(pretrained=False, **kwargs):
    model = VisionTransformer(
        img_size=384, patch_size=16, embed_dim=1024, depth=24, num_heads=16,
        mlp_ratio=4, qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6),
        **kwargs)
    model.default_cfg = _cfg()
    return model
```

```python
def _cfg(url='', **kwargs):
    return {
        'url': url, 'input_size': (3, 224, 224), 'pool_size': None,
        'crop_pct': .9, 'interpolation': 'bicubic',
        'mean': (0.5, 0.5, 0.5), 'std': (0.5, 0.5, 0.5),
        **kwargs
    }
```

Therefore, even when the model is built with `img_size=384`, the call to `_cfg()` still sets `input_size` in `default_cfg` back to (3, 224, 224).
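
For concreteness, here is a minimal standalone sketch of the dict-merge behavior in question (names reused from the snippets above, trimmed to the relevant keys):

```python
# Keys passed via **kwargs are merged after the defaults, so they would
# win -- but cae_large_patch16_384 calls _cfg() with no arguments, so the
# 224 default survives even though the model itself is built at 384.
def _cfg(url='', **kwargs):
    return {'url': url, 'input_size': (3, 224, 224), **kwargs}

print(_cfg()['input_size'])  # (3, 224, 224), regardless of img_size=384
```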

Could this be a potential problem?

SelfSup-MIM commented 2 years ago

Hi, this problem can be solved by passing the argument `--input_size 384` to the program.
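
Presumably the fine-tuning script builds its input pipeline from `args.input_size` rather than from `default_cfg`, so the 224 entry is never consulted. If one also wants the registered `default_cfg` to report the 384 resolution, a possible one-line sketch (not necessarily the repo's fix) is to go through the `**kwargs` hook that `_cfg` already exposes:

```python
# Sketch, not the repo's code: let _cfg's **kwargs merge override the
# 224 default so default_cfg matches the model's actual resolution.
model.default_cfg = _cfg(input_size=(3, 384, 384))
```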