naver / croco

Interpolate_pos_embed in croco v1 #27

Closed Gynjn closed 1 month ago

Gynjn commented 1 month ago

It seems the provided code is not compatible with croco v1.

I changed it to:

import torch


def interpolate_pos_embed(model, checkpoint_model):
    if 'enc_pos_embed' in checkpoint_model:
        pos_embed_checkpoint = checkpoint_model['enc_pos_embed']
        embedding_size = pos_embed_checkpoint.shape[-1]
        num_patches = model.patch_embed.num_patches
        num_extra_tokens = model.enc_pos_embed.shape[-2] - num_patches
        # height (== width) for the checkpoint position embedding
        orig_size = int((pos_embed_checkpoint.shape[-2] - num_extra_tokens) ** 0.5)
        # height (== width) for the new position embedding
        new_size = int(num_patches ** 0.5)
        # class_token and dist_token are kept unchanged
        if orig_size != new_size:
            print("Position interpolate from %dx%d to %dx%d" % (orig_size, orig_size, new_size, new_size))
            extra_tokens = pos_embed_checkpoint[:num_extra_tokens]
            # only the position tokens are interpolated
            pos_tokens = pos_embed_checkpoint[num_extra_tokens:]
            pos_tokens = pos_tokens.reshape(orig_size, orig_size, embedding_size).unsqueeze(0).permute(0, 3, 1, 2)
            pos_tokens = torch.nn.functional.interpolate(
                pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
            pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(1, 2).squeeze(0)
            new_pos_embed = torch.cat((extra_tokens, pos_tokens), dim=0)
            checkpoint_model['enc_pos_embed'] = new_pos_embed

    if 'dec_pos_embed' in checkpoint_model:
        pos_embed_checkpoint = checkpoint_model['dec_pos_embed']
        embedding_size = pos_embed_checkpoint.shape[-1]
        num_patches = model.patch_embed.num_patches
        num_extra_tokens = model.dec_pos_embed.shape[-2] - num_patches
        # height (== width) for the checkpoint position embedding
        orig_size = int((pos_embed_checkpoint.shape[-2] - num_extra_tokens) ** 0.5)
        # height (== width) for the new position embedding
        new_size = int(num_patches ** 0.5)
        # class_token and dist_token are kept unchanged
        if orig_size != new_size:
            print("Position interpolate from %dx%d to %dx%d" % (orig_size, orig_size, new_size, new_size))
            extra_tokens = pos_embed_checkpoint[:num_extra_tokens]
            # only the position tokens are interpolated
            pos_tokens = pos_embed_checkpoint[num_extra_tokens:]
            pos_tokens = pos_tokens.reshape(orig_size, orig_size, embedding_size).unsqueeze(0).permute(0, 3, 1, 2)
            pos_tokens = torch.nn.functional.interpolate(
                pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
            pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(1, 2).squeeze(0)
            new_pos_embed = torch.cat((extra_tokens, pos_tokens), dim=0)
            checkpoint_model['dec_pos_embed'] = new_pos_embed

Does this do what you intended?

PhilippeWeinzaepfel commented 1 month ago


Thanks for noticing it. This is indeed what was intended. I have just pushed a commit that does the same (with a for loop over enc_pos_embed and dec_pos_embed) and also handles the case where height and width are different for a downstream task.

Best,
Philippe
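
For readers following the thread, here is a minimal sketch of the approach described in the reply: one loop over both keys, with separate height and width so non-square grids work. It is not the code from the actual commit. The function names `resize_pos_embed` and `interpolate_pos_embed_sketch`, and the explicit `old_grid` / `new_grid` arguments (patch-grid sizes as `(height, width)`), are assumptions made for illustration. The real code may derive these sizes from the model and checkpoint instead.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed, old_grid, new_grid, num_extra_tokens=0):
    """Resize a (num_tokens, dim) position embedding between patch grids.

    old_grid and new_grid are (height, width) in patches; both are
    hypothetical parameters introduced for this sketch.
    """
    old_h, old_w = old_grid
    new_h, new_w = new_grid
    if (old_h, old_w) == (new_h, new_w):
        return pos_embed  # nothing to interpolate
    dim = pos_embed.shape[-1]
    # extra tokens (e.g. a class token), if any, are kept unchanged
    extra_tokens = pos_embed[:num_extra_tokens]
    pos_tokens = pos_embed[num_extra_tokens:]
    assert pos_tokens.shape[0] == old_h * old_w, "old_grid does not match the checkpoint"
    # (H*W, C) -> (1, C, H, W), the layout expected by F.interpolate
    pos_tokens = pos_tokens.reshape(1, old_h, old_w, dim).permute(0, 3, 1, 2)
    pos_tokens = F.interpolate(
        pos_tokens, size=(new_h, new_w), mode='bicubic', align_corners=False)
    # (1, C, H', W') -> (H'*W', C)
    pos_tokens = pos_tokens.permute(0, 2, 3, 1).reshape(new_h * new_w, dim)
    return torch.cat((extra_tokens, pos_tokens), dim=0)


def interpolate_pos_embed_sketch(checkpoint_model, old_grid, new_grid,
                                 keys=('enc_pos_embed', 'dec_pos_embed')):
    """Apply the same resizing to every position embedding found in the checkpoint."""
    for key in keys:
        if key in checkpoint_model:
            checkpoint_model[key] = resize_pos_embed(
                checkpoint_model[key], old_grid, new_grid)
    return checkpoint_model
```

For example, calling `interpolate_pos_embed_sketch(ckpt, (14, 14), (10, 20))` on a checkpoint whose embeddings cover a 14x14 grid would leave both entries with 200 rows, one per patch of the 10x20 grid.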