lambert-x / medical_mae

The official implementation of "Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification"
Apache License 2.0

Incompatible Model Architecture and Saved Weights Issue #6

Closed SaumyaBhandari closed 1 year ago

SaumyaBhandari commented 1 year ago

I encountered an issue while trying to use the provided saved model weights with the architecture implemented in this code repository. The weights of the saved model 'vit-b_CXR_0.5M_mae.pth' (finetuned on NIH ChestX-ray) appear to be incompatible with the architecture initialization in the codebase, resulting in errors during loading and execution.

Steps to Reproduce:

  1. Clone the repository.
  2. Follow the provided instructions to run main_finetune_chestxray.py with the saved finetuned weights 'vit-b_CXR_0.5M_mae.pth', initializing the model as vit_base_patch16.
  3. Attempt to load the provided saved model weights. [lines 282-306 in main_finetune_chestxray.py]
  4. Observe errors indicating an architectural mismatch.

Expected Behavior: The saved model weights should seamlessly load and align with the architecture defined in the code.

Actual Behavior: The saved model weights are not compatible with the architecture initialization, leading to runtime errors.
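
For reference, a minimal sketch of the load that reproduces the error. The 'model' key and the 14-class head follow the repository's finetuning script; the checkpoint path is wherever the downloaded weights live:

import torch

import models_vit  # from this repository

# initialize the architecture the same way main_finetune_chestxray.py does
model = models_vit.vit_base_patch16(num_classes=14, global_pool=True)

# load the downloaded checkpoint
checkpoint = torch.load('vit-b_CXR_0.5M_mae.pth', map_location='cpu')
checkpoint_model = checkpoint.get('model', checkpoint)

# a strict load raises RuntimeError on any missing, unexpected, or shape-mismatched key
model.load_state_dict(checkpoint_model)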

Identified Issue: The error message suggests a mismatch between the loaded checkpoint and the initialized architecture, which prevents the weights from being used for the intended tasks.

Additional Information: The repository's documentation provides the saved model weights, and the architecture initialization in the code follows the provided guidelines. This issue significantly affects the ease of integrating and using the repository's pre-trained models. Guidance on properly aligning the model weights with the code's architecture would be greatly appreciated.
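
To narrow this down, a diagnostic sketch that compares the checkpoint's keys and tensor shapes against a freshly initialized model before attempting any load (same assumptions as the sketch above):

import torch

import models_vit  # from this repository

model = models_vit.vit_base_patch16(num_classes=14, global_pool=True)
state_dict = model.state_dict()

checkpoint = torch.load('vit-b_CXR_0.5M_mae.pth', map_location='cpu')
checkpoint_model = checkpoint.get('model', checkpoint)

# keys the model expects but the checkpoint lacks
missing = [k for k in state_dict if k not in checkpoint_model]
# keys the checkpoint carries but the model does not know
unexpected = [k for k in checkpoint_model if k not in state_dict]
# keys present in both but with different tensor shapes
mismatched = [k for k in checkpoint_model
              if k in state_dict and checkpoint_model[k].shape != state_dict[k].shape]

print('missing from checkpoint:', missing)
print('unexpected in checkpoint:', unexpected)
print('shape mismatches:', mismatched)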

SaumyaBhandari commented 1 year ago

The model weights are loading now. I've edited the code so the weights load more robustly, copying only the parameters whose names and shapes match the initialized model:

if 'vit' in args.model:
    model = models_vit.__dict__[args.model](
        img_size=args.input_size,
        num_classes=args.nb_classes,
        drop_rate=args.vit_dropout_rate,
        drop_path_rate=args.drop_path,
        global_pool=args.global_pool,
    )

if args.finetune:
    if 'vit' in args.model:
        checkpoint = torch.load(args.finetune, map_location='cpu')

        print("Load pre-trained checkpoint from: %s" % args.finetune)
        checkpoint_model = checkpoint['model']
        state_dict = model.state_dict()

        # copy only the parameters whose names and shapes match the initialized model
        for k in checkpoint_model.keys():
            if k in state_dict:
                if checkpoint_model[k].shape == state_dict[k].shape:
                    state_dict[k] = checkpoint_model[k]
                    print(f"Loaded Index: {k} from Saved Weights")
                else:
                    print(f"Shape of {k} doesn't match with {state_dict[k]}")
            else:
                print(f"{k} not found in Init Model")

        # interpolate position embedding
        interpolate_pos_embed(model, state_dict)

        # load pre-trained model
        model.load_state_dict(state_dict)
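
For comparison, the upstream MAE finetuning recipe handles the same situation by deleting only the shape-mismatched classification-head keys from the checkpoint and then loading non-strictly. A sketch of that alternative, reusing model, checkpoint_model, and state_dict from above:

# drop only the keys whose shapes disagree (typically the classification head)
for k in ['head.weight', 'head.bias']:
    if k in checkpoint_model and checkpoint_model[k].shape != state_dict[k].shape:
        print(f"Removing key {k} from pretrained checkpoint")
        del checkpoint_model[k]

# a non-strict load reports, rather than fails on, missing or unexpected keys
msg = model.load_state_dict(checkpoint_model, strict=False)
print(msg.missing_keys)

Both approaches end up ignoring incompatible parameters; the key-by-key copy above just makes the skipped parameters explicit in the log.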