yuanyao366 / PRP


Difference in weights loading #5

Closed AKASH2907 closed 3 years ago

AKASH2907 commented 3 years ago

I have no problems with the code, just a few doubts about weight loading:

  1. Why is there a difference in weight loading between train_predict.py and ft_classfy.py?
  2. Did you use nn.DataParallel even with a single GPU, so that a "module." prefix is added at the start of the parameter names, and save checkpoints that way to keep every checkpoint consistent?
  3. If we use two networks, one following the other, are the previous network's weights saved as module.base_network?
  4. Does it mean we modify the base network (c3d, r21d, r3d) parameters according to the pretext task as a whole, and then use only those weights for finetuning on the downstream-task dataset?
  5. In the load_pretrained_weights function of ft_classfy.py, why did you change it to look for module.base_network and use +14:? Since strict is False, won't it load only the weights that are common to both networks, i.e. only the base network? What is the problem with that?
import torch

def load_pretrained_weights(ckpt_path):
    """Load pretrained weights and adjust the parameter names."""
    adjusted_weights = {}
    pretrained_weights = torch.load(ckpt_path)
    for name, params in pretrained_weights.items():
        if 'base_network' in name:
            # Strip the first prefix segment (e.g. 'module.' from DataParallel),
            # leaving names like 'base_network.conv1.weight'.
            name = name[name.find('.') + 1:]
            adjusted_weights[name] = params
    return adjusted_weights

Thanks.

yuanyao366 commented 3 years ago

For Q1 and Q5: Because the tasks are different in pretraining (train_predict.py) and finetuning (ft_classfy.py). For finetuning, we should remove the heads of the pretext tasks (speed prediction and frame reconstruction) and keep only the backbone of c3d / r21d / r3d to initialize the model. The base network means the backbone of c3d / r21d / r3d. The length of the string "base_network." is 13, and name.find('.') returns the index of the dot after "module", so +14 (1 for that dot plus the 13 characters of "base_network.") skips the whole "module.base_network." prefix.
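
For concreteness, a minimal sketch of the finetuning version of the loader (the exact code is in ft_classfy.py; the function name and the key in the comments are illustrative):

import torch

def load_pretrained_weights_ft(ckpt_path):
    """Keep only backbone params and strip the 'module.base_network.' prefix.

    For a key like 'module.base_network.conv1.weight':
    name.find('.') == 6 (the dot after 'module'), and 6 + 14 == 20
    == len('module.base_network.'), so the slice yields 'conv1.weight'.
    """
    adjusted_weights = {}
    pretrained_weights = torch.load(ckpt_path)
    for name, params in pretrained_weights.items():
        if 'base_network' in name:
            adjusted_weights[name[name.find('.') + 14:]] = params
    return adjusted_weights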

For Q2: Yes. You can also use multiple GPUs for training; just pass "--gpu 0,1,2,3,...".
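
A quick illustration of why the "module." prefix appears (a toy nn.Linear stands in for the real model):

import torch.nn as nn

net = nn.Linear(4, 2)
print(list(net.state_dict().keys()))
# ['weight', 'bias']

wrapped = nn.DataParallel(net)  # wrapping works even with a single GPU
print(list(wrapped.state_dict().keys()))
# ['module.weight', 'module.bias']  <- every key gains a 'module.' prefix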

For Q3 and Q4: It seems so. We only use the backbone ("base_network") of c3d / r21d / r3d to initialize the model for finetuning.

AKASH2907 commented 3 years ago

Is the above true for finetuning as well? I mean, if we only search for "base_network" in the parameter names and then fine-tune with strict=False, would that be an incorrect approach (i.e., the checkpoint weights would not actually be loaded)?

yuanyao366 commented 3 years ago

For finetuning, apart from keeping the backbone ("base_network") of c3d / r21d / r3d, we also have to add the head (the fc layer) for the action classification task. Because of that added fc layer, we must set strict=False when using the pretrained weights to initialize the finetuned model. If you set strict=True, you will find that only the parameters of the added fc layer ("linear.weight", "linear.bias") are missing.
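
A toy demonstration of this (the layer names mirror the thread; the model is a stand-in, not the repo's):

import torch.nn as nn

class TinyFinetuneNet(nn.Module):
    """Stand-in finetune model: a backbone layer plus the new fc head."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(3, 8, kernel_size=3)  # pretend backbone layer
        self.linear = nn.Linear(8, 101)              # new head, absent from the checkpoint

model = TinyFinetuneNet()
# Pretend checkpoint: backbone weights only (prefixes already stripped).
backbone_ckpt = {k: v for k, v in model.state_dict().items() if 'linear' not in k}

missing, unexpected = model.load_state_dict(backbone_ckpt, strict=False)
print(missing)     # ['linear.weight', 'linear.bias']  <- only the added fc head
print(unexpected)  # []
# With strict=True the same call raises a RuntimeError naming exactly these keys.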

AKASH2907 commented 3 years ago

I printed the names of the model parameter keys in ft_classfy.py. There, if you filter by base_network alone, you can see which checkpoint keys get their weights loaded. But you rename them (the +14: slice) so that the new keys, whose weights are then used for testing, match the model's names exactly; that way the only mismatches left are the last linear layers (linear.weight, linear.bias), and the code doesn't show errors for them. Is that right?
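
A tiny check of the renaming itself (the key names here are made up; only the pattern matters):

pretext_keys = [
    'module.base_network.conv1.weight',
    'module.base_network.conv1.bias',
    'module.pred_head.fc.weight',  # a pretext-task head; dropped by the filter
]
for k in pretext_keys:
    if 'base_network' in k:
        print(k, '->', k[k.find('.') + 14:])
# module.base_network.conv1.weight -> conv1.weight
# module.base_network.conv1.bias -> conv1.bias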