salesforce / PCL

PyTorch code for "Prototypical Contrastive Learning of Unsupervised Representations"
MIT License

Is it possible to resume training with one of the pretrained models? #10

Closed — immuntasir closed this issue 3 years ago

immuntasir commented 3 years ago

Hi,

I was trying to resume training using the "--resume" argument and one of the PCL pretrained models provided on the homepage, but I am getting the following error.

Can anyone please help me with this? I am trying to use the pretrained ImageNet models on a different dataset by training for some extra epochs.

RuntimeError: Error(s) in loading state_dict for DistributedDataParallel: Missing key(s) in state_dict: "module.queue", "module.queue_ptr", "module.encoder_k.conv1.weight", "module.encoder_k.bn1.weight", "module.encoder_k.bn1.bias", "module.encoder_k.bn1.running_mean", "module.encoder_k.bn1.running_var", [... every conv/bn weight, bias, and running-stat key under module.encoder_k.layer1 through module.encoder_k.layer4 ...], "module.encoder_k.fc.0.weight", "module.encoder_k.fc.0.bias", "module.encoder_k.fc.2.weight", "module.encoder_k.fc.2.bias".

immuntasir commented 3 years ago

Note: I am also getting this error with a checkpoint saved from a training run that I ran on the same cluster.

LiJunnan1992 commented 3 years ago

Hi, the models we provide do not store the parameters for encoder_k or the queue. You would need to change PCL/builder.py so that it initializes encoder_k and the queue after loading the state_dict.
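A minimal sketch of that fix, using toy linear encoders in place of the ResNet built in PCL/builder.py (the class name, `dim`, `r`, and the helper method are assumptions, not the repo's actual API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCoLike(nn.Module):
    """Toy stand-in for the PCL model with a momentum encoder and a queue."""
    def __init__(self, dim=8, r=16):
        super().__init__()
        self.encoder_q = nn.Linear(4, dim)
        self.encoder_k = nn.Linear(4, dim)
        # Queue of negative features plus its pointer, as registered buffers
        # (these are the "module.queue" / "module.queue_ptr" keys).
        self.register_buffer("queue", F.normalize(torch.randn(dim, r), dim=0))
        self.register_buffer("queue_ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def init_momentum_encoder(self):
        # After loading a checkpoint that only has encoder_q, copy its
        # weights into the momentum encoder and stop its gradients.
        for p_q, p_k in zip(self.encoder_q.parameters(),
                            self.encoder_k.parameters()):
            p_k.data.copy_(p_q.data)
            p_k.requires_grad = False

model = MoCoLike()
# ... here one would load the released state_dict with strict=False ...
model.init_momentum_encoder()
```

The idea is the same as the momentum-encoder initialization at construction time: encoder_k starts as an exact copy of encoder_q, and the queue starts fresh, so nothing needs to come from the checkpoint.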

However, this error should not occur if you resume from a checkpoint that was saved by our training script. Thanks.