Yes, there is a small difference between your initial net and the trained net. The error occurred because I trained this model with operators: ["skip_connect", "Sep_Conv_3x3", "Atr_Conv_3x3"], while yours is probably operators: ["Zero", "skip_connect", "Sep_Conv_3x3", "Atr_Conv_3x3"]. One point I should clarify: it is feasible to omit the zero operation in our setting because, unlike DARTS, we do not prune operations by taking the argmax, so the zero operation is not a necessity (although we use it in most cases). I'm sorry that I didn't mention this in the paper. You can update the yaml file as below:
```yaml
subnetwork_config:
  dataset_name: 'coco'
  parts_num: 3
  cell_config:
    vector_in_pixel: True
    vector_dim: 8
    convolution_mode: '2D'
    one-shot-search: True
    search_alpha: true
    search_beta: true
    operators: ["skip_connect", "Sep_Conv_3x3", "Atr_Conv_3x3"]
    depth: 6
    cut_layers_num: 3  # first several layers
    size_types: [4, 8, 16, 32]
    hidden_states_num: 1
    factor: 16
    input_nodes_num: 1
```
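For intuition on why the four-operator config cannot load the three-operator checkpoint: in DARTS-style search spaces, the architecture weights (alphas) have one entry per candidate operator, so their shape follows len(operators). A minimal sketch of that coupling (MixedCell and its fields here are illustrative, not this repository's actual classes):

```python
import torch
import torch.nn as nn

class MixedCell(nn.Module):
    """Illustrative DARTS-style cell: alphas get one column per operator."""
    def __init__(self, operators, edges_num=1):
        super().__init__()
        # shape: [edges_num, len(operators)] -> [1, 3] vs [1, 4]
        self.alphas = nn.Parameter(1e-3 * torch.randn(edges_num, len(operators)))

# Trained checkpoint: 3 operators -> alphas of shape [1, 3]
trained = MixedCell(["skip_connect", "Sep_Conv_3x3", "Atr_Conv_3x3"])
# Mismatched config: 4 operators (with "Zero") -> alphas of shape [1, 4]
current = MixedCell(["Zero", "skip_connect", "Sep_Conv_3x3", "Atr_Conv_3x3"])

try:
    # Loading the 3-operator weights into the 4-operator model fails with
    # the same kind of size-mismatch error reported above.
    current.load_state_dict(trained.state_dict())
except RuntimeError as e:
    print(e)  # size mismatch for alphas: [1, 3] vs [1, 4]
```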
I use the backbone module to provide various choices of networks. You can also load the model directly by using the meta_arch class.
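If you want to skip the backbone factory, something along these lines should work in principle. This is only a sketch: the import path, constructor signature, and checkpoint layout below are assumptions, so check the repository's test script for the exact calls.

```python
import torch

# Hypothetical import path; use wherever the meta_arch class actually lives.
from architecture import Body_Part_Representation

# config and criterion would be built the same way the repo's test
# script builds them before calling bulid_up_network.
model = Body_Part_Representation(config, criterion)  # assumed signature

checkpoint = torch.load("pnfs-mobilenet-v2_nfx3_384x288_coco_ckpt.tar",
                        map_location="cpu")
# Checkpoints saved with torch.save often nest weights under a key such
# as "state_dict"; adjust to the actual file layout.
state_dict = checkpoint.get("state_dict", checkpoint)
model.load_state_dict(state_dict)
model.eval()
```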
@yangsenius Thanks for your quick and detailed explanation. I have run the testing code successfully : )
Hi @yangsenius, thanks for your excellent work. I met some problems when testing with the mobilenetv2-coco-384x288 model (pnfs-mobilenet-v2_nfx3_384x288_coco_ckpt.tar). I have modified the yaml file to fit the model, but when I run the testing code, I get the following error:
```
RuntimeError: Error(s) in loading state_dict for Body_Part_Representation:
Missing key(s) in state_dict: "upper_limb_part.cell_4_0.cell_arch.0.ops.3.operation.0.weight",
"upper_limb_part.cell_4_0.cell_arch.0.ops.3.operation.1.weight",
"upper_limb_part.cell_4_0.cell_arch.0.ops.3.operation.2.running_mean", ...... (...... means many layer names)
size mismatch for upper_limb_part.alphas: copying a param with shape torch.Size([1, 3]) from checkpoint, the shape in current model is torch.Size([1, 4]).
size mismatch for head_part.alphas: copying a param with shape torch.Size([1, 3]) from checkpoint, the shape in current model is torch.Size([1, 4]).
size mismatch for lower_limb_part.alphas: copying a param with shape torch.Size([1, 3]) from checkpoint, the shape in current model is torch.Size([1, 4]).
```
I think this error is caused by the structural difference between the initial network and your trained network. I am also confused about why the inference code needs to load the backbone first (Arch = bulid_up_network(config, criterion)). Could we load the model directly?
And could you share your yaml file? Sorry for asking so many questions.
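As a general way to diagnose this class of error before editing configs, you can compare the checkpoint's keys and tensor shapes against a freshly built model. A generic PyTorch sketch, not specific to this repository:

```python
import torch

def diff_state_dicts(model, ckpt_path):
    """Print keys and shapes that differ between a model and a checkpoint."""
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # Unwrap a nested "state_dict" entry if the checkpoint uses one.
    ckpt_sd = checkpoint.get("state_dict", checkpoint)
    model_sd = model.state_dict()

    missing = sorted(set(model_sd) - set(ckpt_sd))       # in model, not in file
    unexpected = sorted(set(ckpt_sd) - set(model_sd))    # in file, not in model
    mismatched = [(k, tuple(ckpt_sd[k].shape), tuple(model_sd[k].shape))
                  for k in set(ckpt_sd) & set(model_sd)
                  if ckpt_sd[k].shape != model_sd[k].shape]

    print(f"missing from checkpoint: {len(missing)}")
    print(f"unexpected in checkpoint: {len(unexpected)}")
    for key, ckpt_shape, model_shape in mismatched:
        print(f"shape mismatch {key}: checkpoint {ckpt_shape} vs model {model_shape}")
```

Running this against the model built from a four-operator config would immediately surface the alphas shape mismatch ([1, 3] vs [1, 4]) and the missing ops.3 keys, pointing to the extra "Zero" operator.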