meijieru / AtomNAS

[ICLR 2020]: 'AtomNAS: Fine-Grained End-to-End Neural Architecture Search'

Testing AtomNas-a gives missing keys and size mismatch #9

Closed: andyhahaha closed this issue 4 years ago

andyhahaha commented 4 years ago

I use this command to test AtomNas-a:

    FILE=$(realpath pretrained/atomnas-a) CHECKPOINT=ckpt ATOMNAS_VAL=True bash scripts/run.sh apps/eval/eval_shrink.yml

It fails with missing-key and size-mismatch errors:

RuntimeError: Error(s) in loading state_dict for AllReduceDistributedDataParallel:
        Missing key(s) in state_dict: "module.features.5.ops.2.0.0.weight", "module.features.5.ops.2.0.1.weight", "module.features.5.ops.2.0.1.bias", "module.features.5.ops.2.0.1.running_mean", "module.features.5.ops.2.0.1.running_var", "module.features.5.ops.2.1.0.weight", "module.features.5.ops.2.1.1.weight", "module.features.5.ops.2.1.1.bias", "module.features.5.ops.2.1.1.running_mean", "module.features.5.ops.2.1.1.running_var", "module.features.5.ops.2.2.weight".
        size mismatch for module.features.4.ops.0.0.0.weight: copying a param with shape torch.Size([9, 24, 1, 1]) from checkpoint, the shape in current model is torch.Size([10, 24, 1, 1]).
        size mismatch for module.features.4.ops.0.0.1.weight: copying a param with shape torch.Size([9]) from checkpoint, the shape in current model is torch.Size([10]).
        size mismatch for module.features.4.ops.0.0.1.bias: copying a param with shape torch.Size([9]) from checkpoint, the shape in current model is torch.Size([10]).
        size mismatch for module.features.4.ops.0.0.1.running_mean: copying a param with shape torch.Size([9]) from checkpoint, the shape in current model is torch.Size([10]).
...

I can test AtomNas-b and AtomNas-c correctly, so I think the code is right. Is there something wrong with the AtomNas-a checkpoint?

meijieru commented 4 years ago

We have updated the corresponding config file. Sorry for the inconvenience.
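
For anyone hitting a similar error, it can help to list exactly which parameters differ between the checkpoint and the model built from the config. The following is a minimal sketch, not code from this repository: `diff_state_dicts` is a hypothetical helper that works on plain name-to-shape mappings, and the toy entries are copied from the error log above (the shape given for the missing key is made up, since the log does not report it).

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_state_dicts(
    checkpoint: Dict[str, Shape], model: Dict[str, Shape]
) -> Dict[str, List]:
    """Compare parameter names and shapes between a checkpoint and a model.

    Hypothetical helper: both arguments map parameter names to shape tuples.
    """
    # Keys the model expects but the checkpoint does not provide.
    missing = sorted(k for k in model if k not in checkpoint)
    # Keys the checkpoint provides but the model does not have.
    unexpected = sorted(k for k in checkpoint if k not in model)
    # Keys present in both, but with different shapes.
    mismatched = sorted(
        (k, checkpoint[k], model[k])
        for k in model
        if k in checkpoint and checkpoint[k] != model[k]
    )
    return {"missing": missing, "unexpected": unexpected, "size_mismatch": mismatched}


# Toy shapes taken from the error log in this issue.
checkpoint_shapes = {
    "module.features.4.ops.0.0.0.weight": (9, 24, 1, 1),
    "module.features.4.ops.0.0.1.weight": (9,),
}
model_shapes = {
    "module.features.4.ops.0.0.0.weight": (10, 24, 1, 1),
    "module.features.4.ops.0.0.1.weight": (10,),
    # Assumption: the log does not give this shape, so (1,) is a placeholder.
    "module.features.5.ops.2.2.weight": (1,),
}

report = diff_state_dicts(checkpoint_shapes, model_shapes)
for name, ckpt_shape, model_shape in report["size_mismatch"]:
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
for name in report["missing"]:
    print(f"missing from checkpoint: {name}")
```

With PyTorch, the two mappings can be built from real objects with `{k: tuple(v.shape) for k, v in state_dict.items()}`, applied to the loaded checkpoint's state dict and to `model.state_dict()`. A report in which every difference is a small channel-count change, as in the log above, suggests that the config describes a slightly different architecture than the one the checkpoint was saved from.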

zhangyuan1994511 commented 4 years ago

@meijieru Hi, author! Sorry to trouble you. My training process was interrupted, and when I resume it I get missing-key errors. How can I solve this? By the way, have you encountered the 'AssertionError: InvertedResidual.......' at line 275 of 'prune.py' during training? My training has been interrupted by this error. It seems to be a bug; please check it.

meijieru commented 4 years ago

Please provide more context. This also seems to be a separate issue.

meijieru commented 4 years ago

@zhangyuan1994511 Please open a new issue with clean formatting, thanks.

zhangyuan1994511 commented 4 years ago

OK, I will close it and open a new issue.