VJeee opened this issue 2 years ago
Hi! Sorry for the inconvenience. We split the checkpoint based on `algorithm.pruner.deploy_subnet` (refer to here). When overwriting an original parameter of an `nn.Module` with its sliced counterpart, it is necessary to use a copy of the sliced parameter, e.g. `module.weight = nn.Parameter(temp_weight.data.clone())`. Otherwise the sliced tensor is only a view of the full tensor, so every split checkpoint serializes the same underlying storage. With the copy, the file sizes of the three different checkpoints will be completely different. This problem is fixed in the `dev-1.x` branch.
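The effect of the missing `.clone()` can be reproduced with a small sketch in plain PyTorch (this is an illustration, not MMRazor code): saving a sliced view serializes the layer's full storage, while saving a clone serializes only the sliced channels.

```python
import io

import torch
import torch.nn as nn

# A sliced weight without .clone() is a view: torch.save writes the whole
# underlying 512x512 storage, so every "subnet" checkpoint has the same size.
layer = nn.Linear(512, 512)
sliced = layer.weight[:64, :]  # view into the full 512x512 storage


def saved_size(tensor):
    """Return the number of bytes torch.save would write for `tensor`."""
    buf = io.BytesIO()
    torch.save(tensor, buf)
    return buf.getbuffer().nbytes


view_size = saved_size(sliced)                               # ~512*512 floats
clone_size = saved_size(nn.Parameter(sliced.data.clone()))   # ~64*512 floats
print(view_size > clone_size)  # True
```

Cloning before assignment materializes only the kept channels, which is why the fixed `dev-1.x` branch produces three checkpoints of genuinely different sizes.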
For the state dict mismatch problem: if the checkpoint to be loaded is the split one corresponding to `subnet_13978536`, then `channel_cfg` should be a single path,

```python
channel_cfg = '/home/wenjie/PycharmProjects/mmrazor_demo/autoslim_test/search/subnet_13978536.yaml'
```

not the full list:

```python
channel_cfg = [
    '/home/wenjie/PycharmProjects/mmrazor_demo/autoslim_test/search/subnet_13978536.yaml',  # noqa: E501
    '/home/wenjie/PycharmProjects/mmrazor_demo/autoslim_test/search/subnet_12989328.yaml',  # noqa: E501
    '/home/wenjie/PycharmProjects/mmrazor_demo/autoslim_test/search/subnet_11942370.yaml',  # noqa: E501
]
```
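To see exactly which entries trigger "The model and loaded state dict do not match exactly", a small helper can diff the checkpoint keys against the model's keys (this helper is a hypothetical sketch, not part of MMRazor):

```python
import torch


def diff_keys(model, ckpt_path):
    """Return (missing, unexpected) state-dict keys for a checkpoint.

    `missing` are keys the model expects but the checkpoint lacks;
    `unexpected` are keys the checkpoint has but the model does not.
    """
    ckpt = torch.load(ckpt_path, map_location='cpu')
    # mmcv-style checkpoints nest the weights under 'state_dict'.
    state = ckpt.get('state_dict', ckpt)
    model_keys = set(model.state_dict())
    ckpt_keys = set(state)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)
```

If the two key sets differ only in channel counts rather than names, the mismatch usually points at `channel_cfg` selecting the wrong subnet for that checkpoint.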
For the `AttributeError: 'MMDataParallel' object has no attribute 'CLASSES'` problem, could you please provide more error information so that we can locate the bug?
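One possible cause worth checking in the meantime: DataParallel-style wrappers do not forward arbitrary Python attributes, so a `CLASSES` attribute set on the raw model must be read through `.module`. A minimal sketch with plain `torch.nn.parallel.DataParallel` (mmcv's `MMDataParallel` subclasses it and behaves the same way):

```python
import torch.nn as nn
from torch.nn.parallel import DataParallel

model = nn.Linear(8, 2)
model.CLASSES = ('cat', 'dog')  # attribute lives on the inner model
wrapped = DataParallel(model)

# wrapped.CLASSES raises AttributeError, because nn.Module.__getattr__
# only resolves parameters, buffers, and submodules, not plain attributes.
print(wrapped.module.CLASSES)  # ('cat', 'dog')
```

So test code that does `model.CLASSES` after wrapping should use `model.module.CLASSES` instead.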
Describe the question you meet
I used the official configuration file to run the AutoSlim test on the CIFAR-100 dataset. However, after using `split_checkpoint.py` to split the retrained weight file, the resulting weight files are all the same size. And when running the test, it reports the errors 'The model and loaded state dict do not match exactly' and 'AttributeError: 'MMDataParallel' object has no attribute 'CLASSES''.
Post related information
```shell
pip list | grep "mmcv\|mmrazor\|^torch"
```

```
mmcv-full     1.5.0
mmrazor       0.3.1
torch         1.10.0
torchsummary  1.5.1
torchvision   0.11.1
```