zyxxmu / White-Box

PyTorch implementation of our paper accepted by IEEE TNNLS, 2022 — Carrying out CNN Channel Pruning in a White Box

Cannot reproduce the results #1

Open zsureuk opened 3 years ago

zsureuk commented 3 years ago

Hi zyxxmu,

First of all, thanks for sharing the code of such an amazing work. However, I tried several times to reproduce the results by following the guidance of /scripts/resnet56.sh and resnet110.sh on CIFAR-10, but failed. My best result for ResNet-56 is 91.18%, which is far lower than the result reported in the paper. What's worse, many runs only reach 10% accuracy, on both PyTorch 1.3.1 and 1.5.1. Do you know what the problem might be? Could you provide all the parameters required for the experiment? I would be very grateful if you could reply to me.

Kind Regards

Partial results for ResNet-56 are shown below:

(1)

```
07/09 03:56:07 PM | Epoch[299] (4864/50000): Loss 2.3026 Accuracy 10.12% Time 1.75s
07/09 03:56:09 PM | Epoch[299] (9728/50000): Loss 2.3026 Accuracy 9.84% Time 1.50s
07/09 03:56:10 PM | Epoch[299] (14592/50000): Loss 2.3026 Accuracy 9.83% Time 1.49s
07/09 03:56:11 PM | Epoch[299] (19456/50000): Loss 2.3026 Accuracy 9.91% Time 1.46s
07/09 03:56:13 PM | Epoch[299] (24320/50000): Loss 2.3026 Accuracy 9.86% Time 1.43s
07/09 03:56:14 PM | Epoch[299] (29184/50000): Loss 2.3026 Accuracy 9.75% Time 1.45s
07/09 03:56:16 PM | Epoch[299] (34048/50000): Loss 2.3026 Accuracy 9.76% Time 1.49s
07/09 03:56:17 PM | Epoch[299] (38912/50000): Loss 2.3026 Accuracy 9.78% Time 1.54s
07/09 03:56:19 PM | Epoch[299] (43776/50000): Loss 2.3026 Accuracy 9.72% Time 1.54s
07/09 03:56:20 PM | Epoch[299] (48640/50000): Loss 2.3026 Accuracy 9.70% Time 1.50s
07/09 03:56:22 PM | Test Loss 2.3026 Accuracy 10.00% Time 1.10s
07/09 03:56:22 PM | Pruned Model Accuracy: 10.000
07/09 03:56:22 PM | --------------UnPrune Model--------------
07/09 03:56:22 PM | Channels: 2032
07/09 03:56:22 PM | Params: 0.85 M
07/09 03:56:22 PM | FLOPS: 126.55 M
07/09 03:56:22 PM | --------------Prune Model--------------
07/09 03:56:22 PM | Channels:1585
07/09 03:56:22 PM | Params: 0.56 M
07/09 03:56:22 PM | FLOPS: 55.58 M
07/09 03:56:22 PM | --------------Compress Rate--------------
07/09 03:56:22 PM | Channels Prune Rate: 1585/2032 (22.00%)
07/09 03:56:22 PM | Params Compress Rate: 0.56 M/0.85 M(34.53%)
07/09 03:56:22 PM | FLOPS Compress Rate: 55.58 M/126.55 M(56.08%)
07/09 03:56:22 PM | --------------Layer Configuration--------------
07/09 03:56:22 PM | [[4, 4, 4, 4, 5, 4, 4, 4, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9, 42, 54, 55, 58, 59, 51, 38, 61, 25], [16, 32, 64]]
```

(2)

```
07/09 03:53:35 PM | Epoch[299] (4864/50000): Loss 2.3026 Accuracy 10.37% Time 1.72s
07/09 03:53:36 PM | Epoch[299] (9728/50000): Loss 2.3026 Accuracy 10.32% Time 1.51s
07/09 03:53:38 PM | Epoch[299] (14592/50000): Loss 2.3026 Accuracy 10.15% Time 1.52s
07/09 03:53:40 PM | Epoch[299] (19456/50000): Loss 2.3026 Accuracy 10.04% Time 1.53s
07/09 03:53:41 PM | Epoch[299] (24320/50000): Loss 2.3026 Accuracy 10.03% Time 1.51s
07/09 03:53:43 PM | Epoch[299] (29184/50000): Loss 2.3026 Accuracy 10.06% Time 1.48s
07/09 03:53:44 PM | Epoch[299] (34048/50000): Loss 2.3026 Accuracy 10.00% Time 1.51s
07/09 03:53:46 PM | Epoch[299] (38912/50000): Loss 2.3026 Accuracy 9.99% Time 1.52s
07/09 03:53:47 PM | Epoch[299] (43776/50000): Loss 2.3026 Accuracy 10.01% Time 1.46s
07/09 03:53:48 PM | Epoch[299] (48640/50000): Loss 2.3026 Accuracy 9.94% Time 1.42s
07/09 03:53:50 PM | Test Loss 2.3026 Accuracy 10.00% Time 1.15s
07/09 03:53:50 PM | Pruned Model Accuracy: 10.000
07/09 03:53:50 PM | --------------UnPrune Model--------------
07/09 03:53:50 PM | Channels: 2032
07/09 03:53:50 PM | Params: 0.85 M
07/09 03:53:50 PM | FLOPS: 126.55 M
07/09 03:53:50 PM | --------------Prune Model--------------
07/09 03:53:50 PM | Channels:1596
07/09 03:53:50 PM | Params: 0.57 M
07/09 03:53:50 PM | FLOPS: 56.90 M
07/09 03:53:50 PM | --------------Compress Rate--------------
07/09 03:53:50 PM | Channels Prune Rate: 1596/2032 (21.46%)
07/09 03:53:50 PM | Params Compress Rate: 0.57 M/0.85 M(33.54%)
07/09 03:53:50 PM | FLOPS Compress Rate: 56.90 M/126.55 M(55.04%)
07/09 03:53:50 PM | --------------Layer Configuration--------------
07/09 03:53:50 PM | [[5, 6, 4, 4, 4, 4, 4, 4, 4, 9, 9, 9, 9, 9, 9, 9, 11, 9, 47, 42, 52, 47, 55, 60, 49, 50, 48], [16, 32, 64]]
```

(3)

```
07/09 12:44:46 PM | Epoch[299] (4864/50000): Loss 0.0241 Accuracy 99.38% Time 1.78s
07/09 12:44:48 PM | Epoch[299] (9728/50000): Loss 0.0254 Accuracy 99.30% Time 1.47s
07/09 12:44:49 PM | Epoch[299] (14592/50000): Loss 0.0253 Accuracy 99.30% Time 1.41s
07/09 12:44:51 PM | Epoch[299] (19456/50000): Loss 0.0252 Accuracy 99.31% Time 1.44s
07/09 12:44:52 PM | Epoch[299] (24320/50000): Loss 0.0249 Accuracy 99.32% Time 1.43s
07/09 12:44:54 PM | Epoch[299] (29184/50000): Loss 0.0246 Accuracy 99.34% Time 1.42s
07/09 12:44:55 PM | Epoch[299] (34048/50000): Loss 0.0245 Accuracy 99.32% Time 1.44s
07/09 12:44:57 PM | Epoch[299] (38912/50000): Loss 0.0250 Accuracy 99.30% Time 1.42s
07/09 12:44:58 PM | Epoch[299] (43776/50000): Loss 0.0252 Accuracy 99.29% Time 1.46s
07/09 12:44:59 PM | Epoch[299] (48640/50000): Loss 0.0249 Accuracy 99.31% Time 1.49s
07/09 12:45:01 PM | Test Loss 0.4011 Accuracy 90.79% Time 1.17s
07/09 12:45:01 PM | Pruned Model Accuracy: 91.180
07/09 12:45:01 PM | --------------UnPrune Model--------------
07/09 12:45:01 PM | Channels: 2032
07/09 12:45:01 PM | Params: 0.85 M
07/09 12:45:01 PM | FLOPS: 126.55 M
07/09 12:45:01 PM | --------------Prune Model--------------
07/09 12:45:01 PM | Channels:1591
07/09 12:45:01 PM | Params: 0.56 M
07/09 12:45:01 PM | FLOPS: 56.69 M
07/09 12:45:01 PM | --------------Compress Rate--------------
07/09 12:45:01 PM | Channels Prune Rate: 1591/2032 (21.70%)
07/09 12:45:01 PM | Params Compress Rate: 0.56 M/0.85 M(34.93%)
07/09 12:45:01 PM | FLOPS Compress Rate: 56.69 M/126.55 M(55.20%)
07/09 12:45:01 PM | --------------Layer Configuration--------------
07/09 12:45:01 PM | [[4, 4, 4, 4, 4, 4, 5, 4, 4, 9, 9, 11, 12, 14, 9, 10, 10, 9, 54, 50, 50, 39, 60, 54, 50, 30, 50], [16, 32, 64]]
```
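As an aside, here is a minimal sketch of how the printed layer configuration appears to relate to the pruned channel count (my own reading of the log, not code from the repo; it assumes each entry of the first sub-list is the retained mid-channel count of one residual block, while every block's second conv and the stem keep the stage widths):

```python
# Minimal sketch, assuming the first sub-list holds the retained mid-channel
# count of each of the 27 residual blocks of resnet56 (9 blocks per stage),
# while each block's second conv and the stem conv keep the stage widths.
cfg = [[4, 4, 4, 4, 5, 4, 4, 4, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9,
        42, 54, 55, 58, 59, 51, 38, 61, 25], [16, 32, 64]]  # run (1) above

mid_channels, stage_widths = cfg
blocks_per_stage = len(mid_channels) // len(stage_widths)   # 9 for resnet56

stem = stage_widths[0]                                # stem conv, unpruned
block_outputs = blocks_per_stage * sum(stage_widths)  # block outputs, unpruned
print(stem + sum(mid_channels) + block_outputs)       # 1585, matching the log
```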

zyxxmu commented 3 years ago

Thanks for your interest in our work!

Please adjust the parameter --sparse_lambda to 0 for ResNet56/110.

My collaborators have tried this, and it reproduces the test accuracy.
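For context, here is a minimal sketch of how a penalty weighted by --sparse_lambda typically enters training (hypothetical, not the repo's actual loss code): with sparse_lambda set to 0, the regularization term vanishes and only the cross-entropy remains.

```python
import torch.nn.functional as F

def training_loss(logits, targets, mask_params, sparse_lambda=0.0):
    """Hypothetical loss: cross-entropy plus an L1 sparsity penalty on
    channel-mask parameters, weighted by sparse_lambda. With
    sparse_lambda=0 the penalty term drops out entirely."""
    ce = F.cross_entropy(logits, targets)
    l1 = sum(m.abs().sum() for m in mask_params)
    return ce + sparse_lambda * l1
```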

zsureuk commented 3 years ago

> Thanks for your interest in our work!
>
> Please adjust the parameter --sparse_lambda to 0 for ResNet56/110.
>
> My collaborators have tried this, and it reproduces the test accuracy.

Thanks for your reply. I tried the new setting as you suggested. The accuracy is higher than before, but it is still far below the results in the paper.

This is part of the log from that experiment:

```
07/14 11:30:38 PM | Epoch[298] (43776/50000): Loss 0.0105 Accuracy 99.75% Time 1.44s
07/14 11:30:40 PM | Epoch[298] (48640/50000): Loss 0.0105 Accuracy 99.76% Time 1.48s
07/14 11:30:42 PM | Test Loss 0.3953 Accuracy 91.49% Time 1.23s
07/14 11:30:43 PM | Epoch[299] (4864/50000): Loss 0.0106 Accuracy 99.79% Time 1.68s
07/14 11:30:45 PM | Epoch[299] (9728/50000): Loss 0.0104 Accuracy 99.76% Time 1.41s
07/14 11:30:46 PM | Epoch[299] (14592/50000): Loss 0.0100 Accuracy 99.78% Time 1.43s
07/14 11:30:47 PM | Epoch[299] (19456/50000): Loss 0.0097 Accuracy 99.78% Time 1.43s
07/14 11:30:49 PM | Epoch[299] (24320/50000): Loss 0.0100 Accuracy 99.76% Time 1.44s
07/14 11:30:50 PM | Epoch[299] (29184/50000): Loss 0.0100 Accuracy 99.76% Time 1.45s
07/14 11:30:52 PM | Epoch[299] (34048/50000): Loss 0.0102 Accuracy 99.76% Time 1.42s
07/14 11:30:53 PM | Epoch[299] (38912/50000): Loss 0.0103 Accuracy 99.74% Time 1.48s
07/14 11:30:55 PM | Epoch[299] (43776/50000): Loss 0.0102 Accuracy 99.74% Time 1.48s
07/14 11:30:56 PM | Epoch[299] (48640/50000): Loss 0.0100 Accuracy 99.75% Time 1.47s
07/14 11:30:58 PM | Test Loss 0.3928 Accuracy 91.59% Time 1.27s
07/14 11:30:58 PM | Pruned Model Accuracy: 91.600
07/14 11:30:58 PM | --------------UnPrune Model--------------
07/14 11:30:58 PM | Channels: 2032
07/14 11:30:58 PM | Params: 0.85 M
07/14 11:30:58 PM | FLOPS: 126.55 M
07/14 11:30:58 PM | --------------Prune Model--------------
07/14 11:30:58 PM | Channels:1503
07/14 11:30:58 PM | Params: 0.43 M
07/14 11:30:58 PM | FLOPS: 55.69 M
07/14 11:30:58 PM | --------------Compress Rate--------------
07/14 11:30:58 PM | Channels Prune Rate: 1503/2032 (26.03%)
07/14 11:30:58 PM | Params Compress Rate: 0.43 M/0.85 M(49.81%)
07/14 11:30:58 PM | FLOPS Compress Rate: 55.69 M/126.55 M(56.00%)
07/14 11:30:58 PM | --------------Layer Configuration--------------
07/14 11:30:58 PM | [[6, 4, 4, 6, 8, 6, 4, 7, 4, 9, 14, 13, 13, 12, 20, 18, 14, 14, 38, 41, 40, 44, 38, 31, 25, 22, 24], [16, 32, 64]]
```

The param settings are shown below:

```python
parser.add_argument('--gpus', type=int, nargs='+', default=[0], help='Select gpu_id to use. default:[0]',)
parser.add_argument('--dataset', type=str, default='cifar10', help='Select dataset to train. default:cifar10',)
parser.add_argument('--data_path', type=str, default='./data', help='The dictionary where the input is stored. default:/data/cifar10/',)
parser.add_argument('--job_dir', type=str, default='experiments/', help='The directory where the summaries will be stored. default:./experiments')
parser.add_argument('--resume', action='store_true', help='Load the model from the specified checkpoint.')

# Training
parser.add_argument('--arch', type=str, default='resnet_cifar', help='Architecture of model. default:resnet')
parser.add_argument('--cfg', type=str, default='resnet56', help='Detail architecuture of model. default:resnet56')
parser.add_argument('--num_epochs', type=int, default=300, help='The number of epoch to train. default:300')
parser.add_argument('--train_batch_size', type=int, default=256, help='Batch size for training. default:256')
parser.add_argument('--eval_batch_size', type=int, default=100, help='Batch size for validation. default:100')
parser.add_argument('--momentum', type=float, default=0.9, help='Momentum for MomentumOptimizer. default:0.9')
parser.add_argument('--lr', type=float, default=0.1, help='Learning rate for train. default:1e-2')
parser.add_argument('--lr_type', default='step', type=str, help='lr scheduler (step/exp/cos/step3/fixed)')
parser.add_argument('--criterion', default='Softmax', type=str, help='Loss func (Softmax)')
parser.add_argument('--lr_decay_step', type=int, nargs='+', default=[150, 225], help='the iterval of learn rate. default:50, 100')
parser.add_argument('--weight_decay', type=float, default=0.0005, help='The weight decay of loss. default:5e-3')
parser.add_argument('--pruning_rate', type=float, default=0.55, help='Target Pruning Rate. default:0.5')
parser.add_argument('--classtrain_epochs', type=int, default=30, help='Train_class_epochs')
parser.add_argument('--sparse_lambda', type=float, default=0, help='Sparse_lambda. default:0.00001')
parser.add_argument('--min_preserve', type=float, default=0.3, help='Minimum preserve percentage of each layer. default:0.3')
parser.add_argument('--debug', action='store_true', help='input to open debug state')
```
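As a sanity check, one can feed the intended overrides straight to the parser quoted above and confirm what --sparse_lambda actually resolves to (a hypothetical snippet; the flag names come from the options above):

```python
# Hypothetical check using the parser defined above: confirm the value that
# --sparse_lambda resolves to, since a stale shell script can silently
# override the intended default.
args = parser.parse_args(['--cfg', 'resnet56',
                          '--sparse_lambda', '0',
                          '--pruning_rate', '0.55'])
print(args.sparse_lambda)  # -> 0.0
```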

Are they the same as the setup of your experiment? I would be very grateful if you could reply to me. Kind Regards

zyxxmu commented 3 years ago

My setup: 2020-09-09-18:23:22

```
gpus: [2]
dataset: cifar10
data_path: /home/Datasets/Cifar
job_dir: ./experiment/cifar/resnet/1
resume: None
arch: resnet_cifar
cfg: resnet56
num_epochs: 300
train_batch_size: 256
eval_batch_size: 100
momentum: 0.9
lr: 0.1
lr_type: step
label_smooth: 0.1
criterion: Softmax
lr_decay_step: [150, 225]
weight_decay: 0.0005
pretrain_model: None
classtrain_epochs: 30
sparse_lambda: 0.0
freeze: False
min_preserve: 0.3
pruning_rate: 0.55
init_method: direct_project
```
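For a quick comparison, a small helper (my own, not from the repo) that diffs the two config dumps posted in this thread; the values below are copied from the posts, restricted to the training-relevant keys:

```python
# Diff the two config dumps from this thread. The keys present in only one
# dump are exactly the extra flags asked about later in the thread.
user_cfg = {'arch': 'resnet_cifar', 'cfg': 'resnet56', 'num_epochs': 300,
            'train_batch_size': 256, 'lr': 0.1, 'lr_decay_step': [150, 225],
            'weight_decay': 0.0005, 'sparse_lambda': 0.0,
            'pruning_rate': 0.55, 'min_preserve': 0.3}
author_cfg = dict(user_cfg, label_smooth=0.1, freeze=False,
                  pretrain_model=None, init_method='direct_project')

for key in sorted(user_cfg.keys() | author_cfg.keys()):
    a = user_cfg.get(key, '<absent>')
    b = author_cfg.get(key, '<absent>')
    if a != b:
        print(f'{key}: user={a!r}  author={b!r}')
# Prints only: freeze, init_method, label_smooth, pretrain_model
```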

The training log and pruned model are at: https://drive.google.com/drive/folders/1NSnJnLGWsSJLiVCksk1OnOK2iVGRfLyg?usp=sharing

P.S. I think the performance degradation mainly comes from a different pruned structure. As can be seen from the logger, my pruned structure is:

```
09/09 07:10:21 PM | --------------Layer Configuration--------------
09/09 07:10:21 PM | [[4, 4, 4, 4, 4, 4, 7, 5, 6, 14, 16, 15, 16, 12, 11, 9, 16, 17, 32, 44, 40, 43, 45, 41, 35, 31, 30], [16, 32, 64]]
```

Good luck and best wishes!

zsureuk commented 3 years ago

Thanks for your reply.

I found some parameter settings that are not in options.py, and I also did not find them in cifar10.py or resnet_cifar.py. For example:

```
init_method: direct_project
freeze: False
label_smooth: 0.1
```

Could you tell me where they are used?

Kind Regards

zyxxmu commented 3 years ago

> Thanks for your reply.
>
> I found some parameter settings that are not in options.py, and I also did not find them in cifar10.py or resnet_cifar.py. For example: init_method: direct_project, freeze: False, label_smooth: 0.1. Could you tell me where they are used?
>
> Kind Regards

These params are not related to our method; they were inherited from other code frameworks. We deleted these parameters in the final code version.
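If one wants to replay an old config dump that still contains such removed flags, argparse's standard parse_known_args() can be used instead of parse_args(); it collects unknown flags rather than erroring out (a hypothetical usage, assuming the parser quoted earlier in this thread):

```python
# Hypothetical usage: replaying an old config with flags that no longer
# exist in options.py. parse_known_args() returns the recognized namespace
# plus a list of the leftover, unrecognized tokens.
args, unknown = parser.parse_known_args(
    ['--cfg', 'resnet56', '--label_smooth', '0.1', '--freeze', 'False'])
print(unknown)  # -> ['--label_smooth', '0.1', '--freeze', 'False']
```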