huawei-noah / Efficient-Computing

Efficient computing methods developed by Huawei Noah's Ark Lab

[RPG] Questions about Hyperparameters for CIFAR-10 #139

Closed Acasia closed 4 months ago

Acasia commented 5 months ago

Thank you for your research on "Towards Higher Ranks via Adversarial Weight Pruning".

I am currently re-implementing the ResNet32 + CIFAR-10 results but am not achieving the accuracy reported in your table, particularly at the 99.5% and 99.9% sparsity levels.

Could you please provide the hyperparameters for the following: {wd, alpha, delta, lamb, partial_k, iterative_T_end_percent, T_end_percent}?

I would also be grateful if you could provide the hyperparameters for VGG-19 on CIFAR-10.

YuchuanTian commented 5 months ago

Thanks for your comment. Here are the hyperparameters:

wd=0.0005, alpha=0.3, delta=100, lr=0.1, batchsize=128, optim=sgd, epoch=300, iterative_T_end_percent=0.8, T_end_percent=0.8, lamb=0.5, partial_k=0.1

Hope that helps!

P.S. The ResNet network has double the channel count of the original ResNet, following previous work.
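P.P.S. If it helps, a minimal PyTorch sketch of the optimizer and schedule these settings imply is below; momentum=0.9 is taken from the config dump further down in this thread, and plain cosine annealing stands in for the actual `cosine2` schedule, so treat it as an assumption rather than the repo's code:

```python
import torch

# Sketch of the optimizer / schedule implied by the listed settings.
# Assumptions: momentum=0.9 (it appears in the config dump below) and plain
# cosine annealing in place of the repo's "cosine2" schedule.
model = torch.nn.Linear(10, 10)  # placeholder for the doubled-channel ResNet-32

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # lr=0.1
    momentum=0.9,       # assumed; see config dump below
    weight_decay=5e-4,  # wd=0.0005
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):
    # train one epoch on CIFAR-10 with batch size 128 (optimizer.step() per batch)
    scheduler.step()
```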

Acasia commented 5 months ago

@YuchuanTian
Thank you for your reply.

I set up the CIFAR-10 data pipeline and the ResNet models based on the ProbMask repository. However, even with the parameters you sent, the accuracy is still low. Is there something I'm missing? Below are the test accuracies at different sparsity levels and the hyperparameters I used.

99%: Epoch:299, Test_accu:[91.64, 99.76], LR:0.0
99.5%: Epoch:299, Test_accu:[89.16, 99.66], LR:0.0
99.9%: Epoch:299, Test_accu:[70.59, 97.87], LR:0.0

T_end_epochs:None T_end_percent:0.8 alpha:0.3 amp:False arch:resnet32 batch_size:128 bn_weight_decay:False bounded:False
checkpoint_dir:./results/resnet32cifar10magnitude-exponential_0.999_300/conf0.80.1_128_100_0.3/0.5_0.0_1_0.1 checkpoint_filename:checkpoint.pth.tar
data:./dataset/cifar10 data_backend:pytorch dataset:cifar10 delta:100 dense_allocation:0.001 distributed:False do_not_reset_epochs:False dynamic_loss_scale:False
epochs:300 eval_batch_size:128 evaluate:False fp16:False gather_checkpoints:False gpu:0 grad_accumulation_n:1 id:2024_06_11_09_51_14_597399
iterative_T_end_percent:0.8 label_smoothing:0.0 lamb:0.5 local_rank:0 log_filename:./temp_log lr:0.1 lr_decay_epochs:30-60-80 lr_file:None lr_schedule:cosine2 lrconfig:None
memory_format:nchw mixup:0.0 model_config:classic momentum:0.9 nesterov:False num_classes:10 optimizer:sgd optimizer_batch_size:-1 partial_k:0.1 pretrained_weights:
print_freq:10 prof:-1 raport_file:experiment_raport.json restart_training:False resume:None run_epochs:-1 save_checkpoints:True seed:42 short_train:False
sp_distribution:magnitude-exponential sparsity_thres:0.0 static_loss_scale:1 static_topo:0 training_only:False warmup:0 weight_decay:0.0005 widths:64-128-256-512-64 workers:8 workspace:./ world_size:1
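For clarity, here is my reading of the sparsity-related flags above; the exact semantics in the RPG codebase are my assumption:

```python
# My reading of the sparsity-related flags; the exact semantics in the
# RPG codebase are an assumption on my part.
epochs = 300
dense_allocation = 0.001           # fraction of weights kept -> 99.9% sparsity
iterative_T_end_percent = 0.8      # sparsity ramp finishes at 80% of training

target_sparsity = 1.0 - dense_allocation
ramp_end_epoch = int(epochs * iterative_T_end_percent)
print(f"target sparsity: {target_sparsity:.1%}, ramp ends at epoch {ramp_end_epoch}")
# -> target sparsity: 99.9%, ramp ends at epoch 240
```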

YuchuanTian commented 5 months ago

I think I know the reason. This is not training-from-scratch pruning; pretrained weights have to be loaded first. I am providing these weights for you (if I remember correctly, I produced them by keeping the same configs except sparsity=0). Links for pretrained weights here. I apologize for omitting this important detail in my previous reply. I conducted this experiment about two years ago and struggled to dig the hyperparameters out of tons of logs and checkpoints.
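Loading the checkpoint should look roughly like the sketch below; the file name is a placeholder and the "state_dict" key is an assumption about the checkpoint layout, so adjust both to the actual files:

```python
import torch
import torch.nn as nn

# Sketch of loading the provided pretrained weights before pruning.
# Assumptions: the file is a standard PyTorch checkpoint with the weights under
# a "state_dict" key, and the path below is a placeholder for the actual file.
model = nn.Linear(10, 10)  # placeholder for the doubled-channel ResNet-32

checkpoint = torch.load("pretrained_resnet32_cifar10.pth.tar", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)

# Strip a possible "module." prefix left over from DataParallel training.
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)
```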

Acasia commented 4 months ago

Thank you for sharing the pre-trained model weights.

I think the table was reported based on the best accuracy. When I compare the best accuracy, my results match the performance in the table.

Thanks for your help.

P.S. The CIFAR-10 weight decay in the paper seems to be incorrectly listed as 0.005.

YuchuanTian commented 4 months ago

Yeah you are right, the weight decay should be 0.0005. Thanks again for raising this issue!

Acasia commented 4 months ago

Hi Yuchuan Tian!

I have one more question. As I mentioned before, I have confirmed that the best accuracy matches the accuracy in the paper's table. However, isn't the best-accuracy checkpoint taken at a sparsity different from the target sparsity? If using the best accuracy is correct, could you explain why it was used instead of the last-epoch accuracy?
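To make the question concrete, here is a toy sketch of a ramping sparsity schedule; the cubic form is only a stand-in for the repo's magnitude-exponential schedule, which I have not checked:

```python
# Toy sparsity ramp to illustrate the concern: checkpoints evaluated before the
# ramp ends are less sparse than the target, so their (higher) accuracy is not
# directly comparable. The cubic form below is only a stand-in assumption for
# the repo's "magnitude-exponential" schedule.
epochs = 300
ramp_end = int(0.8 * epochs)   # iterative_T_end_percent = 0.8
target_sparsity = 0.999        # 99.9% target

def sparsity_at(epoch: int) -> float:
    if epoch >= ramp_end:
        return target_sparsity
    progress = epoch / ramp_end
    return target_sparsity * (1.0 - (1.0 - progress) ** 3)

for e in (0, 60, 120, 240, 299):
    print(e, round(sparsity_at(e), 4))
```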