ziplab / SPViT

[TPAMI 2024] This is the official repository for our paper: "Pruning Self-attentions into Convolutional Layers in Single Path".

Target prune FLOPs ratio #4

Open · King4819 opened 6 months ago

King4819 commented 6 months ago

Excellent work!!! I want to ask whether the method can target different pruning FLOPs ratios. For example, I want to perform different levels of pruning, such as pruning 30%, 60%, and 90% of FLOPs, respectively. Thanks!

Charleshhy commented 6 months ago

Hi King4819,

Thanks for your interest in our work! The level of pruning depends on the hyper-parameters **target_flops** and **loss_lambda**, which control the expected remaining FLOPs and how strictly the pruned model should match your target FLOPs, respectively. These hyper-parameters are set in the config files, e.g., L12-13 here.

Regards, Haoyu
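
For reference, here is a hypothetical sketch of those two config entries as a Python dict (the key names follow this discussion rather than the repo's actual config files, and the values are only illustrative):

```python
# Hypothetical config excerpt; key names and values are illustrative only.
config = {
    "target_flops": 2.9,  # expected remaining FLOPs of the pruned model (GFLOPs)
    "loss_lambda": 0.5,   # weight of the FLOPs loss; higher = stricter match to target_flops
}
```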

King4819 commented 6 months ago

@Charleshhy Thanks for your reply. Is there a proper way to set the hyper-parameter theta when the target FLOPs changes? How do I know which value to set? Thanks!!

Charleshhy commented 6 months ago

> @Charleshhy Thanks for your reply. Is there a proper way to set the hyper-parameter theta when the target FLOPs changes? How do I know which value to set? Thanks!!

In practice, I set these two values so that the FLOPs loss is slightly higher than the cross-entropy loss at the beginning of training, and I find this works well. However, I didn't experiment much with the trade-off between these two losses, and different settings may lead to better performance :)
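
To make the trade-off concrete, here is a minimal sketch of how such a weighted FLOPs penalty is commonly combined with cross-entropy (the squared relative-deviation form is an assumption; SPViT's exact FLOPs loss may differ):

```python
import torch

def total_loss(ce_loss: torch.Tensor,
               current_flops: torch.Tensor,
               target_flops: float,
               loss_lambda: float) -> torch.Tensor:
    # Penalize deviation of the (differentiable) expected FLOPs from the target.
    flops_loss = loss_lambda * (current_flops / target_flops - 1.0) ** 2
    # Per the advice above, pick loss_lambda so that flops_loss starts
    # slightly higher than ce_loss at the beginning of training.
    return ce_loss + flops_loss
```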

King4819 commented 6 months ago

@Charleshhy Thanks for your reply. Let me ask more explicitly. For example, I want to prune the DeiT-S model at three different levels: pruning 30%, 60%, and 90% of FLOPs. How do I adjust the hyper-parameter theta for these three target FLOPs? Or is this an unexplored direction?

Charleshhy commented 6 months ago

> @Charleshhy Thanks for your reply. Let me ask more explicitly. For example, I want to prune the DeiT-S model at three different levels: pruning 30%, 60%, and 90% of FLOPs. How do I adjust the hyper-parameter theta for these three target FLOPs? Or is this an unexplored direction?

Different target FLOPs lead to different FLOPs losses, so in practice we need to adjust \theta so that the FLOPs loss stays slightly higher than the cross-entropy loss. Note that pruning 90% of FLOPs is too aggressive, and I have not tried it :)

King4819 commented 6 months ago

@Charleshhy Thanks for your reply!

bo102 commented 6 months ago

Hello, I would like to ask about the "theta": 1.5 setting in Swin Transformer pruning. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search: top-1 accuracy is only about 10% at around 40 search epochs. Is this normal? (I set theta to 0.5 in this process, and target_flops is 2.9.)

Charleshhy commented 6 months ago

> Hi King4819,
>
> Thanks for your interest in our work! The level of pruning depends on the hyper-parameters **target_flops** and **loss_lambda**, which control the expected remaining FLOPs and how strictly the pruned model should match your target FLOPs, respectively. These hyper-parameters are set in the config files, e.g., L12-13 here.
>
> Regards, Haoyu

@King4819 I made a mistake in explaining the hyper-parameters and have corrected it just now. **loss_lambda** controls the importance of the FLOPs loss, INSTEAD OF theta. theta here is another hyper-parameter used to initialize the learnable gates that select the best options. A higher theta makes your model less likely to select convolutional operations, and we empirically find 1.5 is a globally okay value for all settings.

Sorry for the confusion.
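
As a rough illustration of the corrected explanation, here is a hypothetical sketch of gates initialized with theta (the parameterization below is an assumption for illustration; SPViT's actual single-path gates may be implemented differently):

```python
import torch
import torch.nn as nn

class CandidateGate(nn.Module):
    """Hypothetical gate choosing between a self-attention and a conv branch."""
    def __init__(self, theta: float = 1.5):
        super().__init__()
        # One learnable logit per candidate op: [self-attention, convolution].
        # Initializing the attention logit at theta > 0 biases the selection
        # toward self-attention, so a higher theta makes conv ops less likely
        # to be picked early in the search.
        self.logits = nn.Parameter(torch.tensor([theta, 0.0]))

    def forward(self) -> torch.Tensor:
        # Soft selection weights over the candidate operations.
        return torch.softmax(self.logits, dim=0)
```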

Charleshhy commented 6 months ago

> Hello, I would like to ask about the "theta": 1.5 setting in Swin Transformer pruning. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search: top-1 accuracy is only about 10% at around 40 search epochs. Is this normal? (I set theta to 0.5 in this process, and target_flops is 2.9.)

Hi Bo102, theta is introduced here. In my experiments, I expect the accuracy during the search not to drop too much (say, only slightly lower than the dense model), so 10% is not normal. Try reducing the hyper-parameter **loss_lambda** to make your architecture evolve more slowly.

bo102 commented 6 months ago

> Hello, I would like to ask about the "theta": 1.5 setting in Swin Transformer pruning. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search: top-1 accuracy is only about 10% at around 40 search epochs. Is this normal? (I set theta to 0.5 in this process, and target_flops is 2.9.)

> Hi Bo102, theta is introduced here. In my experiments, I expect the accuracy during the search not to drop too much (say, only slightly lower than the dense model), so 10% is not normal. Try reducing the hyper-parameter **loss_lambda** to make your architecture evolve more slowly.

Hello, excuse me. I used your configuration to search for the Swin Transformer pruning architecture; I only changed the dataset to imagenet-tiny-200, and the accuracy is still very low during the search. Is this because I need to modify the parameters in the configuration file according to my actual dataset? What could be the reason? Also, when I use the searched files above to guide pruning, I cannot improve the training accuracy of the generated model; it keeps hovering around 50%.

Charleshhy commented 6 months ago

> Hello, excuse me. I used your configuration to search for the Swin Transformer pruning architecture; I only changed the dataset to imagenet-tiny-200, and the accuracy is still very low during the search. Is this because I need to modify the parameters in the configuration file according to my actual dataset? What could be the reason? Also, when I use the searched files above to guide pruning, I cannot improve the training accuracy of the generated model; it keeps hovering around 50%.

During the search, a low accuracy means the model has found a trivial solution, which won't perform well. My suggestion is to set a lower **loss_lambda** and/or learning rate during the search.

bo102 commented 6 months ago

> During the search, a low accuracy means the model has found a trivial solution, which won't perform well. My suggestion is to set a lower **loss_lambda** and/or learning rate during the search.

Thank you so much, and I wish you all the best. I'll give it a try.