mit-han-lab / amc

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
https://arxiv.org/abs/1802.03494
MIT License

The linear regression for resnet-50 is too slow and the reward is low. #22

Open aliceyayunji opened 4 years ago

aliceyayunji commented 4 years ago

Hi all, I used im2col to implement the conv3*3 linear regression, but pruning is slow: it takes about 2 days to finish 800 training steps. The reward converges, but the accuracy reward is low when pruning 50%; the best accuracy is about 8%. resnet-18 does better, with a best accuracy of about 23%. Is this phenomenon normal?
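For reference, a minimal sketch of this kind of im2col-based reconstruction, assuming PyTorch >= 1.9 (for `torch.linalg.lstsq`); the function name and defaults here are illustrative, and in practice you would regress on a subsample of spatial positions rather than every patch:

```python
import torch
import torch.nn.functional as F

def reconstruct_conv_weights(x, weight, keep_idx, stride=1, padding=1):
    """Least-squares reconstruction of a conv3x3 layer after input-channel pruning.

    x:        sampled input activations, shape (N, C_in, H, W)
    weight:   original conv weight, shape (C_out, C_in, k, k)
    keep_idx: indices of the input channels kept after pruning
    Returns a new weight of shape (C_out, len(keep_idx), k, k).
    """
    N, C_in, H, W = x.shape
    C_out, _, k, _ = weight.shape

    # im2col: unfold patches -> (N, C_in*k*k, L), L = number of output positions
    patches = F.unfold(x, kernel_size=k, stride=stride, padding=padding)
    L = patches.shape[-1]
    # expose the channel axis so pruned channels can be dropped: (N*L, C_in, k*k)
    patches = patches.view(N, C_in, k * k, L).permute(0, 3, 1, 2).reshape(N * L, C_in, k * k)

    # regression target: the original (unpruned) layer's output, flattened to (N*L, C_out)
    y = F.conv2d(x, weight, stride=stride, padding=padding)
    y = y.permute(0, 2, 3, 1).reshape(N * L, C_out)

    # design matrix built from the kept channels only: (N*L, len(keep_idx)*k*k)
    X = patches[:, keep_idx, :].reshape(N * L, -1)

    # solve min ||X W - y||^2 for the new flattened weights
    solution = torch.linalg.lstsq(X, y).solution  # (len(keep_idx)*k*k, C_out)
    return solution.T.reshape(C_out, len(keep_idx), k, k)
```

Even with subsampling, solving a least-squares problem like this layer by layer across resnet-50 is expensive, which is consistent with the slowdown reported above.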

koda12344505 commented 4 years ago

Hello,

If you don't mind, could you share your implementation of the conv3*3 linear regression, please?

Hope you can share the code, thanks.

aliceyayunji commented 4 years ago

> Hello,
>
> If you don't mind, could you share your implementation of the conv3*3 linear regression, please?
>
> Hope you can share the code, thanks.

You can implement the conv3*3 linear regression following the Intel Distiller implementation: https://github.com/NervanaSystems/distiller/blob/master/distiller/pruning/ranked_structures_pruner.py (see the class `FMReconstructionChannelPruner(_RankedStructureParameterPruner)`).

But I don't recommend it: the regression is quite slow, and the reward is low for resnet-50; the larger the network, the worse it gets. Instead, I suggest retraining the pruned network without linear regression for a few batches (e.g., 100) with a small learning rate (0.001). The DDPG agent still converges well and the reward is high.
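For reference, a minimal sketch of that alternative, assuming PyTorch and a standard classification loader; `short_finetune` and its arguments are illustrative, with the 100 batches and lr 0.001 taken from the numbers above:

```python
import torch

def short_finetune(model, train_loader, num_batches=100, lr=0.001, device="cuda"):
    """Briefly retrain the pruned network instead of running per-layer regression:
    roughly 100 batches with a small learning rate, per the suggestion above."""
    model.to(device).train()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for i, (images, targets) in enumerate(train_loader):
        if i >= num_batches:
            break
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    return model
```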

koda12344505 commented 4 years ago

> Hello, if you don't mind, could you share your implementation of the conv3*3 linear regression, please? Hope you can share the code, thanks.
>
> You can implement the conv3*3 linear regression following the Intel Distiller implementation: https://github.com/NervanaSystems/distiller/blob/master/distiller/pruning/ranked_structures_pruner.py (see the class `FMReconstructionChannelPruner(_RankedStructureParameterPruner)`).
>
> But I don't recommend it: the regression is quite slow, and the reward is low for resnet-50; the larger the network, the worse it gets. Instead, I suggest retraining the pruned network without linear regression for a few batches (e.g., 100) with a small learning rate (0.001). The DDPG agent still converges well and the reward is high.

Hello,

So your suggestion is to retrain the whole pruned network after all layers have been pruned? I mean, skip the layer-by-layer linear regression and just retrain the whole network?