tanluren / yolov3-channel-and-layer-pruning

yolov3 yolov4 channel and layer pruning, Knowledge Distillation
Apache License 2.0
1.5k stars 446 forks

Training fails when the prebias option is enabled #33

Open mozpp opened 4 years ago

mozpp commented 4 years ago

Step: sparsity training

ssh://root@127.0.0.1:8026/usr/local/bin/python -u /project/pytorch-yolov3/train_prune.py --data=data/person_1cls.data --batch-size=4 --cfg=cfg/yolov3-spp-1cls-a2.cfg --weights=weights/yolov3-spp.weights --device=0 --prebias -sr --s=0.001 --prune=0
Namespace(accumulate=2, adam=False, arc='defaultpw', batch_size=4, bucket='', cache_images=False, cfg='cfg/yolov3-spp-1cls-a2.cfg', data='data/person_1cls.data', device='0', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, name='', nosave=False, notest=False, prebias=True, prune=0, rect=False, resume=False, s=0.001, sr=True, t_cfg='', t_weights='', transfer=False, var=None, weights='weights/yolov3-spp.weights')
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1660', total_memory=5941MB)

loaded weights from weights/yolov3-spp.weights 

normal sparse training 
Model Summary: 225 layers, 6.25733e+07 parameters, 54 gradients
Starting prebias for 1 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total      soft    rratio   targets  img_size
  0%|                                                  | 0/1524 [00:00<?, ?it/s]learning rate: 1e-06
Traceback (most recent call last):
  File "/project/pytorch-yolov3/train_prune.py", line 537, in <module>
    prebias()  # optional
  File "/project/pytorch-yolov3/train_prune.py", line 484, in prebias
    train()  # transfer-learn yolo biases for 1 epoch
  File "/project/pytorch-yolov3/train_prune.py", line 380, in train
    BNOptimizer.updateBN(sr_flag, model.module_list, opt.s, prune_idx, idx2mask)
  File "/project/pytorch-yolov3/utils/prune_utils.py", line 148, in updateBN
    bn_module.weight.grad.data.add_(s * torch.sign(bn_module.weight.data))  # L1
AttributeError: 'NoneType' object has no attribute 'data'
  0%|                                                  | 0/1524 [00:02<?, ?it/s]

Process finished with exit code 1

With prebias turned off, training runs fine. Has anyone else hit this problem?

mozpp commented 4 years ago

Found the cause:

        for p in model.parameters():
            if opt.prebias and p.numel() == nf:  # train (yolo biases)
                p.requires_grad = True
            elif opt.transfer and p.shape[0] == nf:  # train (yolo biases+weights)
                p.requires_grad = True
            else:  # freeze layer
                p.requires_grad = False  # TODO: enabling prebias while pruning raises AttributeError: 'NoneType' object has no attribute 'data'

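During prebias, p.requires_grad is set to False for every frozen parameter, including the BN weights, so their .grad stays None after backward(). updateBN, however, adds the L1 term to bn_module.weight.grad, which requires those weights to have gradients. The two conflict. Does the maintainer have a good way to resolve this?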
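The conflict described above can be reproduced on a toy model. The following is a minimal sketch, not the repository's code: `update_bn_guarded` is a hypothetical variant of `updateBN` that skips BN layers whose weight has no gradient, and the three-layer `nn.Sequential` is only a stand-in for the real network.

```python
import torch
import torch.nn as nn


def update_bn_guarded(bn_modules, s):
    """Add the L1 subgradient s * sign(gamma) to each BN weight gradient.

    Hypothetical variant of updateBN: BN layers whose weight has no gradient
    (frozen, e.g. by prebias) are skipped instead of raising AttributeError.
    Returns the number of BN layers that were actually updated.
    """
    updated = 0
    for bn in bn_modules:
        if bn.weight.grad is None:  # frozen parameter: backward() left grad unset
            continue
        bn.weight.grad.data.add_(s * torch.sign(bn.weight.data))
        updated += 1
    return updated


# Tiny stand-in model (conv -> BN -> conv), not the real YOLOv3 network.
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1),
    nn.BatchNorm2d(4),
    nn.Conv2d(4, 2, 1),
)

# Mimic prebias: freeze everything except the last conv's bias.
for p in model.parameters():
    p.requires_grad = False
model[2].bias.requires_grad = True

model(torch.randn(1, 3, 8, 8)).sum().backward()

frozen_grad = model[1].weight.grad  # None: this is what the unguarded update trips over
n_frozen = update_bn_guarded([model[1]], s=0.001)  # nothing updated, no exception

# Unfreeze the BN weight and repeat: now the penalty is applied.
model[1].weight.requires_grad = True
model.zero_grad()
model(torch.randn(1, 3, 8, 8)).sum().backward()
n_trainable = update_bn_guarded([model[1]], s=0.001)
```

Note that with such a guard the sparsity penalty is simply not applied while the BN weights are frozen, so the prebias epoch would contribute nothing to sparsification; whether that is acceptable is a design decision for the training script.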

zbyuan commented 4 years ago

Hm? We don't have a train_prune.py file. Normal training, sparsity training, and fine-tuning all use train.py.

mozpp commented 4 years ago

> Hm? We don't have a train_prune.py file. Normal training, sparsity training, and fine-tuning all use train.py.

train_prune is just train, which I renamed.

zbyuan commented 4 years ago

Use --prune 1 for sparsity training.

mozpp commented 4 years ago

This has nothing to do with --prune 1. I opened this issue to flag a latent problem; training does work with the prebias option turned off.

ssh://root@127.0.0.1:8026/usr/local/bin/python -u /project/pytorch-yolov3/train.py --data=data/person_1cls.data --batch-size=4 --img-size=320 --cfg=cfg/yolov3-spp-1cls-a2.cfg --weights=weights/yolov3-spp.weights --device=0 --mix_up --prebias -sr --s=0.001 --prune=1
Namespace(accumulate=2, adam=False, arc='default', batch_size=4, bucket='', cache_images=False, cfg='cfg/yolov3-spp-1cls-a2.cfg', data='data/person_1cls.data', device='0', epochs=273, evolve=False, img_size=320, img_weights=False, mix_up=True, multi_scale=False, name='', nosave=False, notest=False, prebias=True, prune=1, rect=False, resume=False, s=0.001, sr=True, transfer=False, var=None, weights='weights/yolov3-spp.weights')
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1660', total_memory=5941MB)

shortcut sparse training
Model Summary: 225 layers, 6.25733e+07 parameters, 54 gradients
Starting prebias for 1 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size sum_gamma
  0%|                                                  | 0/1524 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/project/pytorch-yolov3/train.py", line 532, in <module>
    prebias()  # optional
  File "/project/pytorch-yolov3/train.py", line 439, in prebias
    train()  # transfer-learn yolo biases for 1 epoch
  File "/project/pytorch-yolov3/train.py", line 323, in train
    sum_gamma = BNOptimizer.updateBN(sr_flag, model.module_list, opt.s, prune_idx, idx2mask)
  File "/project/pytorch-yolov3/utils/prune_utils.py", line 148, in updateBN
    bn_module.weight.grad.data.add_(s * torch.sign(bn_module.weight.data))  # L1
AttributeError: 'NoneType' object has no attribute 'data'
  0%|                                                  | 0/1524 [00:03<?, ?it/s]

Process finished with exit code 1

zbyuan commented 4 years ago

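prebias runs one epoch before the main training, training only the convolutional layers right before the three yolo layers and freezing everything else, which helps transfer learning. It is a small trick from the u (ultralytics) version. If you want to use it, use it in the normal training stage; sparsity training does not need it.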
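One way to enforce that advice in a training script is to skip the prebias epoch whenever sparsity training is requested. This is only a sketch under assumptions: the `--prebias` and `-sr` flags mirror the ones visible in the logs above, while `build_parser` and `should_run_prebias` are hypothetical helpers that are not part of the repository.

```python
import argparse


def build_parser():
    """Minimal parser with only the two flags relevant here (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--prebias", action="store_true",
                        help="run one bias-only warm-up epoch before training")
    parser.add_argument("-sr", dest="sr", action="store_true",
                        help="enable sparsity (L1 on BN gamma) training")
    return parser


def should_run_prebias(opt):
    """Hypothetical guard: prebias freezes BN weights, so skip it under -sr."""
    return opt.prebias and not opt.sr


# Same flag combination as in the failing command lines above.
opt_sparse = build_parser().parse_args(["--prebias", "-sr"])
# Normal training with prebias only.
opt_normal = build_parser().parse_args(["--prebias"])

if opt_sparse.prebias and opt_sparse.sr:
    print("--prebias is ignored during sparsity training (-sr)")
```

With a check like this, the call to `prebias()` would be wrapped in `if should_run_prebias(opt):`, so the combination that triggers the AttributeError never reaches `updateBN`.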