neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0

YOLOv8 - INT4 Training #2346

Open yoloyash opened 1 week ago

yoloyash commented 1 week ago

Hello, I'm trying to train YOLOv8-large in INT4 format. I took the training recipe for YOLOv8-large available on SparseZoo and changed `num_bits` to 4 everywhere. I also saw in #1679 that channel-wise quantization can be added, so I've included that as well. However, the performance is considerably worse (about -20 mAP@0.50). Is this expected? I will also be exporting the model to ONNX for inference on an FPGA (5-bit), so I need the model to be strictly 4-bit.
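One way to verify the strict 4-bit requirement after export is to scan the ONNX graph and confirm that every quantized weight actually stays within the 4-bit range, since QDQ exports still store values as int8/uint8 tensors. A minimal sketch, assuming a QDQ-style export (the file name `yolov8l-int4.onnx` is a placeholder, not from the recipe):

```python
# Sanity-check that quantized weights in a QDQ ONNX export fit in 4 bits,
# even though they are stored in int8/uint8 containers.
import onnx
import numpy as np
from onnx import numpy_helper

model = onnx.load("yolov8l-int4.onnx")  # placeholder path
inits = {i.name: numpy_helper.to_array(i) for i in model.graph.initializer}

for node in model.graph.node:
    if node.op_type != "DequantizeLinear":
        continue
    w = inits.get(node.input[0])  # quantized weight initializer, if any
    if w is None or w.dtype not in (np.int8, np.uint8):
        continue
    # 4-bit ranges: symmetric int -> [-8, 7], asymmetric uint -> [0, 15]
    lo, hi = (-8, 7) if w.dtype == np.int8 else (0, 15)
    if w.min() < lo or w.max() > hi:
        print(f"{node.input[0]}: outside 4-bit range [{lo}, {hi}] "
              f"(min={w.min()}, max={w.max()})")
```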

Recipe

```yaml
version: 1.1.0

__metadata__:

# General Hyperparams
pruning_num_epochs: 90
pruning_init_lr: 0.01
pruning_final_lr: 0.0002
weights_warmup_lr: 0
biases_warmup_lr: 0.1
qat_init_lr: 1e-4
qat_final_lr: 1e-6

# Pruning Hyperparams
init_sparsity: 0.05
pruning_start_epoch: 4
pruning_end_epoch: 50
pruning_update_frequency: 1.0

# Quantization variables
qat_start_epoch: eval(pruning_num_epochs)
qat_epochs: 3
qat_end_epoch: eval(qat_start_epoch + qat_epochs)
observer_freeze_epoch: eval(qat_end_epoch)
bn_freeze_epoch: eval(qat_end_epoch)
qat_ft_epochs: 3
num_epochs: eval(pruning_num_epochs + qat_epochs + 2 * qat_ft_epochs)

# Modifiers
training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: eval(num_epochs)

  - !LearningRateFunctionModifier
    start_epoch: 3
    end_epoch: eval(pruning_num_epochs)
    lr_func: linear
    init_lr: eval(pruning_init_lr)
    final_lr: eval(pruning_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(weights_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [0, 1]

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(biases_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [2]

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_start_epoch)
    end_epoch: eval(qat_end_epoch)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch)
    end_epoch: eval(qat_end_epoch + qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch + qat_ft_epochs)
    end_epoch: eval(qat_end_epoch + 2 * qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

pruning_modifiers:
  - !ConstantPruningModifier
    start_epoch: eval(qat_start_epoch)
    params: ["re:^((?!dfl).)*$"]

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.46
    params:
      - model.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8999
    params:
      - model.1.conv.weight
      - model.4.m.1.cv1.conv.weight
      - model.4.m.4.cv2.conv.weight
      - model.6.m.1.cv1.conv.weight
      - model.21.m.1.cv1.conv.weight
      - model.21.m.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.514
    params:
      - model.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7675
    params:
      - model.2.cv2.conv.weight
      - model.12.m.0.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8117
    params:
      - model.3.conv.weight
      - model.8.cv2.conv.weight
      - model.12.m.1.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6457
    params:
      - model.4.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8627
    params:
      - model.4.cv2.conv.weight
      - model.5.conv.weight
      - model.8.m.1.cv1.conv.weight
      - model.22.cv3.1.1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8764
    params:
      - model.4.m.0.cv1.conv.weight
      - model.6.m.3.cv2.conv.weight
      - model.7.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9189
    params:
      - model.4.m.1.cv2.conv.weight
      - model.6.m.5.cv1.conv.weight
      - model.15.m.2.cv1.conv.weight
      - model.18.m.0.cv1.conv.weight
      - model.18.m.2.cv1.conv.weight
      - model.22.cv3.0.1.conv.weight
      - model.22.cv3.2.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8305
    params:
      - model.4.m.2.cv1.conv.weight
      - model.4.m.5.cv2.conv.weight
      - model.6.cv2.conv.weight
      - model.6.m.4.cv2.conv.weight
      - model.15.m.0.cv2.conv.weight
      - model.15.m.1.cv1.conv.weight
      - model.15.m.2.cv2.conv.weight
      - model.18.cv2.conv.weight
      - model.21.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7417
    params:
      - model.4.m.2.cv2.conv.weight
      - model.18.cv1.conv.weight
      - model.22.cv3.2.1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8888
    params:
      - model.4.m.3.cv2.conv.weight
      - model.6.m.3.cv1.conv.weight
      - model.15.m.1.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6063
    params:
      - model.6.cv1.conv.weight
      - model.12.cv1.conv.weight
      - model.12.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9468
    params:
      - model.6.m.0.cv1.conv.weight
      - model.21.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7907
    params:
      - model.6.m.0.cv2.conv.weight
      - model.8.m.0.cv1.conv.weight
      - model.12.m.0.cv2.conv.weight
      - model.12.m.1.cv1.conv.weight
      - model.22.cv2.2.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9409
    params:
      - model.6.m.1.cv2.conv.weight
      - model.18.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6811
    params:
      - model.8.cv1.conv.weight
      - model.15.cv1.conv.weight
      - model.15.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9343
    params:
      - model.8.m.0.cv2.conv.weight
      - model.8.m.1.cv2.conv.weight
      - model.18.m.0.cv2.conv.weight
      - model.18.m.1.cv1.conv.weight
      - model.21.m.0.cv1.conv.weight
      - model.21.m.1.cv2.conv.weight
      - model.22.cv3.0.0.conv.weight
      - model.22.cv3.1.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9771
    params:
      - model.8.m.2.cv1.conv.weight
      - model.22.cv2.0.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.989
    params:
      - model.8.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.5626
    params:
      - model.9.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.713
    params:
      - model.9.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9099
    params:
      - model.12.m.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.927
    params:
      - model.12.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9521
    params:
      - model.16.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9569
    params:
      - model.18.m.1.cv2.conv.weight
      - model.19.conv.weight
      - model.21.m.0.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8474
    params:
      - model.21.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9651
    params:
      - model.22.cv2.1.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.4
    params:
      - model.22.cv3.0.2.weight
      - model.22.cv3.1.2.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

quantization_modifiers:
  - !QuantizationModifier
    start_epoch: eval(qat_start_epoch)
    disable_quantization_observer_epoch: eval(observer_freeze_epoch)
    freeze_bn_stats_epoch: eval(bn_freeze_epoch)
    ignore: ['Upsample', 'Concat', 'model.22.dfl.conv']
    scheme_overrides:
      model.2.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.2.m.0.cv1.conv:
        input_activations: null
      model.2.m.0.add_input_0:
        input_activations: null
      model.4.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.4.m.0.cv1.conv:
        input_activations: null
      model.4.m.0.add_input_0:
        input_activations: null
      model.4.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.5.conv:
        input_activations: null
      model.6.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.6.m.0.cv1.conv:
        input_activations: null
      model.6.m.0.add_input_0:
        input_activations: null
      model.6.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.7.conv:
        input_activations: null
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.m.0.cv1.conv:
        input_activations: null
      model.8.m.0.add_input_0:
        input_activations: null
      model.8.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.0.cv1.conv:
        input_activations: null
      model.12.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.1.cv1.conv:
        input_activations: null
      model.12.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.2.cv1.conv:
        input_activations: null
      model.12.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.0.cv1.conv:
        input_activations: null
      model.15.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.1.cv1.conv:
        input_activations: null
      model.15.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.2.cv1.conv:
        input_activations: null
      model.15.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.16.conv:
        input_activations: null
      model.16.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.0.cv1.conv:
        input_activations: null
      model.18.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.1.cv1.conv:
        input_activations: null
      model.18.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.2.cv1.conv:
        input_activations: null
      model.18.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.19.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.0.cv1.conv:
        input_activations: null
      model.21.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.1.cv1.conv:
        input_activations: null
      model.21.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.2.cv1.conv:
        input_activations: null
      model.21.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.22.cv2.0.0.conv:
        input_activations: null
      model.22.cv3.0.0.conv:
        input_activations: null
```
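For reference, the `eval()` expressions above resolve to a fixed schedule; a quick sketch of the arithmetic, using the constants set at the top of the recipe:

```python
# Epoch schedule implied by the recipe's eval() expressions
# (constants copied from the recipe above).
pruning_num_epochs = 90
qat_epochs = 3
qat_ft_epochs = 3

qat_start_epoch = pruning_num_epochs          # 90: QAT begins after the pruning run
qat_end_epoch = qat_start_epoch + qat_epochs  # 93: observers and BN stats freeze here
num_epochs = pruning_num_epochs + qat_epochs + 2 * qat_ft_epochs  # 99 total

print(qat_start_epoch, qat_end_epoch, num_epochs)  # -> 90 93 99
```

So the run is 90 epochs of pruning-phase training, 3 epochs of QAT, then two 3-epoch fine-tuning phases on the frozen cosine schedule.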