openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

Sparsity multistep config is confusing and error prone #909

Open lpuglia opened 3 years ago

lpuglia commented 3 years ago

Hello, I'm trying to understand the details of the multistep sparsity training scheduler; in particular, I'm learning from the config example. Here is the part that confuses me:

    "params": {
            "schedule": "multistep",  // The type of scheduling to use for adjusting the target sparsity level
            "patience": 3, // A regular patience parameter for the scheduler, as for any other standard scheduler. Specified in units of scheduler steps.
            "sparsity_target": 0.7, // Target value of the sparsity level for the model
            "sparsity_target_epoch": 3, // Index of the epoch from which the sparsity level of the model will be equal to spatsity_target value
            "sparsity_freeze_epoch": 50, // Index of the epoch from which the sparsity mask will be frozen and no longer trained
            "multistep_steps": [10, 20], // A list of scheduler steps at which to transition to the next scheduled sparsity level (multistep scheduler only).
            "multistep_sparsity_levels": [0.2, 0.5, 0.7] // Levels of sparsity to use at each step of the scheduler as specified in the 'multistep_steps' attribute. The first sparsity level will be applied immediately, so the length of this list should be larger than the length of the 'steps' by one. The last sparsity level will function as the ultimate sparsity target, overriding the "sparsity_target" setting if it is present.
    },

To me it seems that the multistep_* params conflict with the sparsity_* ones. According to sparsity_target and sparsity_target_epoch, the schedule would be something like:

  1. train for the first 3 epochs with an unspecified level of sparsity (I suppose it is 0, but I'm not sure)
  2. train the remaining epochs with a sparsity level of 0.7 (it's clearly stated: "Index of the epoch from which the sparsity level of the model will be equal to the sparsity_target value")

meanwhile, the multistep_* params describe a schedule that looks like this:

  1. train the first 10 epochs with a sparsity level of 0.2
  2. train from epoch 10 to 20 with a sparsity of 0.5
  3. train the remaining epochs with a sparsity of 0.7 (the sketch below contrasts the two readings)
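
To make the conflict explicit, here is a minimal Python sketch of the two readings. This is a toy model of my own, not NNCF's actual scheduler code, and the 0.0 level before sparsity_target_epoch is an assumption:

    import bisect

    # Reading 1: sparsity_target / sparsity_target_epoch only.
    def level_from_target(epoch, target=0.7, target_epoch=3):
        # Assumes (unverified) that the level is 0 before sparsity_target_epoch.
        return 0.0 if epoch < target_epoch else target

    # Reading 2: multistep_steps / multistep_sparsity_levels only.
    def level_from_multistep(epoch, steps=(10, 20), levels=(0.2, 0.5, 0.7)):
        # The first level applies immediately; each step moves to the next one.
        return levels[bisect.bisect_right(steps, epoch)]

    for epoch in (0, 3, 9, 10, 20, 30):
        print(epoch, level_from_target(epoch), level_from_multistep(epoch))
    # From epoch 3 to 9 the two readings already disagree: 0.7 vs 0.2.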

Since these two behaviours are not really compatible, it is not clear which one takes precedence over the other. As if this were not enough, sparsity_freeze_epoch can be tricky in the following situation:

"sparsity_freeze_epoch": 20
"multistep_steps": [20]
"multistep_sparsity_levels": [0.3, 0.5] 

Using this configuration will lead to a network that is sparsified only to a level of 0.3, because the mask is frozen at epoch 20, the very epoch at which the scheduler would have stepped to 0.5! I guess this is the intended behaviour, but in my opinion it is far too error prone.
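
To make the trap concrete, here is a toy walk-through under the (unverified) assumption that the mask is frozen at the start of sparsity_freeze_epoch, i.e. before that epoch's multistep transition can take effect:

    import bisect

    def effective_level(epoch, freeze_epoch=20, steps=(20,), levels=(0.3, 0.5)):
        # Level the multistep schedule would prescribe at this epoch.
        scheduled = levels[bisect.bisect_right(steps, epoch)]
        # Level the mask was last trained with before being frozen.
        frozen = levels[bisect.bisect_right(steps, freeze_epoch - 1)]
        return frozen if epoch >= freeze_epoch else scheduled

    print(effective_level(19))  # 0.3 -- last trainable epoch
    print(effective_level(20))  # 0.3 -- frozen; the 0.5 step never applies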

I know this is mainly because multistep is not the only kind of sparsity schedule, but the example doesn't do a good job of describing it. In my opinion, a better description of the schedule would be achieved with a dictionary that maps epoch numbers to sparsity levels, for example:

"params" : {
    "multistep_scheduler" : {
        "0" : 0.0,
        "10" : 0.2,
        "20" : 0.5,
        "29" : 0.7,
        "30" : "freeze"
    }
}

I think this is just a more immediate way to understand what the training schedule is going to look like.
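
For illustration, here is a sketch of how such a dictionary could be consumed; parse_multistep_scheduler and the "freeze" sentinel semantics are hypothetical, not part of NNCF's API:

    def parse_multistep_scheduler(cfg):
        # Sort the (epoch, value) pairs by epoch number.
        points = sorted((int(k), v) for k, v in cfg.items())

        def state_at(epoch):
            level, frozen = 0.0, False
            for start, value in points:
                if start > epoch:
                    break
                if value == "freeze":
                    frozen = True
                else:
                    level = float(value)
            return level, frozen

        return state_at

    state_at = parse_multistep_scheduler(
        {"0": 0.0, "10": 0.2, "20": 0.5, "29": 0.7, "30": "freeze"}
    )
    print(state_at(15))  # (0.2, False)
    print(state_at(31))  # (0.7, True)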

Thanks in advance for the reply!

alexsu52 commented 3 years ago

Hi, @lpuglia. Thanks for your feedback and proposal!