Open · lpuglia opened this issue 3 years ago
Hi, @lpuglia. Thanks for your feedback and proposal! The `multistep` scheduler ignores the `sparsity_target` and `sparsity_target_epoch` parameters and instead derives them from the `multistep_steps` and `multistep_sparsity_levels` parameters. Based on your feedback we will update the documentation (cc @MaximProshin) to make this clear.
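In other words, with a config like the sketch below (illustrative values; the `//` comments are just annotations), the two `sparsity_*` parameters have no effect and the schedule is derived entirely from the `multistep_*` lists:

```json
{
    "compression": {
        "algorithm": "magnitude_sparsity",
        "params": {
            "schedule": "multistep",
            "sparsity_target": 0.7,       // ignored by the multistep scheduler
            "sparsity_target_epoch": 10,  // ignored by the multistep scheduler
            "multistep_steps": [2, 5],
            "multistep_sparsity_levels": [0.1, 0.3, 0.5]
            // effective target: 0.5 (the last level), reached at epoch 5 (the last step)
        }
    }
}
```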
Hello, I'm trying to understand the details of the multistep sparsity training scheduler; in particular, I'm learning from the config example. Here is the part that confuses me:
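Roughly, it is the block where both families of sparsity parameters appear side by side (the values here are illustrative, not the exact ones from the example):

```json
{
    "compression": {
        "algorithm": "magnitude_sparsity",
        "params": {
            "schedule": "multistep",
            "sparsity_target": 0.5,
            "sparsity_target_epoch": 5,
            "sparsity_freeze_epoch": 10,
            "multistep_steps": [2, 5],
            "multistep_sparsity_levels": [0.1, 0.3, 0.5]
        }
    }
}
```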
To me it seems that the `multistep_*` params are in conflict with the `sparsity_*` ones. According to `sparsity_target` and `sparsity_target_epoch` (the latter documented as the "Index of the epoch from which the sparsity level of the model will be equal to the `sparsity_target` value"), the schedule would be a gradual ramp that reaches `sparsity_target` at `sparsity_target_epoch` and then stays flat. Meanwhile, the `multistep_*` params describe a piecewise-constant schedule that jumps to the next sparsity level at each epoch listed in `multistep_steps` (both shapes are sketched below). Since these two behaviours are not really compatible, it is not clear which one takes precedence over the other.
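Using the illustrative values from the snippet above, the two readings would look roughly like this:

```
epoch:                 0    1    2    3    4    5    6 ...
ramp (sparsity_*):     grows gradually, reaches 0.5 at epoch 5, then stays at 0.5
steps (multistep_*):   0.1  0.1  0.3  0.3  0.3  0.5  0.5   (jumps at epochs 2 and 5)
```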
As if this was not enough, `sparsity_freeze_epoch` can be tricky. In the situation sketched below, the network ends up sparsified only to a level of 0.3! I guess this is the intended behaviour, but in my opinion it is far too error prone.
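Here is the kind of configuration I mean (again with illustrative values; the `//` comments are annotations):

```json
{
    "compression": {
        "algorithm": "magnitude_sparsity",
        "params": {
            "schedule": "multistep",
            "sparsity_freeze_epoch": 4,   // freezes the level before the last step...
            "multistep_steps": [2, 5],    // ...so the jump at epoch 5 never happens
            "multistep_sparsity_levels": [0.1, 0.3, 0.5]
        }
    }
}
```

Because the schedule is frozen at epoch 4, the final jump to 0.5 never happens and the network stays at 0.3.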
I know this is mainly due to the fact that multistep sparsity is not the only kind of schedule, but the example doesn't do a good job of describing it. In my opinion, a better description of the schedule would be a dictionary with epoch numbers as keys and sparsity levels as values, for example:
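(Something along these lines; `multistep_schedule` is a hypothetical parameter name illustrating the proposal, not an existing NNCF option:)

```json
{
    "compression": {
        "algorithm": "magnitude_sparsity",
        "params": {
            "schedule": "multistep",
            // epoch index -> sparsity level applied from that epoch on
            "multistep_schedule": {
                "0": 0.1,
                "2": 0.3,
                "5": 0.5
            }
        }
    }
}
```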
I think this is just a more immediate way to understand what the training schedule is going to look like.
Thanks for the reply!