This PR is for GMP pruning mixins, configs, and related functionality. It includes the following changes:
- Enable models that are fully dense at initialization but sparsifiable later.
- Update the wide tiny-bert experiments.
- Add a script to create a prunable checkpoint (one with `SparseWeight` modules) from a densely trained model; a conversion sketch follows this list.
- Add an LR schedule for GMP pruning on a fully dense model.
- Make the `RezeroWeights` callback configurable to log every `log_steps` steps instead of after every training step.
- Add two new mixins, `GradualMagnitudePruningMixin` and `ThreeStageLRMixin`; the pruning schedule is sketched after this list.
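For the checkpoint-conversion script, the idea is to wrap each dense layer in a module that carries a pruning mask, so a densely trained checkpoint can later be sparsified in place. Below is a minimal sketch; the `SparseWeight` wrapper here is a stand-in for illustration, and the repo's actual module and conversion script may differ in name and signature.

```python
# Hypothetical sketch: wrap each dense Linear in a mask-carrying module so a
# densely trained checkpoint becomes prunable. The real SparseWeight module
# in this repo may have a different name and interface.
import torch
import torch.nn as nn


class SparseWeight(nn.Module):
    """Wraps a dense layer with an all-ones mask (fully dense, but prunable)."""

    def __init__(self, module: nn.Linear):
        super().__init__()
        self.module = module
        self.register_buffer("mask", torch.ones_like(module.weight))

    def rezero_weights(self):
        # Zero out pruned weights. The mask starts all-ones, so this is a
        # no-op until GMP pruning updates the mask.
        self.module.weight.data *= self.mask

    def forward(self, x):
        return self.module(x)


def make_prunable(model: nn.Module) -> nn.Module:
    """Replace every nn.Linear with a SparseWeight wrapper, keeping weights."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, SparseWeight(child))
        else:
            make_prunable(child)
    return model


# Usage: load a densely trained checkpoint, wrap it, and save the result.
# model.load_state_dict(torch.load("dense.pt"))
# torch.save(make_prunable(model).state_dict(), "prunable.pt")
```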
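For the pruning mixin itself, gradual magnitude pruning typically ramps the sparsity target over training and prunes the smallest-magnitude weights at each pruning step. The sketch below uses the standard cubic ramp from Zhu & Gupta (2017); the function names, parameters, and per-tensor granularity are illustrative assumptions, not this repo's exact API.

```python
# Illustrative sketch of gradual magnitude pruning (GMP): a cubic sparsity
# ramp plus a magnitude-based mask. Parameter names are assumptions.
import torch


def target_sparsity(step, start_step, end_step, initial=0.0, final=0.8):
    """Cubic ramp from `initial` to `final` sparsity between start and end."""
    if step <= start_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - start_step) / (end_step - start_step)
    return final + (initial - final) * (1.0 - progress) ** 3


def prune_to_sparsity(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a mask that zeros the smallest-magnitude fraction of weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()
```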
The last two mixins can be used together or independently, depending on the use case. GMP pruning may be applied during pre-training or afterwards. If the latter, `ThreeStageLRMixin` should be used to enable LR phases of stabilization, pruning, and fine-tuning. Otherwise, a OneCycle LR or another schedule may be used for GMP throughout pre-training. As of now, there are experiments that try both methods, and it's an open question which leads to the best eval loss.