numenta / nupic.research

Experimental algorithms. Unsupported.
https://nupicresearch.readthedocs.io
GNU Affero General Public License v3.0

OneCycle LR & RigL Experiments #489

Closed mvacaporale closed 3 years ago

mvacaporale commented 3 years ago

RigL

This PR includes utilities for global pruning via RigL. A rough sketch of a single RigL update step follows the results table below.

Main results:

| model | train loss | eval loss | perplexity |
|---|---|---|---|
| tiny_bert_static_sparse_300k | 4.537 | 3.997 | 54.432 |
| tiny_bert_rigl_sparse_300k | 5.963 | 5.774 | 321.95 |

^Note: These are sparse models.
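
For reference, here is a minimal sketch of one RigL-style update step: drop the smallest-magnitude active weights and regrow the same number of connections where the dense gradient magnitude is largest, keeping sparsity constant. This is illustrative only, not the PR's actual utility; the function name, `drop_fraction` argument, and per-tensor scope (the PR's version prunes globally across parameters) are assumptions.

```python
import torch

@torch.no_grad()
def update_mask_rigl(weight, grad, mask, drop_fraction=0.3):
    """Hypothetical per-tensor RigL update. `mask` is a 0/1 tensor shaped like `weight`."""
    n_active = int(mask.sum().item())
    n_swap = int(drop_fraction * n_active)
    if n_swap == 0:
        return mask

    # Drop: among currently active weights, remove the smallest magnitudes.
    active_scores = torch.where(
        mask.bool(), weight.abs(), torch.full_like(weight, float("inf"))
    )
    drop_idx = torch.topk(active_scores.flatten(), n_swap, largest=False).indices

    # Grow: among currently inactive weights, add the largest gradient magnitudes.
    grow_scores = torch.where(
        mask.bool(), torch.full_like(grad, -float("inf")), grad.abs()
    )
    grow_idx = torch.topk(grow_scores.flatten(), n_swap, largest=True).indices

    new_mask = mask.flatten().clone()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)
```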

OneCycleLR

This PR also includes mixins for using the OneCycleLR scheduler, plus an LR Range Test mixin to help find good bounds for the max and min learning rates. The results are below; a usage sketch follows the table. The max_lr was tuned manually to 0.01, while all other parameters use the scheduler's defaults.

| model | train loss | eval loss | perplexity |
|---|---|---|---|
| tiny_bert_50k | 5.990 | 5.800 | 330.234 |
| tiny_bert_one_cycle_lr_50k | 4.083 | 3.605 | 36.767 |

^Note: These are dense models.
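
For context, a minimal sketch of attaching PyTorch's OneCycleLR with `max_lr=0.01` and the remaining scheduler arguments left at their defaults. The model, optimizer, and loss here are placeholders, not the experiment config; only the 50k step count and `max_lr` come from the results above.

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(128, 2)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)
scheduler = OneCycleLR(optimizer, max_lr=0.01, total_steps=50_000)

for step in range(50_000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 128)).sum()         # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()                                  # OneCycleLR steps per batch, not per epoch
```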

TODO:

mvacaporale commented 3 years ago

> This won't work with the distillation mixin. I am approving anyway so we can merge and move on, but this should be fixed before we can combine the mixins.

Thanks for catching this. I'll make RigL into a mixin; it seems that's the only way, but I'll leave it for a soon-to-come PR (a rough sketch of the intended composition is below). Otherwise, all of your other comments have been addressed.
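
For context, a rough sketch of the kind of mixin composition being discussed, so RigL can stack with the distillation mixin via Python's MRO. Class and method names are hypothetical, not the repo's actual API.

```python
class BaseExperiment:
    def post_optimizer_step(self, *args, **kwargs): pass
    def compute_loss(self, *args, **kwargs): return 0.0

class RigLMixin:
    def post_optimizer_step(self, *args, **kwargs):
        super().post_optimizer_step(*args, **kwargs)
        # ... prune and regrow sparse masks on the configured schedule ...

class DistillationMixin:
    def compute_loss(self, *args, **kwargs):
        loss = super().compute_loss(*args, **kwargs)
        # ... add the distillation term from the teacher model ...
        return loss

# Mixins compose left-to-right ahead of the base experiment class.
class RigLDistilledExperiment(RigLMixin, DistillationMixin, BaseExperiment):
    pass
```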