This includes the most up-to-date 2x and 4x configs, as well as:
- Small fixes to RigL when gradient_accumulation_steps>1 and prune_fraction is 0.
- Set the default of sparsify_all_embeddings to False. This isn't needed to run the 80% sparse model, since the config already sets it to False, but it's helpful to have there for later.