tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

Non-uniform sparsity layer-wise with PolynomialDecay scheduler #987

Open AmoghDabholkar opened 2 years ago

AmoghDabholkar commented 2 years ago

System information

Motivation

Instead of assuming that the initial and final sparsity are uniform across layers, would it be possible to add a feature where the user can feed in either a custom per-layer sparsity map or a sparsity distribution generated with ERK (as in @evcu's RigL codebase: https://github.com/google-research/rigl/blob/master/rigl/sparse_utils.py)? @evcu's experiments have shown ERK to work better, and the polynomial scheduler also tends to work well in general, so incorporating the two together in the tfmot call would be very helpful.

chococigar commented 2 years ago

Hi @AmoghDabholkar, thanks for your input! We haven't considered this feature yet in tfmot sparsity, but we will consider including it in our next batch of updates.

Thanks!

evcu commented 2 years ago

I agree it would be nice to support this. I've implemented ERK in a hacky way in one of our recent projects. The tricky part is that the layer parameter shapes are needed before the calculation, which often requires initializing the model and matching layer names to shapes. It would be much easier if an already-initialized model could be wrapped after the fact.

https://github.com/google-research/rigl/blob/0f029735f84e0120df05512244510e8ed48a4461/rigl/rl/dqn_agents.py#L163
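For reference, once the layer shapes are available, the ERK (Erdős–Rényi-Kernel) distribution from `sparse_utils.py` can be approximated in plain NumPy. The sketch below is a simplified reimplementation of that logic (including the loop that freezes layers whose implied density would exceed 1 as dense), not the exact rigl code; the function name and argument layout are my own.

```python
import numpy as np

def erk_sparsity_map(param_shapes, default_sparsity):
    """Approximate per-layer ERK sparsities (simplified sketch).

    param_shapes: dict of layer name -> weight shape tuple
    default_sparsity: target overall sparsity in [0, 1)
    Returns: dict of layer name -> sparsity for that layer,
    preserving the overall parameter budget.
    """
    dense_layers = set()  # layers forced fully dense
    while True:
        divisor, rhs = 0.0, 0.0
        raw_prob = {}
        for name, shape in param_shapes.items():
            n_param = float(np.prod(shape))
            n_zeros = n_param * default_sparsity
            if name in dense_layers:
                # Dense layer keeps all params; subtract its zero budget.
                rhs -= n_zeros
            else:
                rhs += n_param - n_zeros
                # ERK score: sum of dims over product of dims.
                raw_prob[name] = np.sum(shape) / np.prod(shape)
                divisor += raw_prob[name] * n_param
        eps = rhs / divisor
        max_prob = max(raw_prob.values())
        if max_prob * eps > 1:
            # Density would exceed 1: freeze those layers and retry.
            for name, p in raw_prob.items():
                if p == max_prob:
                    dense_layers.add(name)
        else:
            break
    return {
        name: 0.0 if name in dense_layers else 1.0 - eps * raw_prob[name]
        for name in param_shapes
    }
```

Under this scheme, layers with a high dimension-sum-to-parameter-count ratio (small or highly rectangular layers) end up less sparse, while large layers absorb most of the pruning, which matches the intent of ERK.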