What does this PR do?

Extracts the trainable params on `__init__` using `nnx.variables` into a `self.params` attribute. This effectively fixes the set of trainable params: in the current implementation, if the `model` adds new params after construction, `update` will fail. In a future PR we can make the following changes if we want to continue in this direction:

- Have `__init__` accept a `State` object with the `Variable`s directly.
- Remove the `wrt` argument.
- Remove the `model` attribute.
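Example: a minimal sketch of the usage this implies, assuming the `nnx.Optimizer(model, tx, wrt=...)` constructor and the `optimizer.update(grads)` signature (the exact `update` signature has varied across Flax versions); the toy model and shapes are illustrative:

```python
import jax.numpy as jnp
import optax
from flax import nnx

# Illustrative toy model; any nnx.Module works the same way.
model = nnx.Linear(2, 3, rngs=nnx.Rngs(0))

# Per this PR, the trainable params are extracted once at construction
# time (via nnx.variables) into `optimizer.params`; the set is fixed.
optimizer = nnx.Optimizer(model, optax.adamw(1e-3), wrt=nnx.Param)

def loss_fn(model):
    x = jnp.ones((4, 2))
    return jnp.mean(model(x) ** 2)

grads = nnx.grad(loss_fn)(model)
optimizer.update(grads)  # would fail if `model` gained new params after __init__
```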
Discussion

Currently there is some benefit to having a reference to `model` inside `Optimizer`: you could potentially just pass the `optimizer` to some of the functions and use the model from there. On the other hand, a pure `Variable`s structure is in line with how PyTorch / MLX represent their optimizers.
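For comparison, a sketch of the PyTorch convention the paragraph refers to: the optimizer receives only the parameter tensors and keeps no reference to the module itself:

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 3)
# The optimizer holds a plain iterable of parameters, not the module.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

loss = model(torch.ones(4, 2)).pow(2).mean()
loss.backward()
optimizer.step()       # updates the stored parameter tensors in place
optimizer.zero_grad()
```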
I find myself often splitting via `nnx.split(state.model, self.wrt)`, etc. If this is storing the differentiable variables, would saving the static graphdef also make workflows cleaner, with less splitting required?
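A sketch of the splitting pattern described in this comment, and of how a stored graphdef could avoid re-splitting at every call site (assuming the standard `nnx.split` / `nnx.merge` API; the filter choice here stands in for `self.wrt`):

```python
from flax import nnx

model = nnx.Linear(2, 3, rngs=nnx.Rngs(0))
wrt = nnx.Param  # stand-in for the filter the optimizer stores

# The recurring pattern: split into a static graphdef, the
# differentiable variables, and everything else.
graphdef, params, rest = nnx.split(model, wrt, ...)

# If the optimizer also stored `graphdef` next to `self.params`,
# the model could be rebuilt without splitting again:
restored = nnx.merge(graphdef, params, rest)
```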