pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Functional interface for optimizers #45242

Open vincentqb opened 4 years ago

vincentqb commented 4 years ago

🚀 Feature

As requested in the context of the distributed work in #44715 and #44791 (cc @jbschlosser @vincentqb @albanD @wanchaol, internal doc) and of meta-learning in #39279 (cc @egrefen @seba-1511), we want to refactor the optimizers in order to provide a functional form for them (with no other algorithmic changes). The current object form would simply call into the functional form.
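
To make this concrete, here is a minimal sketch of what a functional form could look like for SGD, with the object form simply delegating to it. The function name and class below are illustrative only, not a final API:

import torch

def functional_sgd_step(params, grads, lr):
    # Illustrative functional form: a stateless function of parameters,
    # gradients and hyperparameters that applies the update in place.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)

class SGD(torch.optim.Optimizer):
    # The object form keeps param_groups/state handling and simply calls
    # into the functional form inside step().
    def __init__(self, params, lr):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            params = [p for p in group['params'] if p.grad is not None]
            grads = [p.grad for p in params]
            functional_sgd_step(params, grads, group['lr'])

Usage of the object form would stay exactly as it is today (construct, backward, step), which is the point of keeping the change a pure refactor.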

Please feel free to comment :)

cc @vincentqb

vadimkantorov commented 4 years ago

My wish: they should still have a memory-efficient / low-copy mode somehow (via in-place updates or otherwise).

albanD commented 4 years ago

@vadimkantorov the plan here is just a code refactor; there must be no change in how the existing API behaves.

vincentqb commented 4 years ago

> @vadimkantorov the plan here is just a code refactor; there must be no change in how the existing API behaves.

Yes, I've added emphasis on this in the description.

fmassa commented 4 years ago

@albanD can you expand on how we can perform in-place parameter updates with autograd support? Or are we going to have two implementations, one in-place and the other out-of-place?

From @vincentqb's comment in https://github.com/pytorch/pytorch/issues/39279#issuecomment-685817319 it looks like that might be the case, via a differentiable flag. The approach proposed in that comment (which I paste below) means that the parameters of the model are no longer necessarily instances of nn.Parameter. I don't know whether this might have undesirable side effects.

for name, update in zip(model._parameters, updates):
    model._parameters[name] = model._parameters[name] - lr * update
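
To make the side effect concrete, here is a small self-contained sketch (the model, lr and updates below are made up for the example). After the assignment, the entries of _parameters are plain non-leaf tensors carrying a grad_fn rather than nn.Parameter instances:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
lr = 0.1
# Stand-in "updates"; in practice these would come from gradients or a meta-learner.
updates = [torch.randn_like(p) for p in model.parameters()]

for (name, _), update in zip(list(model._parameters.items()), updates):
    model._parameters[name] = model._parameters[name] - lr * update

for p in model.parameters():
    # Prints torch.Tensor (not nn.Parameter), requires_grad=True and a non-None grad_fn.
    print(type(p), p.requires_grad, p.grad_fn)
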
albanD commented 4 years ago

There are two things:

- performing the update without modifying the original parameters (so the pre-update values can be kept around), and
- being able to backpropagate through the update itself.

So I think it will depend on the exact optimizer, but the first point will be taken care of by the higher-level API, which can decide whether or not to clone; the optimizer itself can always modify the parameter in place. For the second one, we will need some part of the implementation to diverge. But that should be fairly limited in scope, and I expect most of the code will be re-used.
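
A rough sketch of that split (the function names are illustrative, not an existing API): regular optimizers use the in-place path, with any cloning handled by the caller, while the differentiable path returns new tensors so the update stays in the autograd graph.

import torch

def sgd_step_inplace(params, grads, lr):
    # In-place path: the caller (the higher-level API) decides whether to
    # clone the parameters beforehand; the optimizer always mutates them.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)

def sgd_step_differentiable(params, grads, lr):
    # Out-of-place path: returns new tensors so the update itself can be
    # backpropagated through; this is the part that has to diverge.
    return [p - lr * g for p, g in zip(params, grads)]

params = [torch.randn(3, requires_grad=True)]
grads = [torch.randn(3)]

backup = [p.detach().clone() for p in params]  # "cloning or not" is done by the caller
sgd_step_inplace(params, grads, lr=0.1)        # parameters updated in place

new_params = sgd_step_differentiable(params, grads, lr=0.1)
print(new_params[0].grad_fn is not None)       # True: the update is differentiable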

pietrolesci commented 3 years ago

Hi @vincentqb,

Re: meta-learning applications, this would be extremely useful. Any update on the status?

Thanks a lot, Pietro