pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Functional interface for optimizers #45242

Open vincentqb opened 4 years ago

vincentqb commented 4 years ago

🚀 Feature

As requested in the context of the distributed work in #44715 and #44791 (cc @jbschlosser @vincentqb @albanD @wanchaol, internal doc) and of meta-learning in #39279 (cc @egrefen @seba-1511), we want to refactor the optimizers in order to provide a functional form for them (with no other algorithmic changes). The current object form would simply call into the functional form.
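
To make this concrete, here is a minimal sketch of what a functional form could look like for SGD, with the object form simply delegating to it. The function name and class below are illustrative only, not a final API:

import torch

def functional_sgd_step(params, grads, lr):
    # Illustrative functional form: a stateless function of parameters,
    # gradients and hyperparameters that applies the update in place.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)

class SGD(torch.optim.Optimizer):
    # The object form keeps param_groups/state handling and simply calls
    # into the functional form inside step().
    def __init__(self, params, lr):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            params = [p for p in group['params'] if p.grad is not None]
            grads = [p.grad for p in params]
            functional_sgd_step(params, grads, group['lr'])

Usage of the object form would stay exactly as it is today (construct, backward, step), which is the point of keeping the change a pure refactor.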

Please feel free to comment :)

cc @vincentqb

vadimkantorov commented 4 years ago

My wish: they should still have a memory-efficient / low-copy mode somehow (via in-place updates or otherwise).

albanD commented 4 years ago

@vadimkantorov the plan here is just a code refactor; there must be no change in how the existing API behaves.

vincentqb commented 4 years ago

> @vadimkantorov the plan here is just a code refactor; there must be no change in how the existing API behaves.

Yes, I've added emphasis on this in the description.

fmassa commented 4 years ago

@albanD can you expand on how we can perform in-place parameter updates with autograd support? Or are we going to have two implementations, one in-place and the other out-of-place?

From @vincentqb's comment in https://github.com/pytorch/pytorch/issues/39279#issuecomment-685817319 it looks like that might be the case, via a differentiable flag. The approach proposed in that comment (which I paste below) means that the parameters of the model are no longer necessarily instances of nn.Parameter. I don't know whether this might have undesirable side effects.

for name, update in zip(model._parameters, updates):
    model._parameters[name] = model._parameters[name] - lr * update
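
To make the side effect concrete, here is a small self-contained sketch (the model, lr and updates below are made up for the example). After the assignment, the entries of _parameters are plain non-leaf tensors carrying a grad_fn rather than nn.Parameter instances:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
lr = 0.1
# Stand-in "updates"; in practice these would come from gradients or a meta-learner.
updates = [torch.randn_like(p) for p in model.parameters()]

for (name, _), update in zip(list(model._parameters.items()), updates):
    model._parameters[name] = model._parameters[name] - lr * update

for p in model.parameters():
    # Prints torch.Tensor (not nn.Parameter), requires_grad=True and a non-None grad_fn.
    print(type(p), p.requires_grad, p.grad_fn)
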
albanD commented 4 years ago

There are two things:

- performing the update without modifying the original parameters (so the pre-update values can be kept around), and
- being able to backpropagate through the update itself.

So I think it will depend on the exact optimizer, but the first point will be taken care of by the higher-level API, which can decide whether or not to clone; the optimizer itself can always modify the parameter in place. For the second one, we will need some part of the implementation to diverge. But that should be fairly limited in scope, and I expect most of the code will be re-used.
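
A rough sketch of that split (the function names are illustrative, not an existing API): regular optimizers use the in-place path, with any cloning handled by the caller, while the differentiable path returns new tensors so the update stays in the autograd graph.

import torch

def sgd_step_inplace(params, grads, lr):
    # In-place path: the caller (the higher-level API) decides whether to
    # clone the parameters beforehand; the optimizer always mutates them.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)

def sgd_step_differentiable(params, grads, lr):
    # Out-of-place path: returns new tensors so the update itself can be
    # backpropagated through; this is the part that has to diverge.
    return [p - lr * g for p, g in zip(params, grads)]

params = [torch.randn(3, requires_grad=True)]
grads = [torch.randn(3)]

backup = [p.detach().clone() for p in params]  # "cloning or not" is done by the caller
sgd_step_inplace(params, grads, lr=0.1)        # parameters updated in place

new_params = sgd_step_differentiable(params, grads, lr=0.1)
print(new_params[0].grad_fn is not None)       # True: the update is differentiable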

pietrolesci commented 3 years ago

Hi @vincentqb,

Re: meta-learning applications, this would be extremely useful. Any update on the status?

Thanks a lot, Pietro