pyro-ppl / pyro

Deep universal probabilistic programming with Python and PyTorch
http://pyro.ai
Apache License 2.0

Specifying parameters to update during inference #158

Closed: dustinvtran closed this issue 6 years ago

dustinvtran commented 7 years ago

KL_QP-like algorithms currently have the args model_fixed=False and guide_fixed=False to control whether their parameters are fixed during inference (KL_QP.step()). This is restrictive for compositional inference, where you might set up two KL_QP algorithms, each of which handles parameter updates for a different latent variable.

It would be nice if the arg operated more locally, at the parameter level, where the user could pass in a list of trainable variables. One question is how to handle the dynamic case, where the list of trainable variables grows at runtime so the user can't specify the list a priori. model_fixed and guide_fixed do handle this case, so maybe a hybrid of the two approaches?

I ran into this issue when playing with probabilistic PCA in https://github.com/uber/pyro/issues/112 and following Edward's approach of 5 local updates per 1 global update.
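For illustration, here is a hedged sketch of that schedule in Pyro's current SVI API (KL_QP's successor). The names model_local, guide_local, model_global, guide_global, data, and num_epochs are hypothetical placeholders; the local/global views could be built with the blocking machinery discussed later in this thread.

import pyro.optim
from pyro.infer import SVI, Trace_ELBO

# hypothetical: each SVI instance sees only one subset of parameters
svi_local = SVI(model_local, guide_local, pyro.optim.Adam({"lr": 1e-2}), loss=Trace_ELBO())
svi_global = SVI(model_global, guide_global, pyro.optim.Adam({"lr": 1e-2}), loss=Trace_ELBO())

for epoch in range(num_epochs):
    for _ in range(5):
        svi_local.step(data)  # 5 local updates ...
    svi_global.step(data)     # ... per 1 global update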

martinjankowiak commented 7 years ago

yeah, that's a good point. one way we've dealt with similar issues in other contexts (but not yet in this one) is to use callables. so for example the user would pass an argument

how_many_steps_for_parameter(param_name)

that might look like

def how_many_steps_for_parameter(param_name):
    if 'global' in param_name:
        return 1  # instruct kl_qp to take 1 step
    elif param_name == 'some_other_param':
        return 0  # instruct kl_qp to take no steps
    else:
        return 5  # instruct kl_qp to take 5 steps

this has the advantage that the user can specify an arbitrarily complicated step policy for the dynamic case.

broadly similar issues (in terms of allowing finer control) can be dealt with in the same way. so, for example, the pytorch idiom for taking a gradient step is something like:

  1. construct loss
  2. call loss.backward()
  3. invoke optimizer which takes a gradient step

if the user wants to clip gradients, they do so between steps 2 and 3. in the context of kl_qp, the user never interacts with the loss directly, so with the current api there's no clean way to do this. an obvious solution is to have the user provide a callable that manipulates gradients on a per-parameter basis. when i had to do gradient clipping for some example code (#119) i instead created my own optimizer that handled the clipping for me. this works too, but it's a bit less satisfying in terms of modularity/extensibility.
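for reference, a minimal self-contained sketch of that idiom in plain pytorch (the model and data are placeholders), with the clipping inserted between steps 2 and 3:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = (model(x) - y).pow(2).mean()  # 1. construct loss
optimizer.zero_grad()
loss.backward()                      # 2. call loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip between 2 and 3
optimizer.step()                     # 3. take a gradient step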

karalets commented 7 years ago

We also have options to block sites in our algorithms. I can look up an example later, but that could also be part of the solution here.

I.e. we could block the global LV in the prob. PCA in one klqp instance and the local LV in the other, and iterate updates without having to write the models and guides any differently. I am unsure whether this would propagate correctly to our optimizers, though.
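To make the blocking idea concrete, here is a hedged sketch using today's pyro.poutine.block (the modern form of BlockPoutine); the toy model and the site names z_global and z_local are made up for illustration:

import pyro
import pyro.distributions as dist
import pyro.poutine as poutine

def model():
    z_global = pyro.sample("z_global", dist.Normal(0.0, 1.0))
    z_local = pyro.sample("z_local", dist.Normal(0.0, 1.0))
    pyro.sample("x", dist.Normal(z_global + z_local, 1.0))

# two views of the same model: each klqp instance sees only one latent
model_local_only = poutine.block(model, hide=["z_global"])
model_global_only = poutine.block(model, hide=["z_local"])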


jpchen commented 7 years ago

expanding on @karalets's point, @stuhlmueller wrote a nice example of this using BlockPoutine with char-rnns. it's still a manual process (#73 addresses this), but i think it partially serves the purpose of giving the user finer-grained control over what to update. of course these are specified a priori, so it doesn't address the dynamic case

ngoodman commented 7 years ago

i like the idea of adding a helper that "hides" params from optimization (probably using BlockPoutine), using a mask function from names to bool. something like:

ELBo(hide_params(model, lambda name: name == "myname"), guide, ...)

we shouldn't need to provide an explicit schedule in this helper -- we can just construct several ELBo.step functions with different params masked, and call them as we like.

note that we can use this to hide params of model or of guide, so we don't need the model_fixed flag anymore.

note also the similarity in logic to the lift functionality (that upgrades a param to a RV, instead of hiding it).
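a hedged sketch of what such a helper could look like with today's poutine.block and its hide_fn argument (hide_params is the hypothetical name from above; mask is a function from names to bool):

import pyro.poutine as poutine

def hide_params(fn, mask):
    # hide every param site whose name matches the mask, so downstream
    # inference neither sees nor updates those parameters
    return poutine.block(
        fn, hide_fn=lambda msg: msg["type"] == "param" and mask(msg["name"])
    )

# usage, as in the pseudocode above:
# ELBo(hide_params(model, lambda name: name == "myname"), guide, ...)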

null-a commented 7 years ago

The idea of having the ability to assign parameters to groups came up in webppl a few times. If pyro had this, we'd be able to tell KL_QP to optimize particular parameter groups only. e.g.

# model
pyro.param('p', val, groups=['model'])
# guide
pyro.param('q', val, groups=['guide'])
# update model params only:
optim.step(update=['model'])

This would work for the dynamic case.

This isn't as compositional as the blocking approach, but maybe they can be combined. The important part of the grouping approach is attaching extra metadata to parameters, which seems a bit nicer than e.g. writing functions that match on the name string. The blocking-based approach could leverage this too.

A related problem is the case where an inference algorithm has more than one objective to optimize, each with its own parameters (e.g. ELBO + baselines), and where we might like to say which parameters should be updated for each objective. It seems like that could also be handled by whatever solution comes out of this issue. (Though in this particular case we might consider whether the algo. can be decomposed into smaller parts.)

eb8680 commented 7 years ago

For reference, this issue was also discussed in #73 and #49 and is related to PyTorch optimizer parameter groups (which I don't believe existed when we wrote our optimization interface?).
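For context, PyTorch parameter groups let one optimizer carry per-group options; a minimal example with a made-up two-layer model:

import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.Linear(10, 1))

# one optimizer, two parameter groups with different learning rates
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters()},             # uses the default lr below
        {"params": model[1].parameters(), "lr": 1e-3}, # per-group override
    ],
    lr=1e-2,
)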

ngoodman commented 7 years ago

The important part about the grouping approach is attaching extra metadata to parameters, which seems a bit nicer than e.g. writing functions that match on the name string.

this is nice. feels like an easier (and more pythonic) interface than the functional one. we should make sure it handles all use cases.

related to PyTorch optimizer parameter groups (which I don't believe existed when we wrote our optimization interface?).

oh, good find! we should attempt to be compatible with that. although i don't think we can use their mechanism exactly, because of the need to group dynamically.

dustinvtran commented 7 years ago

+1 to @null-a's suggestion. I really like that approach. Also +1 to the convention of specifying the set of parameters to update rather than the set of parameters not to update.

This isn't as compositional as the blocking approach, but maybe they can be combined.

Maybe you can make it compositional via something like arg scopes. For example,

with pyro.arg_scope([pyro.param], groups=["model"]):
  p = pyro.param('p', val)

will automatically include groups=["model"] as a kwarg in any pyro.param call. Not sure if this is PyTorch-y, but scoping is used all the time in Edward and TensorFlow. It's tangentially related to Venture's inference scopes.
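Such a scope could be implemented as a plain Python context manager; below is a hedged sketch independent of Pyro, where arg_scope, _SCOPE_STACK, and scoped_param are all hypothetical names:

import contextlib

_SCOPE_STACK = []  # default kwargs pushed by enclosing scopes

@contextlib.contextmanager
def arg_scope(fns, **kwargs):
    # fns is kept for API parity with TF-Slim's arg_scope; this sketch
    # applies the defaults globally rather than per-function
    _SCOPE_STACK.append(kwargs)
    try:
        yield
    finally:
        _SCOPE_STACK.pop()

def scoped_param(name, val, **kwargs):
    # merge defaults from enclosing scopes (innermost wins), then explicit kwargs
    merged = {}
    for scope in _SCOPE_STACK:
        merged.update(scope)
    merged.update(kwargs)
    return name, val, merged  # a real version would forward to pyro.param

with arg_scope([scoped_param], groups=["model"]):
    p = scoped_param('p', 0.0)  # picks up groups=["model"] automatically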

ngoodman commented 7 years ago

explicit scoping seems like a great solution, as long as it's innocuous for performance and control flow in python!

ngoodman commented 6 years ago

what's the status of this proposal? seemed like everyone was on board... plan for 0.2?

fritzo commented 6 years ago

This should be easier to implement if we switch from .backward() to torch.autograd.grad() (#628), which in turn should be easier once PyTorch optimizers support torch.autograd.grad() (https://github.com/pytorch/pytorch/issues/4179).
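For reference, torch.autograd.grad computes gradients for an explicit list of tensors instead of populating every .grad attribute the way .backward() does, which is what makes per-parameter update policies easier; a minimal illustration:

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
loss = (a * b).sum()

# gradients only for `a`; b.grad is left untouched (still None)
(grad_a,) = torch.autograd.grad(loss, [a])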

eb8680 commented 6 years ago

I believe #1060 resolves the original issue, and we can discuss scoping elsewhere.