Closed: dustinvtran closed this issue 6 years ago.
yeah, that's a good point. one way we've dealt with similar issues in other contexts (but not yet in this one) is to use callables. so for example the user would pass an argument `how_many_steps_for_parameter(param_name)` that might look like

```python
def how_many_steps_for_parameter(param_name):
    if 'global' in param_name:
        return 1  # instruct kl_qp to take 1 step
    elif param_name == 'some_other_param':
        return 0  # instruct kl_qp to take no steps
    else:
        return 5  # instruct kl_qp to take 5 steps
```
this has the advantage that the user can specify an arbitrarily complicated step policy for the dynamic case.
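To make the idea concrete, here is a toy sketch of how a kl_qp-style driver might consume such a callable. The loop, `run_epoch`, and the scalar "gradient" are hypothetical stand-ins for what the inference algorithm would do internally, not actual Pyro code.

```python
# Hypothetical sketch: a driver loop that asks the user's callable how many
# gradient steps to take for each named parameter in one epoch.
def how_many_steps_for_parameter(param_name):
    if 'global' in param_name:
        return 1  # take 1 step for global parameters
    return 5      # take 5 steps for everything else

def run_epoch(params, grad, lr=0.1):
    # params: dict mapping parameter names to (scalar) values
    # grad: stand-in gradient function, grad(name, value) -> gradient
    for name, value in params.items():
        for _ in range(how_many_steps_for_parameter(name)):
            value = value - lr * grad(name, value)
        params[name] = value
    return params

params = {'global_mu': 1.0, 'local_z': 1.0}
# toy quadratic-loss gradient: grad = value, so each step multiplies by 0.9
params = run_epoch(params, grad=lambda name, v: v)
```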
broadly similar issues (in terms of allowing for finer control) can be dealt with similarly. so, for example, the pytorch idiom for doing a gradient step is something like:

1. construct the loss
2. call `loss.backward()`
3. invoke the optimizer, which takes a gradient step

if the user wants to clip gradients, he or she does so between steps 2 and 3. in the context of kl_qp, the user never interacts with the loss directly, so with the current api there's no clean way to do this. an obvious solution here is to have the user provide a callable that allows one to manipulate gradients on a per-parameter basis. when i had to do gradient clipping for some example code (#119) i instead created my own optimizer that handled the clipping for me. this works too but is a bit less satisfying in terms of modularity/extensibility
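A minimal sketch of the per-parameter gradient hook idea: the user supplies a callable that edits each gradient (here, clipping by value) between the backward pass and the optimizer step. All names and signatures below are illustrative, not actual Pyro API.

```python
# Hypothetical user-provided hook: clip each gradient to [-limit, limit].
def clip_grads(param_name, grad, limit=1.0):
    return max(-limit, min(limit, grad))

def step(params, grads, grad_fn, lr=0.1):
    # grad_fn is the user's hook; it runs between "backward" and the update,
    # which is exactly the spot the current kl_qp api doesn't expose.
    for name in params:
        g = grad_fn(name, grads[name])
        params[name] -= lr * g
    return params

params = {'w': 0.0, 'b': 0.0}
grads = {'w': 10.0, 'b': 0.5}  # 'w' gradient exceeds the clip limit of 1.0
params = step(params, grads, clip_grads)
```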
We also have options to block sites in our algorithms. I can look up an example later, but that could also be part of the solution here.
I.e. we could block the global LV in the prob. PCA model in one klqp instance and the local LV in the other, and iterate updates without having to write the models and guides any differently. I am unsure whether this would go through to our optimizers correctly, though.
expanding on @karalets's point, @stuhlmueller wrote a nice example of this using the `BlockPoutine` with char-rnns. it's still a manual process (#73 addresses this), but i think it partially serves the purpose of allowing the user to have finer-grained control over what to update. of course these are specified a priori, so it doesn't address the dynamic case
i like the idea of adding a helper that "hides" params from optimization (probably using `BlockPoutine`), using a mask function from names to bool. something like:

```python
ELBo(hide_params(model, lambda name: name == "myname"), guide, ...)
```

we shouldn't need to provide an explicit schedule in this helper -- we can just construct several `ELBo.step` functions with different params masked, and call them as we like.
note that we can use this to hide params of model or of guide, so we don't need the `model_fixed` flag anymore.

note also the similarity in logic to the `lift` functionality (which upgrades a param to a RV, instead of hiding it).
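The masking logic of such a `hide_params`-style helper could be sketched as below. This only illustrates filtering the trainable set by a name mask; the real helper would wrap the model with a `BlockPoutine` rather than operate on a plain dict.

```python
# Hypothetical sketch: given a mask from parameter names to bool, return only
# the parameters that should be visible to the optimizer.
def visible_params(params, hide):
    return {name: v for name, v in params.items() if not hide(name)}

params = {'myname': 1.0, 'other': 2.0}
# hide the parameter named "myname", keep everything else trainable
trainable = visible_params(params, hide=lambda name: name == 'myname')
```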
The idea of having the ability to assign parameters to groups came up in webppl a few times. If pyro had this, we'd be able to tell `KL_QP` to optimize particular parameter groups only. e.g.

```python
# model
pyro.param('p', val, groups=['model'])

# guide
pyro.param('q', val, groups=['guide'])

# update model params only:
optim.step(update=['model'])
```
This would work for the dynamic case.
This isn't as compositional as the blocking approach, but maybe they can be combined. The important part about the grouping approach is attaching extra metadata to parameters, which seems a bit nicer than e.g. writing functions that match on the name string. The blocking based approach could leverage this too.
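A toy sketch of the grouping idea: a parameter store records group metadata alongside each value, and a step only updates parameters whose groups intersect the requested ones. The `groups=` kwarg and `step(update=...)` are the proposal above, not existing Pyro API, and the scalar update rule is a stand-in.

```python
# Hypothetical parameter store that attaches group metadata to parameters.
class ParamStore:
    def __init__(self):
        self.values = {}
        self.groups = {}

    def param(self, name, val, groups=()):
        self.values.setdefault(name, val)
        self.groups[name] = set(groups)
        return self.values[name]

    def step(self, grads, update, lr=0.1):
        # only update parameters belonging to one of the requested groups
        for name, g in grads.items():
            if self.groups[name] & set(update):
                self.values[name] -= lr * g

store = ParamStore()
store.param('p', 1.0, groups=['model'])
store.param('q', 1.0, groups=['guide'])
store.step({'p': 1.0, 'q': 1.0}, update=['model'])  # guide param untouched
```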
A related problem is the case where an inference algorithm has more than one objective to optimize, each with its own parameters (e.g. ELBO + baselines), where we might like to say which parameters should be updated for each objective. It seems like this could also be handled by whatever solution comes out of this. (Though in this particular case we might consider whether the algo. can be decomposed into smaller parts.)
For reference, this issue was also discussed in #73 and #49 and is related to PyTorch optimizer parameter groups (which I don't believe existed when we wrote our optimization interface?).
> The important part about the grouping approach is attaching extra metadata to parameters, which seems a bit nicer than e.g. writing functions that match on the name string.
this is nice. feels like an easier (and more pythonic) interface than the functional one. we should make sure it handles all use cases.
> related to PyTorch optimizer parameter groups (which I don't believe existed when we wrote our optimization interface?).
oh, good find! we should attempt to be compatible with that. although i don't think we can quite use their mechanism, because of the need to group dynamically.
+1 to @null-a's suggestion. I really like that approach. I also like the convention of specifying the set of parameters to update rather than the set of parameters not to update.
> This isn't as compositional as the blocking approach, but maybe they can be combined.
Maybe you can make it compositional via something like arg scopes. For example,

```python
with pyro.arg_scope([pyro.param], groups=["model"]):
    p = pyro.param('p', val)
```

will automatically include `groups=["model"]` as kwargs to any `pyro.param` call. Not sure if this is PyTorch-y, but scoping is used all the time in Edward and TensorFlow. It tangentially relates to Venture's inference scopes.
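A rough sketch of how such an `arg_scope` could work: a context manager that injects default kwargs into calls made inside the scope. This is modeled loosely on TF-Slim's `arg_scope`; the `arg_scope` and `param` functions here are hypothetical, not actual Pyro API.

```python
import contextlib

# module-level defaults injected by the enclosing scope (simplified: one
# global dict rather than per-function, non-reentrant)
_scope_kwargs = {}

@contextlib.contextmanager
def arg_scope(extra_kwargs):
    _scope_kwargs.update(extra_kwargs)
    try:
        yield
    finally:
        for k in extra_kwargs:
            _scope_kwargs.pop(k, None)

def param(name, val, **kwargs):
    # explicit kwargs win over scope defaults
    merged = {**_scope_kwargs, **kwargs}
    return name, val, merged

with arg_scope({'groups': ['model']}):
    result = param('p', 1.0)  # picks up groups=['model'] from the scope
```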
explicit scoping seems like a great solution, as long as it's performance and control-flow innocuous in python!
what's the status of this proposal? seemed like everyone was on board... plan for 0.2?
This should be easier to implement if we switch from `.backward()` to `torch.autograd.grad()` (#628), which in turn should be easier once PyTorch optimizers support `torch.autograd.grad()` (https://github.com/pytorch/pytorch/issues/4179).
I believe #1060 resolves the original issue, and we can discuss scoping elsewhere.
`KL_QP`-like algorithms currently have the args `model_fixed=False` and `guide_fixed=False` to control whether their parameters are fixed during inference (`KL_QP.step()`). This is restrictive for compositional inference, where you might establish two `KL_QP` algorithms, each of which handles parameter updates of a different latent variable.

It would be nice if the arg were more local, at the parameter level, where the user might pass in a list of trainable variables. One question is how to handle the dynamic case when the list of trainable variables grows at runtime, so the user can't specify the list a priori. `model_fixed` and `guide_fixed` do handle this case, so maybe a hybrid of these two approaches?

I ran into this issue when playing with probabilistic PCA in https://github.com/uber/pyro/issues/112 and following Edward's approach of 5 local updates per 1 global update.
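The 5-local-per-1-global schedule mentioned above can be sketched with two update functions that each touch only their own parameters; the names, learning rate, and constant toy gradients here are all hypothetical, chosen only to show the alternation.

```python
# Hypothetical sketch of alternating updates: two steppers, each restricted
# to its own set of parameter names.
def make_step(names, lr=0.1):
    def step(params, grads):
        for n in names:
            params[n] -= lr * grads[n]
    return step

local_step = make_step(['z_local'])
global_step = make_step(['w_global'])

params = {'z_local': 1.0, 'w_global': 1.0}
grads = {'z_local': 1.0, 'w_global': 1.0}  # toy constant gradients
for _ in range(3):          # 3 outer iterations
    for _ in range(5):      # 5 local updates ...
        local_step(params, grads)
    global_step(params, grads)  # ... per 1 global update
```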