Open · neerajprad opened this issue 6 years ago
> `pyro.module` will have to be modified to support independent neural network weights / biases
> Related: #503

> should we restrict to one flat `iparam` dim?

I think that should be fine: PyTorch's batch operations (e.g. `bmm`) also only allow one dimension of batching.
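(For reference, `torch.bmm` is exactly this kind of operation: it accepts a single leading batch dimension and does not broadcast extra dims.)

```python
import torch

a = torch.randn(10, 3, 4)  # one leading batch dim of size 10
b = torch.randn(10, 4, 5)
c = torch.bmm(a, b)        # (10, 3, 5); bmm does not broadcast additional batch dims
assert c.shape == (10, 3, 5)
```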
> `pyro.module` will have to be modified to support independent neural network weights

I may be overthinking it, but this seems tricky to incorporate into the `iparam` construct, since a neural network is a collection of connected weights/params represented as multiple tensors, so it may be difficult for an `iparam` messenger to broadcast tensor operations correctly. My original implementation of `vec_random_module` stipulated that the user must pass in a "batch nn" (an nn in which all operations are batched), so independence within a batch was guaranteed by the user.
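For reference, here is a minimal sketch (plain PyTorch; `BatchLinear` is a made-up name for illustration, not the `vec_random_module` interface) of what such a user-supplied "batch nn" might look like: every weight carries an explicit leading batch dimension and the forward pass uses only batched ops, so the networks in the batch never share or mix parameters.

```python
import torch
import torch.nn as nn

class BatchLinear(nn.Module):
    """A batch of independent linear layers: one weight/bias per batch element."""
    def __init__(self, num_batch, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(num_batch, in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(num_batch, 1, out_features))

    def forward(self, x):
        # x: (num_batch, n, in_features) -> (num_batch, n, out_features)
        return torch.bmm(x, self.weight) + self.bias

net = nn.Sequential(BatchLinear(4, 3, 8), nn.Tanh(), BatchLinear(4, 8, 1))
y = net(torch.randn(4, 10, 3))  # 4 independent networks applied to 4 independent inputs
assert y.shape == (4, 10, 1)
```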
Closely related, more specific use case: #1213.
Can we use our tensor DSL to support the use case of solving a batch of independent optimization problems? The proposed solution is to use broadcasting to optimize a batch of parameters independently on the same model and guide. This has been brought up in multiple contexts - batched second-order optimizers (#1213), experimenting with population-based algorithms such as genetic algorithms, and OED (@martinjankowiak).
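As a concrete (plain PyTorch, not Pyro) illustration of the broadcasting idea: a single optimizer can already drive a batch of independent problems, because each problem's loss only touches its own slice of the batched parameter, so gradients never mix across the batch; the per-problem losses also remain available for reporting or top-k selection.

```python
import torch

targets = torch.randn(16, 3)                     # 16 independent toy problems
theta = torch.zeros(16, 3, requires_grad=True)   # 16 independent parameter vectors
opt = torch.optim.Adam([theta], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    per_problem_loss = ((theta - targets) ** 2).sum(-1)  # shape (16,), one loss per problem
    per_problem_loss.sum().backward()  # gradient of problem i depends only on theta[i]
    opt.step()
```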
One possible solution, as suggested by @eb8680 in #1213, would be to introduce an `iparam` messenger (which may additionally broadcast `param` statements by the size specified and reshape all sample statements appropriately); a rough sketch of such a messenger is included after the list below. Things to think about:
- The `iparam` block can be either innermost or outermost. If it is outermost, we will need a cap on enum dims (`max_enum_dims`). Related - is there a use case to allow for nesting, or should we restrict to one flat `iparam` dim?
- `iparam` would have to be external to any `iarange`, or at least not interact with `iarange`. This may be unavoidable, as these additional dims may be bound to the optimization algorithm itself, e.g. if we want to report losses per `iparam` batch, or, in the case of NE algorithms, select the top-k params in terms of loss as the pseudo offspring. One way to ensure this is to only allow them as messengers, and not as context managers in the way `iarange`s are defined.
- An `iparam` block will have one or more additional leftmost (or rightmost) dims corresponding to the `iparam`. We may need `poutine.broadcast` to populate these additional dims as batch dims, or reshape them appropriately (say, if we want them to be leftmost) if they already depend on `param` statements.
- `pyro.module` will have to be modified to support independent neural network weights / biases inside an `iparam` (by default NNs will have all their input neurons connected to the subsequent layer, which we don't want with separate param batches). Related: #503.

Feel free to edit or add other solutions / use cases / design considerations.