Open · neerajprad opened this issue 6 years ago
> `pyro.module` will have to be modified to support independent neural network weights / biases
> Related: #503

> should we restrict to one flat `iparam` dim?

I think that should be fine: PyTorch's batch operations (e.g. `bmm`) also only allow one dimension of batching.
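(For reference, `torch.bmm` is exactly this kind of operation: it accepts a single leading batch dimension and does not broadcast extra dims.)

```python
import torch

a = torch.randn(10, 3, 4)  # one leading batch dim of size 10
b = torch.randn(10, 4, 5)
c = torch.bmm(a, b)        # (10, 3, 5); bmm does not broadcast additional batch dims
assert c.shape == (10, 3, 5)
```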
> `pyro.module` will have to be modified to support independent neural network weights

I may be overthinking it, but this seems tricky to incorporate into the `iparam` construct, since a neural network is a collection of connected weights/params represented as multiple tensors, so it may be difficult for an `iparam` messenger to broadcast tensor operations correctly. My original implementation of `vec_random_module` stipulated that the user must pass in a "batch nn" (an nn in which all operations are batched), so independence within a batch was guaranteed by the user.
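For reference, here is a minimal sketch (plain PyTorch; `BatchLinear` is a made-up name for illustration, not the `vec_random_module` interface) of what such a user-supplied "batch nn" might look like: every weight carries an explicit leading batch dimension and the forward pass uses only batched ops, so the networks in the batch never share or mix parameters.

```python
import torch
import torch.nn as nn

class BatchLinear(nn.Module):
    """A batch of independent linear layers: one weight/bias per batch element."""
    def __init__(self, num_batch, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(num_batch, in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(num_batch, 1, out_features))

    def forward(self, x):
        # x: (num_batch, n, in_features) -> (num_batch, n, out_features)
        return torch.bmm(x, self.weight) + self.bias

net = nn.Sequential(BatchLinear(4, 3, 8), nn.Tanh(), BatchLinear(4, 8, 1))
y = net(torch.randn(4, 10, 3))  # 4 independent networks applied to 4 independent inputs
assert y.shape == (4, 10, 1)
```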
Closely related, more specific use case: #1213.
Can we use our tensor DSL to support the use case of solving a batch of independent optimization problems? The proposed solution is to use broadcasting to optimize a batch of parameters independently on the same model and guide. This has been brought up in multiple contexts - batched second-order optimizers (#1213), experimenting with population-based algorithms such as genetic algorithms, and OED (@martinjankowiak).
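As a concrete (plain PyTorch, not Pyro) illustration of the broadcasting idea: a single optimizer can already drive a batch of independent problems, because each problem's loss only touches its own slice of the batched parameter, so gradients never mix across the batch; the per-problem losses also remain available for reporting or top-k selection.

```python
import torch

targets = torch.randn(16, 3)                     # 16 independent toy problems
theta = torch.zeros(16, 3, requires_grad=True)   # 16 independent parameter vectors
opt = torch.optim.Adam([theta], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    per_problem_loss = ((theta - targets) ** 2).sum(-1)  # shape (16,), one loss per problem
    per_problem_loss.sum().backward()  # gradient of problem i depends only on theta[i]
    opt.step()
```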
One possible solution, as suggested by @eb8680 in #1213, would be to introduce an `iparam` messenger (which may additionally broadcast `param` statements by the size specified and reshape all sample statements appropriately); a rough sketch of such a messenger is included after the list below. Things to think about:
- The `iparam` block can be either innermost or outermost. If it is outermost, we will need a cap on enum dims (`max_enum_dims`). Related - is there a use case to allow for nesting, or should we restrict to one flat `iparam` dim?
- `iparam` would have to be external to any `iarange`, or at least not interact with `iarange`. This may be unavoidable, as these additional dims may be bound to the optimization algorithm itself, e.g. if we want to report losses per `iparam` batch, or, in the case of NE algorithms, select the top-k params in terms of loss as the pseudo offspring. One way to ensure this is to only allow them as messengers, and not as context managers in the way `iarange`s are defined.
- An `iparam` block will have one or more additional leftmost (or rightmost) dims corresponding to the `iparam`. We may need `poutine.broadcast` to populate these additional dims as batch dims, or reshape them appropriately (say, if we want them to be leftmost) if they already depend on `param` statements.
- `pyro.module` will have to be modified to support independent neural network weights / biases inside an `iparam` (by default NNs will have all their input neurons connected to the subsequent layer, which we don't want with separate param batches). Related: #503.

Feel free to edit or add other solutions / use cases / design considerations.