Quite a substantial overhaul of how models are implemented.
Every 'layer' in the model has a `forward` method which takes the current state of the model and the current value of the logarithm of the probability density.
This is mainly achieved by subclassing nn.Sequential so that it takes two inputs.
This means we can chain layers in a PyTorch-esque way where the output of one layer can be directly used as the input of the next. To define a model, one can just write an action which returns a Sequential object containing arbitrary layers.
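A minimal sketch of the idea (the `ShiftLayer` and `toy_model` names here are invented for illustration, not taken from the codebase):

```python
import torch
import torch.nn as nn


class Sequential(nn.Sequential):
    """nn.Sequential with a two-argument forward: each layer receives the
    current state and the running log-density, and returns the updated pair."""

    def forward(self, v, log_density):
        for layer in self:
            v, log_density = layer(v, log_density)
        return v, log_density


class ShiftLayer(nn.Module):
    """Toy layer for illustration: a learnable shift has unit Jacobian,
    so the log-density passes through unchanged."""

    def __init__(self, size):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(size))

    def forward(self, v, log_density):
        return v + self.shift, log_density


def toy_model(size, n_layers=2):
    """A 'model action' just returns a Sequential of arbitrary layers."""
    return Sequential(*(ShiftLayer(size) for _ in range(n_layers)))
```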
Probability densities from multiple flows can be combined as a 'convex combination', i.e. a weighted sum with normalised weights. This is a way of increasing the expressivity of a flow whose individual transformations form a group under function composition, so that simply stacking more of them gains nothing. In particular, it seems to be crucial for increasing expressivity in the projection models.
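In log space this is just a `logsumexp`; a minimal sketch of the arithmetic (not the actual implementation):

```python
import torch


def convex_combination(log_densities, weight_logits):
    """Mixture log-density log(sum_i w_i p_i(x)) with softmax-normalised
    weights, computed stably as logsumexp_i(log w_i + log p_i(x)).

    log_densities: tensor of shape (batch, n_flows) of per-flow log p_i(x)
    weight_logits: tensor of shape (n_flows,) of unnormalised weights
    """
    log_weights = torch.log_softmax(weight_logits, dim=-1)
    return torch.logsumexp(log_densities + log_weights, dim=-1)
```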
Neural networks are instantiated within the transformation layer Modules. I think this makes sense because the networks really belong to the coupling layers, and we can easily set defaults and layer-specific parameters inside the layer Modules. If we want more flexibility, we can add parameters to the signature of the model action - e.g. adding `t_hidden_shape=None` to the signature of `real_nvp` (and `AffineLayer`), which, if defined in the runcard, would allow a different hidden shape for the s and t networks.
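A sketch of what this looks like (the parameter names `size_half` and `s_hidden_shape` and the `_dense_net` helper are assumptions; only `t_hidden_shape` and `AffineLayer` appear in this PR):

```python
import torch.nn as nn


def _dense_net(size_in, hidden_shape, size_out):
    """Small fully-connected network with tanh activations (hypothetical helper)."""
    sizes = [size_in, *hidden_shape, size_out]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(n_in, n_out), nn.Tanh()]
    return nn.Sequential(*layers[:-1])  # drop the activation on the output layer


class AffineLayer(nn.Module):
    """The s and t networks are built here, inside the coupling layer, so
    their defaults live in one place. Passing t_hidden_shape=None makes the
    t network reuse the hidden shape of the s network."""

    def __init__(self, size_half, s_hidden_shape=(24,), t_hidden_shape=None):
        super().__init__()
        if t_hidden_shape is None:
            t_hidden_shape = s_hidden_shape
        self.s_network = _dense_net(size_half, s_hidden_shape, size_half)
        self.t_network = _dense_net(size_half, t_hidden_shape, size_half)
```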
I've also moved the whole of `core.py` into `checkpoint.py`, since it all concerns loading and saving models. The `core` module now contains some other things which I think can fairly be described as "core" functions.
This PR is the result of a long battle with reportengine which was ultimately won by reportengine.
Somewhere along the line, it was possible to completely specify the model in the runcard.
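The original snippet isn't reproduced here; as a purely hypothetical sketch, with key names guessed from the `layer_id` and `from_` discussion below, the format was something like:

```yaml
# hypothetical reconstruction - key names guessed, not the original snippet
model:
  - layer_id: affine_layer
    hidden_shape: [24]       # parameters resolved in this layer's namespace
  - layer_id: affine_layer
    from_: first_affine      # reuse parameters defined elsewhere in the runcard
  - layer_id: projection_layer
```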
This was lovely: it allowed chaining arbitrary layers with different parameters, and resolving each layer with a different namespace.
We could also use the `from_` key to avoid rewriting duplicate layers. The current method, where models are pre-defined, also works with this - i.e. we could write `layer_id: real_nvp` to save writing out `affine_layer` many times.
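For example (again hypothetical), referencing a pre-defined model rather than listing its layers:

```yaml
model:
  - layer_id: real_nvp   # expands to a pre-defined stack of affine_layer entries
    n_affine: 4          # hypothetical parameter name
```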
This works for training. Unfortunately, however, it breaks the sampling. I believe (based on the reportengine docs) that this is because you cannot use `collect` on an action which itself collects over an action using the `namespaces_` key. Since we collect the sampling action using `training_context`, this doesn't work.
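Schematically, with invented action names (only `collect`, `namespaces_` and `training_context` are real here):

```python
from reportengine import collect

# Inner collect: the model action gathers one resolved layer per entry in
# the runcard's model list, which is declared using the namespaces_ key.
model = collect("layer_action", ("model",))

# Outer collect: sampling collects its action over the training context.
# It is this collect-over-a-collect that reportengine cannot resolve.
sample_output = collect("sample_action", ("training_context",))
```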
I've tagged the commit where this was working for training here, in case we can figure out a way to make it work.
Although not a priority, I feel this is definitely worth figuring out. If we have a good system for specifying arbitrary chains of layers at runcard level, which still uses all of the best bits of reportengine, it's something we (or at least I) would use extensively and reuse in future projects.