wilsonmr / anvil

Repository containing code related to a flow-based generative model
https://wilsonmr.github.io/anvil/
GNU General Public License v3.0

Models restructure #46

Closed · jmarshrossney closed this 3 years ago

jmarshrossney commented 4 years ago

Quite a substantial overhaul of how models are implemented.

I've also moved the whole of the old core.py into checkpoint.py, since it all concerned loading and saving models. The core module now contains some other things which I think can genuinely be described as "core" functions.
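
As a rough illustration of the kind of thing now living in checkpoint.py: it is essentially standard PyTorch state serialisation. A minimal sketch, where the function names are hypothetical and not the actual anvil API:

import torch

def save_checkpoint(model, optimizer, step, path):
    # Serialise everything needed to resume training from this point.
    # (Illustrative only; not the actual anvil checkpoint format.)
    torch.save(
        {
            "step": step,
            "model_state": model.state_dict(),
            "optim_state": optimizer.state_dict(),
        },
        path,
    )

def load_checkpoint(model, optimizer, path):
    # Restore model and optimizer state in place; return the saved step.
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optim_state"])
    return checkpoint["step"]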


This PR is the result of a long battle with reportengine which was ultimately won by reportengine.

Somewhere along the line it was possible to specify the model completely in the runcard, in the following format:

model_spec:
  - layer_id: affine
    layer_spec:
        hidden_shape: [36]
        activation: leaky_relu
  - layer_id: affine
    layer_spec:
        hidden_shape: [24]
        activation: tanh
...

model_input:
  namespaces_: model_spec::layer_spec

which was lovely. It allowed chaining arbitrary layers with different parameters, resolving each layer in its own namespace. We could also use the from_ key to avoid rewriting duplicate layers. The current method, where models are pre-defined, also fits into this scheme: e.g. we could write layer_id: real_nvp to save writing out affine_layer many times.
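
To make the layer-chaining idea concrete, here is a rough sketch of how a model_spec list like the one above could be resolved into a model. LAYER_REGISTRY, AffineLayer and build_model are hypothetical names for illustration, not the actual anvil implementation:

import torch.nn as nn

class AffineLayer(nn.Module):
    # Stand-in for an affine coupling layer; the actual transformation
    # is omitted since only the resolution mechanism matters here.
    def __init__(self, hidden_shape, activation):
        super().__init__()
        self.hidden_shape = hidden_shape
        self.activation = activation

    def forward(self, x):
        return x  # real coupling transformation omitted

LAYER_REGISTRY = {"affine": AffineLayer}

def build_model(model_spec):
    # Resolve each {layer_id, layer_spec} entry into a layer instance,
    # mirroring how each runcard entry resolves in its own namespace.
    layers = [
        LAYER_REGISTRY[entry["layer_id"]](**entry["layer_spec"])
        for entry in model_spec
    ]
    return nn.Sequential(*layers)

model = build_model([
    {"layer_id": "affine",
     "layer_spec": {"hidden_shape": [36], "activation": "leaky_relu"}},
    {"layer_id": "affine",
     "layer_spec": {"hidden_shape": [24], "activation": "tanh"}},
])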

This works for training. Unfortunately, it breaks the sampling. I believe (based on the reportengine docs) that this is because you cannot use collect on an action which itself collects over an action via the namespaces_ key. Since we collect the sampling action using training_context, this doesn't work.
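
Schematically, the clash looks something like the following. This is a sketch assuming reportengine's collect(function, fuzzyspec) interface; layer_action and sample_action are hypothetical provider names standing in for the real actions:

from reportengine import collect

# Training side: the model is assembled by collecting a per-layer action
# over the namespaces declared in the runcard (model_spec::layer_spec).
model_layers = collect("layer_action", ("model_spec", "layer_spec"))

# Sampling side: the sampling action is collected under training_context.
# Since model_layers is itself a collect driven by namespaces_, this is a
# collect over an action that already collects, which is what appears to
# be disallowed.
sampling_output = collect("sample_action", ("training_context",))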

I've tagged the commit where this was working for training here, in case we can figure out a way to make it work.

Although it's not a priority, I feel this is definitely worth figuring out how to do. If we have a good system for specifying arbitrary chains of layers at runcard level, one which still uses all of the best bits of reportengine, it's something we (or at least I) would definitely use extensively and reuse in future projects.