slimgroup / InvertibleNetworks.jl

A Julia framework for invertible neural networks
MIT License

Adding support + examples for FCN based INNs #76

Closed flo-he closed 2 months ago

flo-he commented 1 year ago

Hi there!

I'm trying to learn diffeomorphisms between manifolds and want to use INNs for this task. That is, I want to map points $x \in \mathcal{M}$ of some manifold $\mathcal{M}$ embedded in $\mathbb{R}^N$ to some other manifold $\mathcal{N} \subset \mathbb{R}^N$ via an INN.

I do not quite see how I would implement this task using your package, as all building blocks seem to be based on convolutional architectures. For my problem, at least as I understand it, $x$ should be an (N, 1, 1, B) tensor, where B is the batch size. But then the problem arises that for (all?) coupling layers the number of channels has to be even (i.e. splittable).

I just wanted to ask what the plan is for adding support (+ an MWE) for a simple (N, B) matrix layout as input to the INN, and for simple building blocks based on fully connected networks (FCNs). I think this would be highly appreciated by the community, as your package is currently the go-to for INNs in Julia!

mloubout commented 1 year ago

Hi, thanks for reaching out. As you pointed out, some architectures require a power-of-2 number of channels, but it should be fairly straightforward to make the single-channel case work. Can I ask which network you are interested in, so I can start with that one?

cheers

rafaelorozco commented 1 year ago

Hey thank you for the comment!

I'm glad to hear that an FCN addition would be appreciated. I was about to start implementing it for our own use, so I am glad to hear it would be useful to you as well.

I will implement the FCN version and add some examples in the coming days. Meanwhile, if you want to play around, you should be able to implement your scenario by putting the N dimension on the channel axis, like this:

using InvertibleNetworks

N = 4          # input dimension, placed on the channel axis
batch_size = 2
x = randn(Float32, 1, 1, N, batch_size)   # (nx, ny, n_channel, n_batch) layout; Float32 matches the network's parameter type
G = NetworkGlow(N, 32, 2, 5)              # n_in = N, n_hidden = 32, L = 2 scales, K = 5 flow steps

z, logdet = G(x)
x_ = G.inverse(z)
println(x_ - x) # should be numerically zero (invertibility check)

G.backward(z/batch_size, z) # sets the parameter gradients when mapping to a standard normal

That should work well; we have had success working on vectors in that fashion. Although I am very curious about the performance increase we would get by doing it with a truly fully connected backbone. Will keep you updated on my observations soon!
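In case it helps, here is a rough sketch of how that backward call could sit inside a training loop. The optimizer choice (Flux's ADAM) and the use of get_params / clear_grad! are my additions for illustration, not part of the snippet above, so treat this as a sketch rather than the canonical recipe:

using InvertibleNetworks, Flux, LinearAlgebra

N = 4
batch_size = 2
G = NetworkGlow(N, 32, 2, 5)
opt = Flux.ADAM(1f-3)

for iter in 1:100
    x = randn(Float32, 1, 1, N, batch_size)        # replace with batches of your manifold samples
    z, logdet = G(x)
    loss = 0.5f0*norm(z)^2/batch_size - logdet     # negative log-likelihood under a standard normal latent
    G.backward(z/batch_size, z)                    # populates the parameter gradients for that loss
    for p in get_params(G)
        Flux.update!(opt, p.data, p.grad)          # gradient step on each network parameter
    end
    clear_grad!(G)                                 # reset gradients before the next iteration
    mod(iter, 10) == 0 && println("iter $iter, loss = $loss")
end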

If N is odd, you can hack around it by stacking the vector on itself to make the channel count even:

N = 5
batch_size = 2
x = randn(Float32, 1, 1, N, batch_size)
x_stack = cat(x, x; dims=3)        # duplicate along the channel axis -> 2N channels
G = NetworkGlow(2*N, 32, 2, 5)

The evenness requirement is not an FCN problem; it is just a code limitation that I will fix in a bit.
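To make the stacking hack concrete, here is a quick check (my own addition, continuing the snippet above) that invertibility still holds and that the original vector can be read back from the first N channels:

z, logdet = G(x_stack)
x_stack_ = G.inverse(z)
x_rec = x_stack_[:, :, 1:N, :]        # the first N channels recover the original x
println(maximum(abs.(x_rec .- x)))    # should be numerically zero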

As @mloubout mentions, this also depends on whether you want an unconditional network, as in the example I have shown, or a conditional network. It is actually easier in the conditional case, because you can mismatch the channel dims between x and the condition.
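For reference, a conditional setup could look roughly like the sketch below. The NetworkConditionalGlow constructor and its (X, condition) forward/backward signatures here are my assumption for illustration, so please double-check against the package's conditional examples:

using InvertibleNetworks

N_x = 4               # channel dim of the sample
N_y = 3               # channel dim of the condition; may differ from N_x
batch_size = 2
X = randn(Float32, 1, 1, N_x, batch_size)
Y = randn(Float32, 1, 1, N_y, batch_size)

G = NetworkConditionalGlow(N_x, N_y, 32, 2, 5)
ZX, ZY, logdet = G.forward(X, Y)       # latent for X, transformed condition, log-determinant
G.backward(ZX/batch_size, ZX, ZY)      # gradients for a standard-normal latent objective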

flo-he commented 1 year ago

Thank you both for the quick yet thorough answers!

Thanks for the code snippets @rafaelorozco! I tried to use the channel dim too, but then stumbled on the even-channel issue and did not know enough about the mechanisms to tell whether simply duplicating the odd input dimensions would break the invertibility or the general gist of INNs. Good to know that this is more of a code design issue; I can play around with this solution for now!

I'm more interested in the unconditional case, so I will just use your little hack for now, even though I guess there is a performance penalty for doubling the input dimensions.

@mloubout I do not care about the exact architecture right now - the model just needs to be a trainable nonlinear bijection. Everything else is just a matter of which architecture is the most SoTA & performant one.

Thanks again & good to know that this is work in progress!

rafaelorozco commented 1 year ago

Hello @flo-he, we have a branch started with our main test cases passing (invertibility, gradients, etc.). It would be nice if you could play around with it and let us know if you run into errors or have features that would be good to add before we merge:

https://github.com/slimgroup/InvertibleNetworks.jl/pull/77

The main added functionality is a Glow coupling layer with a dense layer as the neural-net backbone, working for inputs that look like this: (nx, n_in, n_batch).

You can find an example of how to build a dense Glow network here: https://github.com/slimgroup/InvertibleNetworks.jl/blob/dense/examples/networks/network_glow_dense.jl

See this line for how you could change the dense layer that is used: https://github.com/slimgroup/InvertibleNetworks.jl/blob/dense/src/layers/invertible_layer_glow.jl#L93

Some things that need to be taken care of (see the sketch below):

- n_in can be 1, but you need to turn on split_scales=true (basically the RealNVP multiscale transformations) to increase the channel size. I made this the default behaviour.
- the network needs to be set up for 1-dim inputs: NetworkGlow(....; dense=true, nx=nx, ...., ndims=1)
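Putting those two points together, constructing the dense network for 1-dim inputs might look roughly like the sketch below. This is my own illustration based on the keywords above and the branch in #77, so the exact keyword set may still change:

using InvertibleNetworks

nx = 8          # length of the input vector
n_in = 1        # single channel; split_scales=true takes care of growing the channel size
batch_size = 2
x = randn(Float32, nx, n_in, batch_size)   # (nx, n_in, n_batch) layout

G = NetworkGlow(n_in, 32, 2, 5; dense=true, nx=nx, ndims=1, split_scales=true)
z, logdet = G(x)
x_ = G.inverse(z)
println(maximum(abs.(x_ .- x)))  # invertibility check, should be numerically zero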

I tried the network on our sanity-check 2-dim Rosenbrock distribution. It seems to be on par with the convolutional network, but of course the dense network should shine for larger dimensions.

[Two screenshots of the Rosenbrock results attached.]

flo-he commented 1 year ago

Hi @rafaelorozco,

sorry for the delayed response; I was on vacation for the last two weeks.

Thanks for the effort. I will play around with it in the coming week(s) and let you know if I encounter any strange behavior.

rafaelorozco commented 2 months ago

Marking this as closed since I merged it into the main branch, but please reopen if you want to discuss further.