slimgroup / InvertibleNetworks.jl

A Julia framework for invertible neural networks
MIT License

HINT layer input must be power of 2 in size? #46

Closed · kramsretlow closed this issue 4 months ago

kramsretlow commented 2 years ago

Hi, thanks for your work on InvertibleNetworks.jl. I'm trying to use an "unconditional" HINT-type invertible network for a normalizing flow, and I find that I get an error if my data dimensionality is not a power of 2. Is this by design?

The offending line is line 86 in `invertible_layer_hint.jl`, where the block size for the jth step in the recursive coupling is set to `Int(n_in/2^j)`. I guess the block size sequence could be computed differently to handle an `n_in` that is not a power of 2?
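For example, with a 129-dimensional input the division stops being exact at the first recursion step and Julia raises an `InexactError`:

```julia
julia> n_in = 129; j = 1;

julia> Int(n_in / 2^j)   # 64.5 cannot be converted exactly to an integer
ERROR: InexactError: Int64(64.5)
```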

rafaelorozco commented 2 years ago

Hello! Thank you for your interest in the package!

Just to be clear on our terms, `n_in` is the number of channels. And yes, currently this behavior (requiring an even number of channels at each recursion step) is by design.

Splitting an input tensor in half along the channel dimension is the most straightforward way of implementing a coupling layer. This isn't strictly necessary, though: one could split the tensor at any point and the layer would still be invertible. Splitting in half is chosen simply because it transforms the largest portion of the tensor.
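To illustrate why invertibility doesn't depend on the split point, here is a minimal additive-coupling sketch (not the package's actual implementation; `t` stands for any map from the first block to the shape of the second):

```julia
# Additive coupling: split x (features x batch) into blocks of size k and
# n - k, then transform only the second block using the first. Any split
# point k works; invertibility does not require k == n ÷ 2.
function coupling_forward(x, k, t)
    x1, x2 = x[1:k, :], x[k+1:end, :]
    y2 = x2 .+ t(x1)          # x1 passes through unchanged
    return vcat(x1, y2)
end

function coupling_inverse(y, k, t)
    y1, y2 = y[1:k, :], y[k+1:end, :]
    x2 = y2 .- t(y1)          # exact inverse, since y1 == x1
    return vcat(y1, x2)
end

# Uneven split of a 129-dimensional input
n, k = 129, 64
W = randn(Float32, n - k, k)
t(u) = tanh.(W * u)           # any function of the first block will do
x = randn(Float32, n, 8)
coupling_inverse(coupling_forward(x, k, t), k, t) ≈ x  # true
```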

May I ask what your application was? I was also thinking of implementing non-even splits as it would certainly be useful in some applications.

kramsretlow commented 2 years ago

Thanks for your response. I have to admit I'm not so clear on the meaning of a "channel" here; from this and the mention of convolutions in the docstrings, I guess the main application you have been working with is image data? Sorry if I'm missing something important. I'm not so well-versed in deep learning terminology, I'm only a statistician 😃

In my case, I have thousands of realizations of some functional data, each observation is a vector of length 129. I am exploring the approach of viewing each observation as a realization from a 129-dimensional random vector, and using conditioning to predict the unobserved portion of a partially-observed future instance. I want to use a flow to transform the observations to something closer to multivariate normal, so I can do the conditioning easily. So it's a pretty "vanilla" normalizing flow application--I want to find a transformation to multivariate normality.

I understand that splitting in half is natural, but I was looking at the Kruse et al. HINT paper, and it seems like splitting at something like `floor(this_size/2)` in the recursion would do the job without requiring the original size to be a power of 2.
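For instance, a hypothetical recursion for the block sizes (just a sketch, not a proposal for the actual package code):

```julia
# Hypothetical schedule: split a block of size n into floor(n/2) and
# n - floor(n/2), recursing into both halves until blocks can't be split.
function hint_splits(n)
    n <= 1 && return Int[]
    k = fld(n, 2)                       # floor(n / 2)
    return [k; hint_splits(k); hint_splits(n - k)]
end

hint_splits(129)   # works for any size, not just powers of 2
```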

I'm also trying to do the same thing with Bijectors.jl, and so far I'm still figuring out how to get the training working. In that package they have a "planar layer" and a "radial layer", and you can set the dimension freely.

Thanks for any comments you have, and again, sorry if I'm misunderstanding something. I haven't read that deeply into normalizing flows/invertible networks, but I think the area has a lot of potential, it's pretty cool.

mloubout commented 2 years ago

> I'm not so clear on the meaning of a "channel" here

In general, the input of a network will be a multi-dimensional tensor of size N... x C x B, where B is the batch size (i.e. the number of inputs processed at once), N... is the size of the input, which in your case would be just 129, and C is the number of channels. One straightforward example of "channels" is an RGB image, which has 3 channels, one each for R, G, and B. In your case, if you only have a straightforward time series, your "channel" size would be 1. That dimension nevertheless needs to be explicit, so you would reshape your input with `reshape(d, (129, 1))` to make the channel size explicit.

Since there is only a single channel, you would need to either add a second channel in some way or use a network that doesn't rely on splitting along the channel dimension.
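For example (a sketch only; the zero-filled second channel is just one hypothetical way to get an even channel count and may not suit every model):

```julia
B = 1000
d = randn(Float32, 129, B)     # stand-in for B observations of length 129

# Make the channel dimension explicit: N x C x B with C = 1
x = reshape(d, 129, 1, B)

# One hypothetical workaround: append a constant second channel
x2 = cat(x, zeros(Float32, 129, 1, B); dims=2)   # now 129 x 2 x B
```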

kramsretlow commented 2 years ago

Thanks for that; I understand now. Time permitting, maybe I'll give it another go and see if I can get something working. Thanks for your work on the package 👍