vislearn / FrEIA

Framework for Easily Invertible Architectures
MIT License

About implementing a reversible MLP Network #95

Open xuedue opened 3 years ago

xuedue commented 3 years ago

Thank you for sharing. In this framework, I have some questions.

Question 1: Can this INN framework implement an MLP network with different input and output dimensions? For example, the input dimension is (batch_size, 10) and the output dimension is (batch_size, 2).

Question 2: Using the reversible MLP design from your demo, I found that when the input dimension becomes very large (thousands), the program gets stuck while running. How can this be solved?

fdraxler commented 3 years ago

Hi, thanks for your questions.

1) INNs represent invertible functions, so the number of incoming and outgoing dimensions must be equal. If you are interested in prediction, you are probably looking for a conditional INN (cINN), where the input is passed as a condition to the network. For an example, see VLL-HD/conditional_INNs; a minimal sketch is included at the end of this comment.

2) Can you provide more details on what you mean by "be stuck"? Most likely, the computations become expensive and take their time. For debugging, could you please check whether the CPU usage is high (i.e. something is computed) and what the typical stack trace is when you do a keyboard interrupt?
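
To make point 1 concrete, here is a minimal sketch of a conditional INN in FrEIA for the dimensions in the question: the 2-dimensional quantity is modeled by the INN and the 10-dimensional input is passed as a condition. The block count, layer widths, and variable names are illustrative, not taken from this thread:

```python
import torch
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

X_DIM, COND_DIM = 2, 10  # illustrative: model the 2-dim output, condition on the 10-dim input

def subnet_fc(dims_in, dims_out):
    return nn.Sequential(nn.Linear(dims_in, 128), nn.ReLU(),
                         nn.Linear(128, dims_out))

cinn = Ff.SequenceINN(X_DIM)
for _ in range(4):
    # each block receives the 10-dim input as a condition
    cinn.append(Fm.AllInOneBlock, cond=0, cond_shape=(COND_DIM,),
                subnet_constructor=subnet_fc, permute_soft=True)

x = torch.randn(16, X_DIM)        # the 2-dim quantity to be predicted
cond = torch.randn(16, COND_DIM)  # the 10-dim input, passed as condition
z, log_jac_det = cinn(x, c=[cond])
```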

xuedue commented 3 years ago

Thanks for your reply. I have solved this problem as follows:

The network I want to implement is an MLP with 8 FC layers; the code was attached as screenshots (not reproduced here).
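
Since the screenshots are not preserved, the following is only a guess at what such a setup looks like in FrEIA: a single AllInOneBlock whose subnet is an 8-layer fully connected network. All widths and names are illustrative:

```python
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

N_DIM = 10  # illustrative input/output dimension

def subnet_fc(dims_in, dims_out):
    # an 8-layer fully connected subnet (widths are illustrative)
    dims = [dims_in] + [512] * 7 + [dims_out]
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

inn = Ff.SequenceINN(N_DIM)
# a single coupling block whose subnet is the large MLP above
inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)
```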

I found that when I set the permute_soft parameter to False, there is no problem at all.

I have two questions here.

  1. Here, subnet_fc directly returns an 8-layer MLP. Is that a problem? In your demo, the subnet has only a few FC layers, and several blocks are added with the append function.
  2. Does permute_soft = False affect the generated result? What is the meaning of this parameter?

fdraxler commented 3 years ago

Great!

  1. The architecture of the subnetwork is a hyperparameter of the INN, just like the overall structure of the INN.
  2. permute_soft is also a hyperparameter. For the RealNVP block, input vectors are rotated and then split. Then, only half of the split dimensions is actually modified by the RealNVP block, ensuring invertibility. permute_soft determines the kind of rotation applied: either a soft one (an arbitrary rotation matrix) or a hard one (the dimensions are simply permuted). I am not aware of systematic ablations that directly compare the two settings, but both variants exist in the literature. (See the snippet at the end of this comment for how the two modes are selected.)
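
For illustration, a minimal sketch of appending an AllInOneBlock with either setting; the dimensions and subnet are hypothetical, not taken from this thread:

```python
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

def subnet_fc(dims_in, dims_out):
    return nn.Sequential(nn.Linear(dims_in, 128), nn.ReLU(),
                         nn.Linear(128, dims_out))

inn_soft = Ff.SequenceINN(10)
# permute_soft=True: a fixed random rotation (orthogonal matrix) mixes all dimensions
inn_soft.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)

inn_hard = Ff.SequenceINN(10)
# permute_soft=False: dimensions are only shuffled by a fixed permutation matrix
inn_hard.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=False)
```
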
xuedue commented 3 years ago

Thank you for your reply.

I have another question, if you don't mind.

When training this reversible MLP, I found that as training progresses, the invertibility of the network degrades. That is to say, the gap between the input and the result of inverting the output gradually gets bigger.
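
For reference, one way to quantify this gap (a hypothetical check, not the code used in this thread) is to run the INN forward, invert the result, and compare it to the original input:

```python
import torch
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

def subnet_fc(dims_in, dims_out):
    return nn.Sequential(nn.Linear(dims_in, 128), nn.ReLU(),
                         nn.Linear(128, dims_out))

inn = Ff.SequenceINN(10)  # illustrative 10-dim INN
inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)

x = torch.randn(16, 10)
with torch.no_grad():
    z, _ = inn(x)                # forward pass
    x_rec, _ = inn(z, rev=True)  # inverse pass
    gap = (x - x_rec).abs().max().item()
print(f"max reconstruction error: {gap:.3e}")
```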

Excuse me, why is this? Is there any solution?

Looking forward to your reply.

psorrenson commented 2 years ago

Hi, sorry for the late reply to your question. It looks like you are using a single AllInOneBlock with a large MLP as the subnet. In general, it is much more effective to have multiple blocks, each one using a smaller subnet. I'm not sure why your network is becoming less invertible as training progresses, but it may be due to numerical issues such as the outputs becoming extremely large or extremely small.
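
A minimal sketch of this suggestion, stacking several blocks that each use a small subnet instead of one block with a large MLP; block count, widths, and dimensions are illustrative, not prescribed by the maintainers:

```python
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

N_DIM = 10  # illustrative

def subnet_fc(dims_in, dims_out):
    # a small 2-layer subnet per block, instead of one 8-layer MLP in a single block
    return nn.Sequential(nn.Linear(dims_in, 128), nn.ReLU(),
                         nn.Linear(128, dims_out))

inn = Ff.SequenceINN(N_DIM)
for _ in range(8):
    inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)
```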