noowad93 / MIMO-pytorch

PyTorch Implementation of MIMO (ICLR 2021)

Incorrect implementation #1

Closed martinferianc closed 2 years ago

martinferianc commented 2 years ago

Hi, thank you for this work. Unfortunately, I do not think that your implementation is correct. The features do not seem to be shared across the network: the input is processed practically individually for each ensemble member, and the members are only combined in the last layer. I am referencing https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py, more specifically:

def create_mimo(architecture, data_dim=1, ens_size=1, activation='relu'):
  """Create a MIMO model by expanding input/ouput layer by ensemble size."""
  # The only modification needed by MIMO: expand input/output layer by ensemble size
  num_logits = 1  # Since this is a regression problem.
  inputs_size = data_dim * ens_size # THIS LINE
  outputs_size = num_logits * ens_size

  # MIMO input: expand input layer by ensemble size.
  inputs = tf.keras.layers.Input(shape=(inputs_size,))  # THIS LINE

  # Use the classic MLP encoder.
  net = inputs
  net = tf.keras.layers.Flatten()(net)
  for units in architecture:
    net = tf.keras.layers.Dense(units, activation=activation)(net)

  # MIMO output: Expand output layer by ensemble size.
  outputs = tf.keras.layers.Dense(outputs_size, activation='linear')(net)
  mimo_mlp = tf.keras.models.Model(inputs=inputs, outputs=outputs)
  return mimo_mlp
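
For comparison, here is a rough PyTorch sketch of the same expansion (my own illustration; the class name and layer sizes are made up, not taken from this repo):

import torch.nn as nn

class MIMOMLP(nn.Module):
    """Sketch of a MIMO MLP: only the input and output layers are widened by
    the ensemble size; all hidden layers are shared by the ensemble members."""

    def __init__(self, architecture, data_dim=1, ens_size=1, num_logits=1):
        super().__init__()
        layers = []
        in_features = data_dim * ens_size  # MIMO input: expand by ensemble size.
        for units in architecture:
            layers.append(nn.Linear(in_features, units))
            layers.append(nn.ReLU())
            in_features = units
        # MIMO output: expand by ensemble size.
        layers.append(nn.Linear(in_features, num_logits * ens_size))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch_size, data_dim * ens_size), i.e. the members' inputs
        # concatenated along the feature dimension, not stacked.
        return self.net(x)  # (batch_size, num_logits * ens_size)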

Thank you for looking into this

noowad93 commented 2 years ago

Thank you for letting me know. I will check it soon.

noowad93 commented 2 years ago

I've checked my implementation and added some comments to the source code. Check it here: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py

I think mine also follows the same logic as what you wrote.

martinferianc commented 2 years ago

Thank you for looking into this so fast! I think the problem is that the ensemble dimension is folded not into the feature dimensionality (for convolutions this would be the number of channels) but into the batch size: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L20. As a result, the model always processes each input separately through the network: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L37, and only the outputs of the last layer are combined: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L26. What do you think?
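
To illustrate the difference with a toy example (my own sketch; the shapes and layer sizes are made up):

import torch
import torch.nn as nn

batch, ens, dim, hidden = 8, 3, 784, 128
x = torch.randn(batch, ens, dim)  # one flattened input per ensemble member

# Keeping the ensemble as an extra leading dimension: the layer is applied to
# each member separately, so the members never see each other's features.
separate = nn.Linear(dim, hidden)(x)  # (8, 3, 128)

# MIMO: concatenate the members along the feature dimension so that the first
# layer (and every layer after it) processes all members jointly.
joint = nn.Linear(ens * dim, hidden)(x.flatten(start_dim=1))  # (8, 128)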

noowad93 commented 2 years ago

I thought that the features from the different inputs should be independent. This is a quote from the original paper:

Selecting independent examples for the multi-input configuration during training forces the subnetworks not to share any features.

martinferianc commented 2 years ago

No, I do not think so; it is the input samples that should be independent, not the features. For example, look at the model.summary() in the official implementation: https://colab.research.google.com/drive/1JIgyVeEmlOH-j0oGmeDLV8iE5UYdreYm#scrollTo=Ac1oS_S8kQOB

Edit: I took a shot at the implementation; here is my PyTorch version, following the notebook above: https://colab.research.google.com/drive/16i8Wd8hYYgZfVLs6MPVFZ2faccpnYkyk?usp=sharing

noowad93 commented 2 years ago

I understand my error now. I'll fix it soon. Thank you for the report!

noowad93 commented 2 years ago

Mine was just multi-output, not multi-input and multi-output.

martinferianc commented 2 years ago

No problem at all, I just wanted to help :)!

martinferianc commented 2 years ago

Sorry, I forgot to update that Colab notebook yesterday; it is now all PyTorch!

ZubeyirOflaz commented 2 years ago

Hi,

Sorry for reviving a closed issue, but if I am not mistaken, this is still not a correct implementation of the MIMO model. In MIMO, the hidden layers of the neural network are supposed to be shared by the ensemble members, but if my understanding of this project is correct, each ensemble member has its own separate hidden layers. Am I incorrect, @martinferianc?

If I am correct, the fix is quite simple, and I can provide it in a PR if you would like.

martinferianc commented 2 years ago

Hi ZubeyirOflaz, I do think that it is correct now. Which line makes you believe that there are separate hidden layers for different members?

ZubeyirOflaz commented 2 years ago

There are four reasons for this.

1. The diagram of the network on the second page of the paper says: "The hidden layers remain unchanged. The black connections are shared by all subnetworks, while the colored connections are for individual subnetworks." The black connections in that diagram are the hidden layers.
2. MIMO is said to take advantage of the sparsity of neural networks by housing multiple subnetworks in the same model. The same page says: "We show that, using a multi-input multioutput (MIMO) configuration, we can concurrently train multiple independent subnetworks within one network. These subnetworks co-habit the network without explicit separation." Creating separate dimensions for the hidden layers is explicit separation, and I cannot see how the sparsity of the neural network would come into play if we had different dimensions for each ensemble member.
3. I used MIMO in my bachelor thesis recently, and my supervisor was quite certain that the hidden layers are shared by all ensemble members. He said there is an earlier paper that implements different hidden layers for each ensemble member, but in the MIMO network the hidden layers are shared across all members.
4. After that conversation, I went back and reread the paper and could not find any section supporting the idea that the hidden layers should be separate for each ensemble member.

martinferianc commented 2 years ago

Hi ZubeyirOflaz, thanks for the arguments. Which line in the code makes you believe that the hidden layers are not shared? If you look at https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L10, I think the separation has been fixed. @noowad93

ZubeyirOflaz commented 2 years ago

I apologize, I now realize I misunderstood your original question.

In line 29, "input_tensor = self.input_layer(input_tensor) # (batch_size, ensemble_num, hidden_dim * ensemble_num)", is PyTorch not creating hidden layers for each ensemble member separately? And in line 38 of https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/trainer.py, could we not concatenate the inputs rather than stacking them, and then use a standard 2D linear layer?

Maybe my understanding of how nn.Linear handles multi-dimensional input is incorrect; I looked through the PyTorch documentation and could not find the answer one way or the other.

martinferianc commented 2 years ago

If I am not mistaken, the concatenation happens in the training loop, so I agree that the comment is confusing; I think @noowad93 simply forgot to remove it. Look at https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L35 and you can see that the input of the backbone is all the inputs concatenated and processed by a single 1D linear layer.

ZubeyirOflaz commented 2 years ago

During training, the inputs are stacked into a tensor with (batch_size, ensemble_number, data) dimensions, not concatenated. I looked at the tensor dimensions (with a batch_size of 8 and an ensemble_number of 3) during the forward pass, and these were the shapes at each stage:

- Input tensor after the images are flattened: (8, 3, 784)
- Input tensor after the input layer is applied: (8, 3, 2352)
- Output tensor after backbone_model is applied: (8, 3, 128)

In all the linear layer stages, the ensemble_num dimension is preserved.
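
A quick check of the nn.Linear behaviour that seems to explain this (a toy sketch with made-up shapes, not the repo's code): nn.Linear acts only on the last dimension and broadcasts over the leading ones, so the same weights are reused for every member, but the members are never mixed.

import torch
import torch.nn as nn

layer = nn.Linear(784, 2352)
x = torch.randn(8, 3, 784)  # (batch_size, ensemble_num, features)

out = layer(x)  # (8, 3, 2352): the ensemble dimension is preserved

# The same weight matrix is applied to each member independently, so slicing
# a member out before or after the layer gives the same result.
assert torch.allclose(out[:, 0], layer(x[:, 0]), atol=1e-6)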

If the inputs are concatenated instead of stacked, we can easily get rid of the additional dimension.
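
Roughly what I have in mind (a toy sketch, not the repo's actual trainer code):

import torch

members = [torch.randn(8, 784) for _ in range(3)]  # one input batch per ensemble member

stacked = torch.stack(members, dim=1)      # (8, 3, 784): keeps the extra ensemble dimension
concatenated = torch.cat(members, dim=-1)  # (8, 2352): ready for a plain 2D linear layer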

martinferianc commented 2 years ago

OK, then you might actually be right. I used this code as my starting point but then changed it completely, so I do not remember the details; @noowad93 will of course know best.

ZubeyirOflaz commented 2 years ago

Thank you for your input, @martinferianc. What do you think, @noowad93?