Thank you for letting me know. I will check it soon.
I've checked my implementation and added some comments to the source code. Please take a look: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py
I think mine follows the same logic as what you described.
Thank you for looking into this so fast! I think the problem is that the ensemble members are folded into the batch dimension rather than into the feature dimension (for convolutions this would be the number of channels): https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L20. As a result, the model always processes each input separately through the network: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L37, and only the outputs of the last layer are combined: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L26. What do you think?
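For concreteness, here is a minimal sketch of what I mean (my own toy example with made-up sizes, not code taken from your repo): the ensemble members are concatenated along the feature dimension, so the whole backbone is shared and only the head produces one output per member.

```python
import torch
import torch.nn as nn

# Toy sizes, purely for illustration
batch_size, ensemble_num, input_dim, hidden_dim, num_classes = 8, 3, 784, 128, 10

# One shared backbone: because the members are folded into the feature
# dimension, every hidden unit sees all ensemble inputs at once.
backbone = nn.Sequential(
    nn.Linear(input_dim * ensemble_num, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim),
    nn.ReLU(),
)
# One head that produces a separate prediction for each member.
head = nn.Linear(hidden_dim, num_classes * ensemble_num)

x = torch.randn(batch_size, ensemble_num, input_dim)        # M independent inputs
x = x.reshape(batch_size, ensemble_num * input_dim)         # concatenate along features, not batch
h = backbone(x)                                             # (batch_size, hidden_dim), shared features
logits = head(h).reshape(batch_size, ensemble_num, num_classes)
print(logits.shape)  # torch.Size([8, 3, 10]) – one prediction per ensemble member
```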
I thought the features from the different inputs should be independent. Here is a quote from the original paper:
Selecting independent examples for the multi-input configuration during training forces the subnetworks not to share any features.
No, I do not think so. It is the input examples that should be independent, not the hidden features; e.g. look at the model.summary()
in the official implementation: https://colab.research.google.com/drive/1JIgyVeEmlOH-j0oGmeDLV8iE5UYdreYm#scrollTo=Ac1oS_S8kQOB
Edit: I gave the implementation a shot; here is my PyTorch version of the notebook above: https://colab.research.google.com/drive/16i8Wd8hYYgZfVLs6MPVFZ2faccpnYkyk?usp=sharing
I see my error now. I'll fix it soon. Thank you for the report!
Mine was just multi-output, not multi-input and multi-output.
No problem at all, I just wanted to help :)!
Sorry I forgot to update that colab notebook yesterday, now it is all PyTorch!
Hi,
Sorry for reincarnating a closed issue, but if I am not mistaken, this is still not the correct implementation of the MIMO model. In the MIMO model, the hidden layers of the neural network are supposed to be shared by the ensemble members, but if my understanding of this project is correct, each ensemble member has its own separate hidden layer. Am I incorrect @martinferianc?
If I am correct, the solution is quite simple, and I can provide it in a PR if you would like.
Hi ZubeyirOflaz, I do think it is correct now. Which line makes you believe that there are separate hidden layers for the different members?
There are four reasons for this.

1. The diagram of the network on the second page of the paper says "The hidden layers remain unchanged. The black connections are shared by all subnetworks, while the colored connections are for individual subnetworks." The black connections in that diagram are the hidden layers.
2. MIMO is said to take advantage of the sparsity of neural networks by housing multiple subnetworks in the same model. The same page says "We show that, using a multi-input multi-output (MIMO) configuration, we can concurrently train multiple independent subnetworks within one network. These subnetworks co-habit the network without explicit separation." When we create separate dimensions for the hidden layers we are creating explicit separation, and I cannot see how the sparsity of the network would be taken advantage of if we have different dimensions for each ensemble member.
3. I used MIMO in my bachelor thesis recently, and my supervisor was quite certain that the hidden layers are shared by all ensemble members. He said there is an earlier academic paper that implements separate hidden layers for each ensemble member, but in the MIMO network the hidden layers are shared by all ensemble members.
4. After my conversation with my supervisor, I went back and reread the paper, and I could not find any section supporting the idea that the hidden layers are supposed to be separate for each ensemble member.
Hi ZubeyirOflaz, thanks for the arguments. Which line in the code makes you believe that the hidden layers are not shared? If you look at https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L10, I think the separation has been fixed. @noowad93
I apologize, I now realize I misunderstood your original question.
In line 29, "input_tensor = self.input_layer(input_tensor) # (batch_size, ensemble_num, hidden_dim * ensemble_num)", is PyTorch not creating hidden layers for each ensemble member separately? And in line 38 of https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/trainer.py, can we not concatenate the inputs rather than stacking them, and then use a standard 2-D linear layer?
Maybe my understanding of how nn.Linear handles multi-dimensional input is incorrect; I looked through the PyTorch documentation and could not find the answer one way or the other.
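To check my own understanding I ran a tiny experiment (my own snippet, not from the repo). As far as I can tell, nn.Linear only acts on the last dimension, so with a (batch, ensemble, features) input the same weights are applied to every ensemble slice, but the slices never mix:

```python
import torch
import torch.nn as nn

layer = nn.Linear(784, 128)          # one weight matrix
x = torch.randn(8, 3, 784)           # (batch_size, ensemble_num, input_dim)
y = layer(x)                         # (8, 3, 128): the ensemble dimension is preserved

# The output for member 0 depends only on the input for member 0,
# i.e. the weights are shared but the members are processed independently.
print(torch.allclose(y[:, 0], layer(x[:, 0])))  # True
```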
So, if I am not mistaken, the concatenation happens in the training loop, and I agree the comment is confusing; I think @noowad93 simply forgot to remove it. Look at https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py#L35 and you can see that the input to the backbone is all the inputs concatenated and processed by a single 1-D linear layer.
During training, the inputs are stacked into a tensor with (batch_size, ensemble_num, data) dimensions, not concatenated. I looked at the dimensions (with a batch_size of 8 and an ensemble_num of 3) during the forward pass, and these were the tensor shapes at each stage:
- Input tensor after the images are flattened: (8, 3, 784)
- Input tensor after the input layer is applied: (8, 3, 2352)
- Output tensor after the backbone_model is applied: (8, 3, 128)

In all the linear layer stages, the ensemble_num dimension is preserved.
If the inputs are concatenated instead of stacked, we can easily get rid of the additional dimension, as sketched below.
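Something along these lines is what I have in mind (hypothetical sizes, just a sketch, not the actual PR):

```python
import torch
import torch.nn as nn

batch_size, ensemble_num, input_dim, hidden_dim = 8, 3, 784, 128  # illustrative sizes
inputs = [torch.randn(batch_size, input_dim) for _ in range(ensemble_num)]

# Current approach: stacking keeps the ensemble dimension through every layer.
stacked = torch.stack(inputs, dim=1)
print(stacked.shape)                 # torch.Size([8, 3, 784])

# Proposed approach: concatenating along the feature dimension gives a 2-D
# tensor, so a single shared linear layer mixes all ensemble members' features.
concatenated = torch.cat(inputs, dim=1)
print(concatenated.shape)            # torch.Size([8, 2352])

shared_input_layer = nn.Linear(ensemble_num * input_dim, hidden_dim)
hidden = shared_input_layer(concatenated)
print(hidden.shape)                  # torch.Size([8, 128]) – no extra ensemble dimension
```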
OK, then you might actually be right. I used this code as my starting point but then changed it completely, so I do not remember the details; @noowad93 will of course know best.
Thank you for your input, @martinferianc. What do you think, @noowad93?
Hi, thank you for this work. Unfortunately, I do not think that your implementation is correct. The features do not seem to be shared across the network: each input is processed practically individually, and the inputs are only processed together in the last layer. I am referencing: https://github.com/noowad93/MIMO-pytorch/blob/master/mimo/model.py, more specifically:
Thank you for looking into this