mlpack / models

models built with mlpack
https://models.mlpack.org/docs
BSD 3-Clause "New" or "Revised" License

[WIP] Resnet Module #61

Closed Aakash-kaushik closed 3 years ago

Aakash-kaushik commented 3 years ago

This PR aims to implement a ResNet module that can create all the ResNet variants from the paper, and it follows the same architecture as PyTorch for a few reasons:

  1. We can't train so many models on ImageNet right now.
  2. We don't know if they would converge.
  3. Keeping the same architecture allows us to get the weights from PyTorch.

Things I have some doubts about:

  1. How would the residual block be implemented from the sequential layer?
  2. Can I get the output of a layer at an arbitrary stage and add it to another layer? (skip connections)

Resources:

  1. PyTorch's ResNet implementation.
  2. The ResNet paper.
zoq commented 3 years ago

The idea of the sequential layer is that it wraps arbitrary layers and exposes them as if they were a single layer. The sequential layer has a template parameter (https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/methods/ann/layer/sequential.hpp#L70) which tells the layer to add the input to the output of the last layer. There is also a convenient typedef (https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/methods/ann/layer/sequential.hpp#L260-L261) that already sets the template parameter for you. Below is an elementary example:

// Residual<> is the Sequential<> typedef that adds the block's input to the
// output of its last layer.
Residual<>* residual = new Residual<>(true);

Linear<>* linearA = new Linear<>(10, 10);
Linear<>* linearB = new Linear<>(10, 10);

residual->Add(linearA);
residual->Add(linearB);

In this case linearA and linearB are run, and the input is also added to the output of the last layer, which in this case is linearB.

There is also a test case - https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/tests/ann_layer_test.cpp#L3325
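
For context, a minimal sketch of how such a block could then be plugged into a network (the layer sizes and surrounding layers are only illustrative, not from this PR):

FFN<> model;                       // default loss and weight initialization
model.Add(new Linear<>(10, 10));
model.Add(residual);               // the Residual<> block built above
model.Add(new LogSoftMax<>());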

Aakash-kaushik commented 3 years ago

Hi, thanks @zoq for this, but the part that confused me is that the code checks whether the dimensions of the first layer are equal to those of the last layer, and for ResNet there will be cases where the input dimensions of the first layer differ from the last one. So I need a 1x1 conv block just for the first layer's input, which is not run like the other layers but separately, before its output is added to the last layer.

How do you suggest I accomplish that?

cc: @kartikdutt18

zoq commented 3 years ago

In this case you can use a combination of AddMerge and Sequential. The AddMerge layer takes arbitrary layers, runs each of them, and at the end adds their outputs together.

AddMerge<> resblock(false, false);

// Main path: the wrapped layers.
Sequential<>* sequential = new Sequential<>(true);

Linear<>* linearA = new Linear<>(10, 10);
Linear<>* linearB = new Linear<>(10, 10);

sequential->Add(linearA);
sequential->Add(linearB);

// Shortcut path: converts the input to the main path's output shape.
Convolution<>* conv = new Convolution<>(...);

resblock.Add(sequential);
resblock.Add(conv);

Let me know if this is what you were looking for. Maybe it makes sense to implement this structure as an independent layer in mlpack.

Aakash-kaushik commented 3 years ago

I am still stuck on this and don't exactly know what to do. It would be easy if we had a way to define the flow of the network, but that is not how it is designed, and the main problem here is the downsampling block. I can put everything inside a Residual block, which saves the input of the first layer into a temporary variable and then tries to add it to the output of the last layer in the block, but when it does so it finds that the shapes don't match, and I don't see how I can use AddMerge to achieve the same flow. Do let me know if you see another way around it; I have been thinking about it for way too long.

kartikdutt18 commented 3 years ago

I will try to think of a solution for this and get back to you.

zoq commented 3 years ago

Just to make sure I get what you are trying to do: in some cases the output of the sequential part doesn't match the input, so if you add the skip connection you have to add another layer to convert the input?

Something like:

[attached diagram (untitled drawing): the input feeds both the sequential block and a conversion layer on the skip connection, and the two outputs are added]

Aakash-kaushik commented 3 years ago

Yes, I believe this is exactly what I am trying to do; that's a great diagram for it. Thank you so much.

Aakash-kaushik commented 3 years ago

Also @zoq, if you have time, can you explain how the diagram you made would work in code, and also explain a bit more about the code you wrote with AddMerge and the Residual layer together?

zoq commented 3 years ago

Sure. If you have a specific example you would like me to show here, let me know; if not, I'll just keep it general.

Aakash-kaushik commented 3 years ago

Yup, let me reference a code block from torchvision that I would like to have; that should give a better idea: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L59-L83

Aakash-kaushik commented 3 years ago

By the way, this was just to give you an idea of what I was thinking; any example that is similar or close should work for me.

kartikdutt18 commented 3 years ago

Sequential<>* sequential = new Sequential<>(true);

Convolution<>* convA = new Convolution<>(...);
Convolution<>* convB = new Convolution<>(...);

sequential->Add(convA);
sequential->Add(convB);

// Shortcut path (downsampling conv) built by a helper.
Sequential<>* downsample_layer = this->Downsample(...);

// resblock is the AddMerge layer from the earlier example.
resblock.Add(sequential);
resblock.Add(downsample_layer);

Aakash-kaushik commented 3 years ago

Hey @zoq, no need to post the code example anymore; I had a meeting with Kartik and we figured out the residual block, so I can implement it now. I will also let you know tomorrow about the other things we talked about in our meeting; they might be interesting.

zoq commented 3 years ago

Sounds great.

zoq commented 3 years ago

Thanks for adding the code, this makes it super easy to understand what direction this is moving in.

Aakash-kaushik commented 3 years ago

Yes. I will keep this PR open as our discussion forum, and when I am done I can create a separate PR for reviewing the code and functionality; this one is more to give a daily update on what I worked on each day.

Aakash-kaushik commented 3 years ago

Just pushed the bottleneck block and wanted to confirm two things:

  1. Should the AddMerge block be left with its default settings for our purposes?
  2. The identity layer only acts as a layer that passes the input and gradients through without any modification, right? I mean it is evident, but I still wanted to confirm it once.
zoq commented 3 years ago

  1. Should the AddMerge block be left with its default settings for our purposes?

With default parameters you mean model and run? If that is the case, you have to set model = true and run = true. You only have to set model = false if the layers you pass are already initialized as part of another layer, e.g. if you pass the same layer to another AddMerge layer; in that case you don't want to initialize the layer twice.

  2. The identity layer only acts as a layer that passes the input and gradients through without any modification, right?

That is correct.
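
For reference, a minimal sketch of that choice, assuming the constructor order AddMerge(model, run) used earlier in this thread; the child layers are only placeholders:

AddMerge<>* resBlock = new AddMerge<>(true, true); // owns and runs its child layers
Sequential<>* mainPath = new Sequential<>(true);   // conv / batch-norm layers would be added here
resBlock->Add(mainPath);
resBlock->Add(new IdentityLayer<>());              // shortcut path when no downsampling is needed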

Aakash-kaushik commented 3 years ago

thanks

Aakash-kaushik commented 3 years ago

Work left:

  1. Figure out the seg fault (basically make ResNet-18 work).
  2. Write the bottleneck block (skipped it because it wasn't needed for ResNet-18 or 34, and I want to make those work first).
  3. Just write some if/else for all the ResNets; that is basically only the number of blocks per variant and nothing else (a rough sketch of this follows below).
  4. Port the weights from PyTorch to mlpack.

With these 4 things the ResNet module should be done. Just writing these out because for some reason I tend to miss things in the meet.
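
For point 3 above, a rough sketch of what that mapping could look like (the container name is only illustrative, not the PR's API; the counts are the per-stage block counts from the ResNet paper):

#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Number of blocks in each of the four stages, per variant.
const std::map<std::string, std::vector<std::size_t>> numBlocks = {
    {"resnet18",  {2, 2, 2, 2}},   // BasicBlock
    {"resnet34",  {3, 4, 6, 3}},   // BasicBlock
    {"resnet50",  {3, 4, 6, 3}},   // BottleNeck
    {"resnet101", {3, 4, 23, 3}},  // BottleNeck
    {"resnet152", {3, 8, 36, 3}}   // BottleNeck
};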

Aakash-kaushik commented 3 years ago

For some reason the input width and height keep increasing as I go down the architecture. Adding the output for this, and also updating the code.

The last two figures in each layer line are the inputWidth and inputHeight respectively.

Convolution: 3 64 7 7 2 2 3 3 224 224
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 1 1 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 58 58
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 1 1 1 1 1 1 58 58
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 60 60
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 128 1 1 1 1 1 1 60 60
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 62 62
BatchNorm: 128
DownSample below
Convolution: 64 128 1 1 1 1 0 0 62 62
BatchNorm: 128
Relu
Convolution: 128 128 1 1 1 1 1 1 62 62
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 64 64
BatchNorm: 128
IdentityLayer
Relu
Convolution: 128 256 1 1 1 1 1 1 64 64
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 66 66
BatchNorm: 256
DownSample below
Convolution: 128 256 1 1 1 1 0 0 66 66
BatchNorm: 256
Relu
Convolution: 256 256 1 1 1 1 1 1 66 66
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 68 68
BatchNorm: 256
IdentityLayer
Relu
Convolution: 256 512 1 1 1 1 1 1 68 68
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 70 70
BatchNorm: 512
DownSample below
Convolution: 256 512 1 1 1 1 0 0 70 70
BatchNorm: 512
Relu
Convolution: 512 512 1 1 1 1 1 1 70 70
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 72 72
BatchNorm: 512
IdentityLayer
Relu
AdaptiveMeanPooling: 1,1
Linear: 512 1000
Segmentation fault (core dumped)
kartikdutt18 commented 3 years ago

Have you compared it with PyTorch?

Aakash-kaushik commented 3 years ago

I did. By the way, PyTorch doesn't show the input dimensions, so I did that with a TF implementation with the same input size, and as expected the width and height should decrease while the number of channels increases as we go deeper.
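
For reference, the standard convolution output-size formula the printed widths and heights should follow (textbook formula, not code from this PR):

out = floor((in + 2 * pad - kernel) / stride) + 1

e.g. the 7x7 stem conv with stride 2 and pad 3 on 224: (224 + 6 - 7) / 2 + 1 = 112,
a 3x3 conv with stride 1 and pad 1 on 56: (56 + 2 - 3) / 1 + 1 = 56,
and the stride-2 downsampling convs should halve the size: 56 -> 28 -> 14 -> 7.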

Aakash-kaushik commented 3 years ago

Hey @kartikdutt18, @zoq, the part where the inputWidth and outputWidth are updated inside the conv3x3 and conv1x1 functions isn't actually updating the variables; do you know why that would be?

zoq commented 3 years ago

You mean https://github.com/mlpack/models/pull/61/files#diff-a516cb18be832520af513b17fdd1b96f5e140f661a1f5c34a69f3e4e5a3c19b3R90-R92 is not updating?

Aakash-kaushik commented 3 years ago

Yes. I can paste an output and you will see that before and after the operation the dimensions are still the same. I am on Windows right now, but will paste an output in some time.

Aakash-kaushik commented 3 years ago

Convolution: 3 64 7 7 2 2 3 3 224 224
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
DownSample below
Convolution: 64 128 1 1 1 1 0 0 56 56
56 56
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
IdentityLayer
Relu
Convolution: 128 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
DownSample below
Convolution: 128 256 1 1 1 1 0 0 56 56
56 56
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
IdentityLayer
Relu
Convolution: 256 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
DownSample below
Convolution: 256 512 1 1 1 1 0 0 56 56
56 56
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
IdentityLayer
Relu
AdaptiveMeanPooling: 1,1
Linear: 512 1000

The 56 56 you see after each layer is printed after the update of inputWidth and inputHeight, but as you can see it is still the same.

Aakash-kaushik commented 3 years ago

Seems like I was wrong and it is updating the parameters; they are just coming out to be 56 again. Will look into it.

zoq commented 3 years ago

If you debug something like that, I would recommend starting with something smaller; in this case it's the first conv layer, so you don't need the other layers.

Aakash-kaushik commented 3 years ago

Hi, thanks for this. The reason I wasn't doing this was that I thought the predict function, or something at the end, expected the matrix in a certain shape, but I see we don't use a multi-dimensional array like other libraries do to represent images, which are 4-dimensional (channels, width, height, batch); reading further into the mlpack code, I saw that we use (channels * width * height, batch) instead.
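
A small illustration of that layout (the batch size is only illustrative):

const size_t batchSize = 8;
// One column per image, flattened to channels * width * height rows.
arma::mat input(3 * 224 * 224, batchSize);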

Aakash-kaushik commented 3 years ago

So I did this, and as soon as I wrap even a single layer in the Sequential layer and add that to the network I get a segmentation fault. I have taken care of the point where it should be preceded by an identity layer if it is used as the first layer (it isn't), and of the point where it needs at least two layers to call its Gradient function: I added two layers that work when added to the network independently, but not when added inside the Sequential block, and I also don't think the Predict function calls the Gradient function.

Aakash-kaushik commented 3 years ago

The last commit id for my mlpack version: 58869b57c50926c4e9682c35c6c6efa8ea31d604 https://github.com/mlpack/mlpack/commit/58869b57c50926c4e9682c35c6c6efa8ea31d604

I believe it's the latest one that was merged 3 days ago.

zoq commented 3 years ago

Thanks, with that I'm able to reproduce the issue on my system.

Aakash-kaushik commented 3 years ago

That's great. I tried a couple more things yesterday, but to no avail.

Some of them were:

  1. Adding an identity layer at the start.
  2. Wrapping the first layers in a sequential block, so the network starts with a sequential block.

Things I will try:

  1. Just running a simple sequential block in an FFN class.

Do let me know if you find something too. ✌️

Aakash-kaushik commented 3 years ago

Hey @zoq, @kartikdutt18, I figured it out. I feel happy and dumb at the same time. In the GetModel function it was returning the FFN object by value rather than a reference to it.

Aakash-kaushik commented 3 years ago

Had this:

ann::FFN<OutputLayerType, InitializationRuleType> GetModel()
      { return resNet; }  // returns a copy of the network

when it should have been this:

ann::FFN<OutputLayerType, InitializationRuleType>& GetModel()
      { return resNet; }  // returns a reference to the network stored in the class

Aakash-kaushik commented 3 years ago

So I am using the padding layer this way:

resNet.Add(new ann::Padding<>(1, 1, 1, 1));

and I checked the output shape before and after the padding layer; they are both the same. The way I checked was to first comment out the padding layer so it was never added and look at the output shape, and then to add it back in and compare, so I think I might not be using it in the right way.
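
For reference, what I would expect from that layer, assuming the argument order padWLeft, padWRight, padHTop, padHBottom:

Padding<>(1, 1, 1, 1) should turn a W x H input into a (W + 2) x (H + 2) output,
e.g. 112 x 112 -> 114 x 114, so the shapes before and after the layer should differ.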

kartikdutt18 commented 3 years ago

Was includeTop set to True?

Aakash-kaushik commented 3 years ago

Hey, as you can see the includeTop part comes later in the code. I just updated the code for resnet_impl so you can see that; I am referencing this line: https://github.com/Aakash-kaushik/models/blob/resnet/models/resnet/resnet_impl.hpp#L111

zoq commented 3 years ago

All the padding layer does is to pad the input and put the original data in the center:

https://github.com/mlpack/mlpack/blob/58869b57c50926c4e9682c35c6c6efa8ea31d604/src/mlpack/methods/ann/layer/padding_impl.hpp#L43-L48

Can you confirm that nRows = input.n_rows; nCols = input.n_cols; is what you see as input? From the arguments, the expected output should be increased by one on each side after the layer, right?

Aakash-kaushik commented 3 years ago

Hey, would it be possible for you to join a meet today? It shouldn't take more than 10 minutes, and it's a bit much to explain this way.

zoq commented 3 years ago

I could hop on a call real quick.

Aakash-kaushik commented 3 years ago

Would 8:30 pm IST work for you? Basically 27 minutes from now.

zoq commented 3 years ago

That works for me.

Aakash-kaushik commented 3 years ago

By the way, I can't invite you because I can't see your email id anymore, so I think we can meet in the mlpack Zoom room.

zoq commented 3 years ago

The Zoom room works.

Aakash-kaushik commented 3 years ago

So I found out about the dim mismatch error, and I think it is happening because the AddMerge layer, instead of taking the output of the previous layer and sending it to both the downSample block and the first Sequential block, takes the output of its first layer and sends that to both of them. That is what I could gather from what I can see, but I don't think that should be possible. If you are free sometime today we can talk over a meet, or we can continue here too, because tomorrow is the weekend and, no taunt to @zoq, but I get it and I am also trying to enjoy my weekends :rocket:

Aakash-kaushik commented 3 years ago

Also i have pushed the code to the latest stage so you can see that too.

kartikdutt18 commented 3 years ago

Could you share the dim mismatch in a bit more detail here?

Aakash-kaushik commented 3 years ago

So this is the output we get when we have a single basicBlock:

Convolution: 3 64 7 7 2 2 3 3 112 112
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
Rows: 200704
Cols: 1

The Rows output just comes from multiplying: 56 * 56 * 64 = 200704

Then below we can see the output when I add a second basicBlock:

Convolution: 3 64 7 7 2 2 3 3 112 112
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
new layer
Convolution: 64 128 3 3 2 2 1 1 28 28
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 28 28
BatchNorm: 128
DownSample below
Convolution: 64 128 1 1 2 2 0 0 28 28
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 28 28
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 28 28
BatchNorm: 128
IdentityLayer
Relu

error: addition: incompatible matrix dimensions: 100352x1 and 25088x1
terminate called after throwing an instance of 'std::logic_error'
  what():  addition: incompatible matrix dimensions: 100352x1 and 25088x1
Aborted (core dumped)

Now we see the mismatch error. The figure 100352 comes from multiplying 28 * 28 * 128, and if we pass this through the downSample block we get 25088, but we don't want to pass the output of the first layer of the second basicBlock. What we want to do is pass the 200704 here, so that it becomes 100352 after going through the downSample block and adds successfully.
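
A quick check of those numbers, assuming the intended flow described above:

Main path of the second basicBlock:   28 * 28 * 128 = 100352
Shortcut as intended:                 56 * 56 * 64 = 200704 -> stride-2 1x1 downSample -> 28 * 28 * 128 = 100352
Shortcut as it currently runs:        a 28 x 28 input -> stride-2 1x1 downSample -> 14 * 14 * 128 = 25088

which is exactly the "100352x1 and 25088x1" mismatch in the error above.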