Closed Aakash-kaushik closed 3 years ago
The idea of the `Sequential` layer is that it wraps arbitrary layers and exposes them as if they were a single layer. The `Sequential` layer has a template parameter (https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/methods/ann/layer/sequential.hpp#L70) which tells the layer to add the input to the output of the last layer. There is also a convenient typedef (https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/methods/ann/layer/sequential.hpp#L260-L261) that already sets the template parameter for you. Below is an elementary example:
Residual<>* residual = new Residual<>(true);
Linear<>* linearA = new Linear<>(10, 10);
Linear<>* linearB = new Linear<>(10, 10);
residual->Add(linearA);
residual->Add(linearB);
In this case `linearA` and `linearB` are run, and the input is also added to the output of the last layer, which in this case is `linearB`.
There is also a test case - https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/tests/ann_layer_test.cpp#L3325
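In plain terms, the residual wrapper above computes `F(x) + x`: run the wrapped layers in sequence, then add the original input to the result. Here is a minimal standalone sketch of that semantics in plain C++ (illustrative only, not the mlpack API; the function name and types are mine):

```cpp
#include <functional>
#include <vector>

// Illustrative sketch: a "residual" forward pass runs its sub-layers in
// sequence and then adds the original input to the final output. This
// mirrors the semantics of mlpack's Residual<> wrapper, not its actual API.
std::vector<double> ResidualForward(
    const std::vector<std::function<std::vector<double>(const std::vector<double>&)>>& layers,
    const std::vector<double>& input)
{
  std::vector<double> out = input;
  for (const auto& layer : layers)
    out = layer(out);               // run linearA, linearB, ...
  for (size_t i = 0; i < out.size(); ++i)
    out[i] += input[i];             // skip connection: add the input back
  return out;
}
```

Note this only works when the final layer's output has the same shape as the input, which is exactly the restriction discussed further down in this thread.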
Hi, thanks @zoq for this, but the part that confused me is that the code checks whether the dimensions of the first layer are equal to those of the last layer. For ResNet there is a case where the input dim of the first layer differs from the last one, so I need a 1x1 conv block just for the first layer's input, which is not run like the other layers but separately, before adding it to the last layer.
How do you suggest I accomplish that?
cc: @kartikdutt18
In this case you can use a combination of `AddMerge` and `Sequential`. The `AddMerge` layer just takes arbitrary layers, runs each of them on the same input, and at the end adds the outputs together.
AddMerge<> resblock(false, false);
Sequential<>* sequential = new Sequential<>(true);
Linear<>* linearA = new Linear<>(10, 10);
Linear<>* linearB = new Linear<>(10, 10);
sequential->Add(linearA);
sequential->Add(linearB);
Convolution<>* conv = new Convolution<>(...);
resblock.Add(sequential);
resblock.Add(conv);
Let me know if this is what you were looking for. Maybe it makes sense to implement that structure as an independent layer in mlpack.
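The key point of `AddMerge` as described above is that every branch receives the same input and the branch outputs are summed. A small standalone sketch of that flow (plain C++, illustrative only, not the mlpack API; names are mine):

```cpp
#include <functional>
#include <vector>

// Illustrative sketch of the AddMerge semantics described above: every
// branch receives the SAME input, and the branch outputs are summed
// elementwise. This assumes all branches produce outputs of equal size.
std::vector<double> AddMergeForward(
    const std::vector<std::function<std::vector<double>(const std::vector<double>&)>>& branches,
    const std::vector<double>& input)
{
  std::vector<double> sum(input.size(), 0.0);
  for (const auto& branch : branches)
  {
    std::vector<double> out = branch(input);  // each branch gets `input`
    for (size_t i = 0; i < out.size(); ++i)
      sum[i] += out[i];
  }
  return sum;
}
```

This is why the combination works for the downsampling case: the conv branch and the sequential branch both see the block input, and only their outputs need to agree in shape.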
I am still stuck at this and don't exactly know what to do. It would be easy if we had a way to define the flow of the network, but that is not how it is designed, and the main problem here is the downsampling block. I can put everything inside a residual block and have it save the input of the first layer into a temp variable and then try to add it to the last layer in the residual block, but when it does that it finds that the shapes don't match, and I don't see how I can use AddMerge to achieve the same flow. Do let me know if you see some other way around it; I have been thinking about it for way too long.
I will try to think of a solution for this and get back to you.
Just to make sure, I get what you are trying to do, in some cases the output of the sequential part doesn't match with the input so if you add the skip connection you have to add another layer to convert the input?
Something like:
Yes, I believe this is exactly what I am trying to do; that was a great diagram for it. Thank you so much.
Also @zoq, if you have time, can you explain how the diagram you made would work in code, and also explain a little bit about the code you wrote with the AddMerge and Residual layers together?
Sure, if you have a specific example you like me to show here, let me know if not I just keep it general.
Yup let me reference a code block from torchvision that i would like to have, that should give a better idea. https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L59-L83
Btw, this was just to give you an idea of what I was thinking; any example that is similar or close should work for me.
Sequential<>* sequential = new Sequential<>(true);
Convolution<>* convA = new Convolution<>(...);
Convolution<>* convB = new Convolution<>(...);
sequential->Add(convA);
sequential->Add(convB);
Sequential<>* downsample_layer = this->Downsample(...);
resblock.Add(sequential);
resblock.Add(downsample_layer);
Hey @zoq, you don't need to post the code example anymore; I had a meet with Kartik and we figured out the residual block, so I can implement it now. Will also let you know tomorrow about the other things we talked about in our meeting, might be interesting.
Sounds great.
Thanks for adding the code, this makes it super easy to understand in what direction this is moving forward.
Yes, I will keep the PR open as our discussion forum, and when I am done I can create a separate PR for reviewing the code and functionality; this one is more to have a daily update on what I worked on that day.
Just pushed bottleneck block and wanted to confirm two things:
- Should the AddMerge block be left with its default settings for our purposes?
With default parameters you mean `model` and `run`? If that is the case, you have to set `model = true` and `run = true`; you only have to set `model = false` if the layers you passed are initialized as part of another layer, e.g. if you pass the same layer to another `AddMerge` layer, in which case you don't want to initialize the layer twice.
- The identity layer only acts as a layer that passes the input and gradients through without any modifications, right? I mean, it is evident, but I still wanted to confirm it once.
That is correct.
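To spell out the behavior confirmed above: both passes of an identity layer are pass-throughs; forward returns the input unchanged and backward returns the incoming gradient unchanged. A tiny sketch (plain C++, illustrative only, not the mlpack class):

```cpp
#include <vector>

// Illustrative identity layer: forward returns the input unchanged and
// backward returns the incoming gradient unchanged. There are no
// trainable parameters, so there is nothing for a Gradient() step to do.
struct IdentitySketch
{
  std::vector<double> Forward(const std::vector<double>& input)
  {
    return input;
  }

  std::vector<double> Backward(const std::vector<double>& gradient)
  {
    return gradient;
  }
};
```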
Thanks.
Work left:
With these 4 things the ResNet module should be done. Just writing these out because for some reason i tend to miss things in the meet.
For some reason the input width and height keep increasing as I go down the architecture. Adding the output for this and also updating the code.
The last two figures in each layer are the inputWidth and inputHeight respectively.
Convolution: 3 64 7 7 2 2 3 3 224 224
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 1 1 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 58 58
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 1 1 1 1 1 1 58 58
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 60 60
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 128 1 1 1 1 1 1 60 60
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 62 62
BatchNorm: 128
DownSample below
Convolution: 64 128 1 1 1 1 0 0 62 62
BatchNorm: 128
Relu
Convolution: 128 128 1 1 1 1 1 1 62 62
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 64 64
BatchNorm: 128
IdentityLayer
Relu
Convolution: 128 256 1 1 1 1 1 1 64 64
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 66 66
BatchNorm: 256
DownSample below
Convolution: 128 256 1 1 1 1 0 0 66 66
BatchNorm: 256
Relu
Convolution: 256 256 1 1 1 1 1 1 66 66
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 68 68
BatchNorm: 256
IdentityLayer
Relu
Convolution: 256 512 1 1 1 1 1 1 68 68
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 70 70
BatchNorm: 512
DownSample below
Convolution: 256 512 1 1 1 1 0 0 70 70
BatchNorm: 512
Relu
Convolution: 512 512 1 1 1 1 1 1 70 70
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 72 72
BatchNorm: 512
IdentityLayer
Relu
AdaptiveMeanPooling: 1,1
Linear: 512 1000
Segmentation fault (core dumped)
Have you compared it with PyTorch?
I did; btw, PyTorch doesn't show the input dimensions, so I did that with a TF implementation with the same input size, and as is obvious, the dimensions (width and height) should decrease while the channels increase as we go.
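As a sanity check on the expectation that widths shrink rather than grow: the standard convolution output-size formula is `out = floor((in + padLeft + padRight - kernel) / stride) + 1` (to my understanding this matches what mlpack computes; the helper name here is mine, for illustration):

```cpp
// Standard conv output-size formula (integer division acts as floor for
// non-negative values): out = (in + padLeft + padRight - kernel) / stride + 1
long ConvOutSize(long in, long kernel, long stride, long padLeft, long padRight)
{
  return (in + padLeft + padRight - kernel) / stride + 1;
}
```

For the layers in the printout, this gives 224 -> 112 for the 7x7/stride-2/pad-3 stem conv, and 56 -> 56 for a 3x3/stride-1/pad-1 conv; sizes should never increase unless padding exceeds the kernel overhang, so output like 58, 60, 62 above indicates the width/height bookkeeping is off.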
Hey @kartikdutt18, @zoq, the part where the `inputWidth` and `outputWidth` are updated inside the conv3x3 and conv1x1 functions isn't actually updating the variables; do you guys know why that would be?
You mean https://github.com/mlpack/models/pull/61/files#diff-a516cb18be832520af513b17fdd1b96f5e140f661a1f5c34a69f3e4e5a3c19b3R90-R92 is not updating?
Yes, I can paste an output and you will see that before and after the operation the dimensions are still the same. I am on Windows right now, but will paste an output in some time.
Convolution: 3 64 7 7 2 2 3 3 224 224
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
DownSample below
Convolution: 64 128 1 1 1 1 0 0 56 56
56 56
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 56 56
56 56
BatchNorm: 128
IdentityLayer
Relu
Convolution: 128 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
DownSample below
Convolution: 128 256 1 1 1 1 0 0 56 56
56 56
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
Relu
Convolution: 256 256 3 3 1 1 1 1 56 56
56 56
BatchNorm: 256
IdentityLayer
Relu
Convolution: 256 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
DownSample below
Convolution: 256 512 1 1 1 1 0 0 56 56
56 56
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
Relu
Convolution: 512 512 3 3 1 1 1 1 56 56
56 56
BatchNorm: 512
IdentityLayer
Relu
AdaptiveMeanPooling: 1,1
Linear: 512 1000
That `56 56` you see after each layer is printed after the update of inputWidth and inputHeight, but as you can see it's still the same.
Seems like I was wrong and it is updating the params, but they are just coming out to be 56 again. Will look into it.
If you debug something like that, I would recommend to start with something smaller, like in this case it's the first Conv layer, so you don't need the other layers.
Hi, thanks for this. The reason I wasn't doing this was that I thought the predict function, or something at the end, expects the matrix in a certain shape, but I see we don't use a multi-dimensional array like other libraries to represent width and height. There they are 4-dimensional (channel, width, height, batch), but I read further into the mlpack code and saw that we have (channel * width * height, batch).
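So a single image that other libraries store as a (channel, width, height) tensor becomes one flat column of length channel * width * height, and a batch is a (channel * width * height, batch) matrix. A small indexing sketch under that assumption (the exact element order within the column is my assumption for illustration, not a statement about mlpack's internal layout):

```cpp
#include <cstddef>

// Sketch: locate element (c, x, y) of a (channels, width, height) image
// inside its flattened column. Channel-major ordering assumed here purely
// for illustration.
std::size_t FlatIndex(std::size_t c, std::size_t x, std::size_t y,
                      std::size_t width, std::size_t height)
{
  return c * width * height + y * width + x;
}
```

With the 64 x 56 x 56 activations from the printouts above, the column length is 64 * 56 * 56 = 200704, which is exactly the Rows figure reported later in the thread.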
So I did this, and as soon as I wrap even a single layer into the sequential layer and add that to the network, I get a segmentation fault. I have taken care of the point where it should be preceded by an identity layer if it is used as the first layer (which it isn't), and of the second point, where it needs at least two layers to call its gradient function: I added two layers which work when added to the network independently, but not when added in the sequential block. Also, I don't think the predict function calls the gradient function.
The last commit id for my mlpack version: 58869b57c50926c4e9682c35c6c6efa8ea31d604 https://github.com/mlpack/mlpack/commit/58869b57c50926c4e9682c35c6c6efa8ea31d604
I believe it's the latest one that was merged 3 days ago.
Thanks with that I'm able to reproduce the issue on my system.
That's great, I did try a couple more things yesterday but to no avail.
Some of them were:
Things I will try:
Do let me know if you find something too. ✌️
Hey @zoq, @kartikdutt18, I figured it out. I feel happy and dumb at the same time. In the GetModel function, it was just returning the FFN object rather than a reference to it.
Had this:
ann::FFN<OutputLayerType, InitializationRuleType> GetModel()
{ return resNet; }
When it should have been this:
ann::FFN<OutputLayerType, InitializationRuleType>& GetModel()
{ return resNet; }
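This bug is easy to reproduce outside mlpack: a getter that returns by value hands the caller a copy, so any layers the caller adds go to a temporary and the member network stays untouched. A minimal standalone illustration (the `Net` struct and member names are hypothetical):

```cpp
#include <vector>

// Hypothetical model holding some layers; names are illustrative only.
struct Net
{
  std::vector<int> layers;

  // Buggy: returns a copy, so callers mutate a temporary that is
  // immediately discarded.
  std::vector<int> GetLayersByValue() { return layers; }

  // Fixed: returns a reference to the member itself, so mutations stick.
  std::vector<int>& GetLayersByRef() { return layers; }
};
```

The symptom matches what happened here: everything compiles and runs, but the real network never receives the layers, which can later surface as an empty or malformed model.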
So I am using the padding layer this way:
resNet.Add(new ann::Padding<>(1, 1, 1, 1));
and I checked the output shape before and after the padding layer; they are both the same. The way I checked was first to comment out the padding layer from resnet, so it was never added, and look at the output shape, and second to add it back and compare; so I think I might not be using it in the right way.
Was `includeTop` set to `true`?
Hey, as you can see, the includeTop part comes later in the code. I just updated the code for resnet_impl so you can see that; I am referencing this line: https://github.com/Aakash-kaushik/models/blob/resnet/models/resnet/resnet_impl.hpp#L111
All the padding layer does is pad the input and put the original data in the center: https://github.com/mlpack/mlpack/blob/58869b57c50926c4e9682c35c6c6efa8ea31d604/src/mlpack/methods/ann/layer/padding_impl.hpp#L43-L48
Can you confirm that `nRows = input.n_rows; nCols = input.n_cols;` is what you see as input? From the arguments, the expected output should be increased after the layer, right?
Hey, would it be possible for you to join a meet today? It shouldn't take more than 10 mins, and it is a bit much to explain this way.
I could hop on a call real quick.
Would 8:30 pm IST work for you, basically 27 mins from now ?
That works for me.
Btw, I can't invite you because I can't see your email id anymore, so I think we can meet in the mlpack Zoom room.
The Zoom room works.
So I found out about the dim mismatch error, and I think it is happening because the `AddMerge` layer, instead of taking the output of the previous layer and sending it to both the `downSample` block and the first `Sequential` block, is taking the output of its first layer and sending that to both of them. That is what I could gather from what I can see, but I don't think that should be possible, so if you guys are free sometime today we can talk over a meet, or we can continue here too, because tomorrow is the weekend and, no taunt to @zoq, but I get it, and I am also trying to enjoy the weekends :rocket:
Also, I have pushed the code to the latest stage so you can see that too.
Could you share the dim mismatch in a bit more detail here?
So this is the output that we get when we have a single basicBlock
Convolution: 3 64 7 7 2 2 3 3 112 112
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
Rows: 200704
Cols: 1
The Rows output just comes from multiplying: 56 * 56 * 64 = 200704
Then below we can see the output that comes out when I add a second basicBlock:
Convolution: 3 64 7 7 2 2 3 3 112 112
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
new layer
Convolution: 64 128 3 3 2 2 1 1 28 28
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 28 28
BatchNorm: 128
DownSample below
Convolution: 64 128 1 1 2 2 0 0 28 28
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 28 28
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 28 28
BatchNorm: 128
IdentityLayer
Relu
error: addition: incompatible matrix dimensions: 100352x1 and 25088x1
terminate called after throwing an instance of 'std::logic_error'
what(): addition: incompatible matrix dimensions: 100352x1 and 25088x1
Aborted (core dumped)
Now we see the mismatch error. The figure 100352 comes from multiplying 28 * 28 * 128, and if we pass the output of the first layer of the second basicBlock through the downsample block, then we get 25088. But we don't want to pass the output of the first layer of the second basicBlock; what we want to do is pass the 200704 here, so it becomes 100352 after going through the `downSample` block and adds successfully.
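The arithmetic in the error message can be reconstructed directly (my reconstruction, using the flattened-column sizes and the stride-2 downsample from the printout):

```cpp
// Flattened sizes behind "incompatible matrix dimensions: 100352x1 and 25088x1":
//   main branch of the second block outputs 28 x 28 x 128:
constexpr long mainBranch = 28L * 28 * 128;   // = 100352
//   the 1x1 stride-2 downsample conv applied to the WRONG input (the
//   28 x 28 x 64 output of the block's first layer) yields 14 x 14 x 128:
constexpr long wrongSkip  = 14L * 14 * 128;   // = 25088
//   applied to the intended 56 x 56 x 64 block input it would yield
//   28 x 28 x 128, matching the main branch:
constexpr long rightSkip  = 28L * 28 * 128;   // = 100352
```

So the 25088 column is consistent with the diagnosis above: the skip branch received the first layer's output instead of the block input, and got halved one time too many.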
This PR aims to implement a ResNet module which would be able to create all the ResNet variants from the paper, and it aims to follow the same architecture as PyTorch for a few reasons.
Things I have some doubts about:
Resources: