1) I am just curious: why do you add batch normalization inside SeModule? Is there a reference for doing so?
2) Please correct me if I'm mistaken: I think SeModule should be placed between dw and pw-linear, but your code seems to add it after pw-linear, right before the residual connection.
3) Do you think it's necessary to handle expand_ratio = 1? When expand_channel == output_channel, I feel that pw might be redundant, since the shape doesn't change at all after pw.
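For concreteness, here is a minimal pure-Python sketch of the layout these questions assume. It is not taken from the repository: `se_module`, `hard_sigmoid` and `block_layers` are hypothetical names. The SE module has no BatchNorm (global average pool, FC, ReLU, FC, hard sigmoid, channel-wise scale; biases are omitted for brevity). SE sits after dw and before pw-linear, and the 1x1 expansion is skipped when the expansion width equals the block's input width (an assumption about how expand_ratio = 1 would be handled).

```python
def hard_sigmoid(x):
    # Hard sigmoid: ReLU6(x + 3) / 6
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def se_module(feature_map, w1, w2):
    """Squeeze-and-excitation on a C x H x W nested list, with no BatchNorm.

    w1: reduced x C weights, w2: C x reduced weights (biases omitted).
    """
    # Squeeze: global average pool each channel down to one scalar.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> hard sigmoid yields one gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every activation in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]


def block_layers(in_ch, expand_ch, out_ch, stride, use_se):
    """Layer order of a hypothetical inverted residual block with optional SE."""
    layers = []
    if expand_ch != in_ch:
        # The 1x1 expansion is only added when it changes the channel count.
        layers.append(f"pw {in_ch}->{expand_ch} + BN + act")
    layers.append(f"dw 3x3 s{stride} {expand_ch} + BN + act")
    if use_se:
        # SE is applied to the depthwise output, before the linear projection.
        layers.append(f"SE({expand_ch})")
    layers.append(f"pw-linear {expand_ch}->{out_ch} + BN")
    if stride == 1 and in_ch == out_ch:
        layers.append("residual add")
    return layers
```

For example, `block_layers(16, 16, 16, 2, True)` returns only the depthwise layer, the SE layer and the projection, with no expansion step and no residual add (because the stride is 2).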
Thank you!