Light-Reflection opened this issue 2 years ago
Hey,

Position of PfAAM in the residual units:

With bottleneck:
BN ReLU Conv (1x1)
BN ReLU Conv (3x3) PfAAM
BN ReLU Conv (1x1)
add_input

Without bottleneck:
BN ReLU Conv (3x3) PfAAM
BN ReLU Conv (3x3)
add_input

The experiments in the paper were performed with PfAAM at the middle position, as shown here; however, performance also improves when it is placed right before the identity mapping (add_input). Placing it at the very beginning or after the identity mapping is not recommended. A minimal sketch of this placement follows below.
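To make that concrete, here is a minimal Keras-style sketch of the pre-activation bottleneck unit with PfAAM at the middle position. `PfAAM` is assumed to be the parameter-free module from this repo, and the filter counts and shortcut handling are illustrative only, not the exact training code:

```python
# Illustrative sketch only: pre-activation bottleneck unit with PfAAM at the
# middle position. Assumes `PfAAM` is the parameter-free module from this repo
# and that the shortcut already has 4*filters channels (otherwise project it).
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, add

def bottleneck_unit_with_pfaam(x, filters):
    shortcut = x
    y = BatchNormalization()(x)
    y = Activation("relu")(y)
    y = Conv2D(filters, 1, padding="same")(y)

    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(filters, 3, padding="same")(y)
    y = PfAAM(y)  # middle position, as used in the paper's experiments

    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(4 * filters, 1, padding="same")(y)
    # alternative that also improved performance: y = PfAAM(y) here,
    # right before the identity mapping

    return add([y, shortcut])  # add_input (identity mapping)
```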
Hi nkoerb,
I'm sorry to tell you that PfAAM doesn't work on ImageNet; the result is no different. I just inserted PfAAM into the bottleneck like this: conv1x1 BN ReLU + conv3x3 BN ReLU PfAAM + conv1x1 BN add_input ReLU, which differs somewhat from your pre-activation structure.
However, it does work in another downstream classification task: acc +0.2x%, r +0.40%, on about 250k samples. The block I used is sketched below.
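In code, the block described above looks roughly like this (a sketch of the described placement with illustrative filter counts, not the exact TF1 code that was run; `PfAAM` is the module from the repo):

```python
# Rough sketch of the post-activation bottleneck described above:
# conv1x1 BN ReLU + conv3x3 BN ReLU PfAAM + conv1x1 BN, add_input, ReLU.
# `PfAAM` is the module from the repo; filter counts are illustrative.
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, add

def postact_bottleneck_with_pfaam(x, filters):
    shortcut = x
    y = Conv2D(filters, 1, padding="same")(x)
    y = BatchNormalization()(y)
    y = Activation("relu")(y)

    y = Conv2D(filters, 3, padding="same")(y)
    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = PfAAM(y)  # PfAAM after the 3x3 stage

    y = Conv2D(4 * filters, 1, padding="same")(y)
    y = BatchNormalization()(y)
    y = add([y, shortcut])  # add_input
    return Activation("relu")(y)  # ReLU after the identity mapping
```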
Hey, @Light-Reflection
Thanks for testing. Unfortunate! I never ran it on ImageNet. What overall model did you use? Did you implement it in TF or PyTorch?
Glad that it worked on the other application.
@nkoerb hi,
I tested it using ResNet-50 and implemented it in TF 1.15, so I had to port your TF2 code to TF 1.x, and I ran some tests to make sure my PfAAM TF 1.x code is equivalent to yours.
I think PfAAM is like an attention module; maybe adding trainable params or weights would work better (rough sketch of the idea below).
I love such an elegant, plug-and-play module, and I'm looking forward to your new work.
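For example, one way to add trainable weights would be an SE-style Dense layer on the channel branch. This is only an untested sketch of the idea; `AveragePoolChannels` and `MatMul` are the helper functions from the repo:

```python
# Sketch of the idea only (untested): add a few trainable weights to PfAAM by
# learning a re-weighting of the channel descriptor, similar to an SE block.
# AveragePoolChannels and MatMul are the helper functions from the repo.
from tensorflow.keras.layers import (Activation, Dense, GlobalAveragePooling2D,
                                     Lambda, multiply)

def PfAAM_trainable(x):
    keep = x
    channels = int(x.shape[-1])
    channel_act = GlobalAveragePooling2D()(x)        # (B, C) channel descriptor
    channel_act = Dense(channels)(channel_act)       # trainable re-weighting
    spatial_act = Lambda(AveragePoolChannels)(x)     # (B, H, W) spatial descriptor
    y = Lambda(MatMul)([spatial_act, channel_act])   # (B, H, W, C) attention map
    y = Activation("sigmoid")(y)
    return multiply([keep, y])
```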
Thanks @Light-Reflection, when you have the time/resources, you could try one additional thing:

```python
from tensorflow.keras.layers import (Activation, BatchNormalization,
                                     GlobalAveragePooling2D, Lambda, multiply)

def PfAAM(x):
    keep = x
    x = BatchNormalization()(x)                     # <----------------- add BN here
    channel_act = GlobalAveragePooling2D()(x)       # channel descriptor, shape (B, C)
    spatial_act = Lambda(AveragePoolChannels)(x)    # spatial descriptor (channel mean)
    y = Lambda(MatMul)([spatial_act, channel_act])  # combine into a (B, H, W, C) map
    y = Activation("sigmoid")(y)
    res = multiply([keep, y])                       # rescale the original input
    return res
```
Because of the different activation order, the BN might have an effect.
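For readers who want to reproduce the snippet above: `AveragePoolChannels` and `MatMul` are the small helpers from the repo. A plausible minimal version is sketched below; the repo's implementation is the reference, so check it for the exact code:

```python
import tensorflow as tf

# Plausible minimal sketch of the two helpers used above; the repo's
# implementation is the reference.
def AveragePoolChannels(x):
    # mean over the channel axis -> one spatial activation map per sample
    return tf.reduce_mean(x, axis=-1)                    # (B, H, W)

def MatMul(inputs):
    spatial_act, channel_act = inputs                    # (B, H, W), (B, C)
    # outer product of spatial map and channel descriptor -> (B, H, W, C)
    return tf.einsum("bhw,bc->bhwc", spatial_act, channel_act)
```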
@nkoerb Hey, when I used the pre-BN just like you said, my model gained about +0.3% accuracy on ImageNet. But when I ran the experiment again, it didn't work. Still, the fact is that I have run my code on ImageNet dozens of times and it had never achieved such accuracy before.
Sounds great! Due to the stochastic nature of the training/initialization, the outcome varies from training to training. Thanks for testing.
You did a great job. If you have other ideas, I can run another test for you. :)
No doubt PfAAM is a great piece of work, but the paper doesn't clearly explain where the right position for PfAAM is. Could you tell me?