nkoerb / pfaam


Where is the best position to insert PfAAM? #1

Open Light-Reflection opened 2 years ago

Light-Reflection commented 2 years ago

No doubt PfAAM is a great piece of work, but the paper doesn't clearly explain the right position for PfAAM. Could you tell me?

nkoerb commented 2 years ago

Hey,

Position in the residual units:

With bottleneck:
BN ReLU Conv (1x1)
BN ReLU Conv (3x3) PfAAM
BN ReLU Conv (1x1)
add_input

Without bottleneck:
BN ReLU Conv (3x3) PfAAM
BN ReLU Conv (3x3)
add_input

Experiments in the paper were performed with PfAAM at the middle position, as shown here; however, performance also improves when it is placed just before the identity mapping (add_input). Placing it at the very beginning or after the identity mapping is not recommended.
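In Keras-style code, a minimal sketch of the bottleneck unit with PfAAM at the middle position could look like this (the filter counts and the name residual_unit are just for illustration; PfAAM is the function from this repo):

from tensorflow.keras.layers import BatchNormalization, Activation, Conv2D, add

def residual_unit(x, filters):
    # pre-activation bottleneck with PfAAM at the middle position
    shortcut = x                                        # assumes matching channel count
    y = BatchNormalization()(x)
    y = Activation("relu")(y)
    y = Conv2D(filters, 1, padding="same")(y)           # 1x1
    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(filters, 3, padding="same")(y)           # 3x3
    y = PfAAM(y)                                        # <-- middle position
    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(int(x.shape[-1]), 1, padding="same")(y)  # 1x1, back to input width
    return add([shortcut, y])                           # add_input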

Light-Reflection commented 2 years ago

Hi nkoerb,

I'm sorry to tell you that PfAAM doesn't work on ImageNet; the result is no different. I inserted PfAAM into the bottleneck as conv1x1-BN-ReLU + conv3x3-BN-ReLU-PfAAM + conv1x1-BN-add_input-ReLU, which differs somewhat from your pre-activation structure.
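Roughly, in Keras-style code, the placement I described looks like this (filter counts and the name bottleneck_postact are just illustrative):

from tensorflow.keras.layers import BatchNormalization, Activation, Conv2D, add

def bottleneck_postact(x, filters):
    # post-activation bottleneck: conv-BN-ReLU ordering, PfAAM after the 3x3 stage
    shortcut = x                                        # assumes matching channel count
    y = Conv2D(filters, 1, padding="same")(x)
    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(filters, 3, padding="same")(y)
    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = PfAAM(y)                                        # inserted here
    y = Conv2D(int(x.shape[-1]), 1, padding="same")(y)
    y = BatchNormalization()(y)
    y = add([shortcut, y])                              # add_input
    return Activation("relu")(y)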

However, it does work in another downstream classification task: acc +0.2x%, r +0.40%, on about 250k samples.

nkoerb commented 2 years ago

Hey, @Light-Reflection

Thanks for testing. Unfortunate! I never ran it on ImageNet. What overall model did you use? Did you implement it in TF or PyTorch?

Glad that it worked on the other application.

Light-Reflection commented 2 years ago

@nkoerb hi,

I tested it using ResNet-50 and implemented it in TF 1.15, so I had to port your TF2 code to TF 1.x. I ran some tests to make sure my TF 1.x PfAAM code is equivalent to yours.

I think PfAAM is like an attention module; maybe adding trainable parameters or weights would work better.
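For example, one possible way to add trainable weights (just a guess, not something from the paper) would be a small Dense layer on the pooled channel vector before the sigmoid, similar in spirit to an SE block:

from tensorflow.keras.layers import (GlobalAveragePooling2D, Dense, Lambda,
                                     Activation, multiply)

def PfAAM_trainable(x):
    # hypothetical variant: learnable channel weighting before the sigmoid
    keep = x
    channel_act = GlobalAveragePooling2D()(x)
    channel_act = Dense(int(x.shape[-1]))(channel_act)  # trainable weights
    spatial_act = Lambda(AveragePoolChannels)(x)        # helper from this repo
    y = Lambda(MatMul)([spatial_act, channel_act])      # helper from this repo
    y = Activation("sigmoid")(y)
    return multiply([keep, y])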

I love such an elegant, plug-and-play module, and I'm looking forward to your new work.

nkoerb commented 2 years ago

Thanks @Light-Reflection, when you have the time/resources, you could try one additional thing:

from tensorflow.keras.layers import (BatchNormalization, GlobalAveragePooling2D,
                                     Lambda, Activation, multiply)

def PfAAM(x):
    # AveragePoolChannels and MatMul are the helper functions from this repo
    keep = x
    x = BatchNormalization()(x)  # <----------------- add BN here
    channel_act = GlobalAveragePooling2D()(x)       # channel-wise activation vector
    spatial_act = Lambda(AveragePoolChannels)(x)    # spatial activation map
    y = Lambda(MatMul)([spatial_act, channel_act])  # combine into a full-size map
    y = Activation("sigmoid")(y)                    # squash to [0, 1]
    res = multiply([keep, y])                       # rescale the original input
    return res

Because of the different activation order, the BN might have an effect.

Light-Reflection commented 1 year ago

@nkoerb Hey, when I used pre-BN just like you said, my model gained about +0.3% accuracy on ImageNet, but when I ran the experiment again it didn't work. That said, I have run my code on ImageNet tens of times and it had never reached that accuracy before.

nkoerb commented 1 year ago

Sounds great! Due to the stochastic nature of the training/initialization, the outcome varies from training to training. Thanks for testing.

Light-Reflection commented 1 year ago

You did a great job. If you have other ideas, I can run another test for you. :)