Adaptive scale behaviour

slashedstar commented 5 months ago

The adaptive scale seems to behave in a non-intuitive way. I added a print(signal_scale) after line 79 in pag_nodes.py to inspect the values, and with 11 steps and a PAG scale of 4 this is what I get:

Adaptive scale .1: 3.9, 3.9, 3.9, 3.8, 3.8, 3.8, 3.7, 3.7, 3.7, 3.6, 3.6
Adaptive scale .3: 3.9, 1.0, -1., -4., -7., -10, -13, -16, -19, -22, -25
Adaptive scale .5: 3.7, -19, -41, -64, -87, -10, -13, -15, -17, -20, -22

Both .3 and .5 disable PAG after the first step, where one (or at least I) would expect it to decay and zero out only after ~30% and ~50% of the steps respectively, or maybe decay by 10%, 30% and 50% every step, idk. The way it currently works seems very unintuitive. Am I missing something?
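
To make the two behaviours I expected concrete, here is a rough sketch of both schedules. This is not the code from pag_nodes.py; the function names and the exact formulas are my own assumptions, purely for illustration.

```python
# Hypothetical schedules, NOT the implementation in pag_nodes.py.

def linear_decay_to_zero(scale, adaptive_scale, steps):
    """Scale falls linearly and hits 0 after `adaptive_scale` of the steps."""
    cutoff = adaptive_scale * steps  # e.g. 0.5 * 11 steps = 5.5
    if cutoff <= 0:
        return [scale] * steps  # adaptive_scale of 0 means no decay
    return [max(0.0, scale * (1.0 - i / cutoff)) for i in range(steps)]


def geometric_decay(scale, adaptive_scale, steps):
    """Scale shrinks by a factor of (1 - adaptive_scale) on every step."""
    return [scale * (1.0 - adaptive_scale) ** i for i in range(steps)]


if __name__ == "__main__":
    for a in (0.1, 0.3, 0.5):
        print(a, [round(v, 2) for v in linear_decay_to_zero(4.0, a, 11)])
        print(a, [round(v, 2) for v in geometric_decay(4.0, a, 11)])
```

With either variant the scale never goes negative, and .3 / .5 would still leave PAG active for more than one step.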

pamparamm commented 5 months ago

This was an attempt to make the implementation from the original repo a little bit more intuitive. Since some users are used to this parameter, I'm reluctant to change its behavior. As an alternative, you can chain multiple PAG nodes and set a different scale and sigma_start/sigma_end for each of them to schedule PAG scaling manually.
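
Conceptually, chaining nodes this way amounts to a piecewise-constant schedule over sigma. The sketch below only illustrates that idea: it does not call ComfyUI or this repo, the function name and the sigma values are made up, and the boundary rule (sigma_end < sigma <= sigma_start, non-overlapping windows) is an assumption rather than the documented behaviour of the nodes.

```python
# Illustrative only: models the schedule, not how chained nodes actually combine.

def scheduled_scale(sigma, segments):
    """Return the scale of the first (scale, sigma_start, sigma_end) window
    that contains `sigma`, or 0.0 if no window is active."""
    for scale, sigma_start, sigma_end in segments:
        if sigma_end < sigma <= sigma_start:
            return scale
    return 0.0


# Sigma decreases during sampling, so sigma_start > sigma_end in each window.
segments = [
    (4.0, 15.0, 5.0),  # strong PAG at high noise levels
    (2.0, 5.0, 1.0),   # weaker PAG later on
]

if __name__ == "__main__":
    for sigma in (10.0, 3.0, 0.5):
        print(sigma, scheduled_scale(sigma, segments))
```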

slashedstar commented 5 months ago

I see (though I have no idea how someone would get used to this behavior🤭), I'll just stick to my local edits then, thanks!

pamparamm / sd-perturbed-attention

Adaptive scale behaviour #15