Does this unified view take the attention mask into consideration?

jxhe / unify-parameter-efficient-tuning

Implementation of the paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)

Apache License 2.0


Open zwbx opened 1 year ago

zwbx commented 1 year ago

I am not familiar with the theoretical derivation, but I am interested in the range of applicability of the formula. Thank you.

jxhe commented 1 year ago

Yes, the derivation holds for masked attention.
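A small numerical check may make this concrete. The sketch below assumes the gated form of prefix tuning discussed in the paper, restated here as head = (1 − λ(x)) · Attn(x W_q, C W_k, C W_v) + λ(x) · Attn(x W_q, P_k, P_v), where λ(x) is the fraction of the unnormalized softmax mass that falls on the prefix keys P_k. A mask adds −∞ to the masked scores, so exp(−∞) = 0 removes those positions both from the context attention and from the denominator of λ(x). The identity then holds with every sum restricted to unmasked positions. The helper names (`masked_attention`, `softmax_mass`, `gated_prefix_view`), the single-query setup, the toy numbers, and the assumption that prefix positions are never masked are illustrative choices, not code from the repository.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def masked_attention(q, keys, values, mask):
    """Single-query softmax attention; keys with mask=False get zero weight."""
    # A mask adds -inf to the score, and exp(-inf) == 0, so masked keys are
    # left out of both the weighted sum and the normalizer.
    # (The 1/sqrt(d) scaling is omitted; it only rescales q, so the identity is unaffected.)
    weights = [math.exp(dot(q, k)) if m else 0.0 for k, m in zip(keys, mask)]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def softmax_mass(q, keys, mask):
    """Unnormalized softmax mass sum(exp(q . k)) over unmasked keys only."""
    return sum(math.exp(dot(q, k)) for k, m in zip(keys, mask) if m)


def gated_prefix_view(q, ctx_k, ctx_v, ctx_mask, pre_k, pre_v):
    """Hypothetical helper: (1 - lam) * Attn(q, context) + lam * Attn(q, prefix)."""
    pre_mask = [True] * len(pre_k)  # assumption: prefix vectors are never masked
    z_pre = softmax_mass(q, pre_k, pre_mask)
    z_ctx = softmax_mass(q, ctx_k, ctx_mask)  # masked context keys are excluded
    lam = z_pre / (z_pre + z_ctx)
    a_pre = masked_attention(q, pre_k, pre_v, pre_mask)
    if z_ctx == 0.0:  # every context key masked: the head attends only to the prefix
        return a_pre
    a_ctx = masked_attention(q, ctx_k, ctx_v, ctx_mask)
    return [(1 - lam) * c + lam * p for c, p in zip(a_ctx, a_pre)]


# Toy example: one query, one prefix vector, three context tokens (the last one masked).
q = [0.5, -1.0]
pre_k, pre_v = [[0.3, 0.1]], [[-1.0, 0.5]]
ctx_k = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
ctx_v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ctx_mask = [True, True, False]

# Prefix tuning as actually computed: masked attention over concat(prefix, context).
full = masked_attention(q, pre_k + ctx_k, pre_v + ctx_v, [True] * len(pre_k) + ctx_mask)
gated = gated_prefix_view(q, ctx_k, ctx_v, ctx_mask, pre_k, pre_v)
print(all(math.isclose(a, b) for a, b in zip(full, gated)))  # True
```

Changing the value attached to the masked context token leaves both sides unchanged, which is the sense in which the mask is already accounted for by the decomposition.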