jxhe / unify-parameter-efficient-tuning

Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)
Apache License 2.0

Does this unified view take attention mask into consideration? #22

Open zwbx opened 1 year ago

zwbx commented 1 year ago

I am not familiar with the theoretical derivation, but I am interested in the range of applicability of the formula. Thank you.

jxhe commented 1 year ago

Yes, the derivation holds for masked attention.
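
For readers who want to check this numerically, here is a minimal NumPy sketch. It is not taken from the repository. It assumes an additive mask (0 for visible positions, -inf for masked ones), a causal mask as the example, prefix positions that are never masked, and no 1/sqrt(d) scaling, as in the paper's simplified notation. The function name `check_masked_prefix_identity` and all shapes are made up for illustration. It compares attention over the concatenated prefix and context against the gated form (1 - λ) · Attn(context) + λ · Attn(prefix), where λ is the share of softmax mass that falls on the prefix. Masked positions contribute exp(-inf) = 0 to every sum, which is why the identity should be unaffected by the mask.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis; -inf entries map to 0.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def check_masked_prefix_identity(seed=0, n=5, m=3, d=4):
    """Hypothetical helper: compare prefix attention with its gated form.

    n = context length, m = prefix length, d = head dimension (all assumed).
    """
    rng = np.random.default_rng(seed)
    q = rng.normal(size=(n, d))    # projected queries, x W_q
    K = rng.normal(size=(n, d))    # projected context keys
    V = rng.normal(size=(n, d))    # projected context values
    Pk = rng.normal(size=(m, d))   # prefix keys
    Pv = rng.normal(size=(m, d))   # prefix values

    # Additive causal mask: 0 where attention is allowed, -inf elsewhere.
    allowed = np.tril(np.ones((n, n), dtype=bool))
    mask = np.where(allowed, 0.0, -np.inf)

    s_ctx = q @ K.T + mask   # masked context scores, shape (n, n)
    s_pre = q @ Pk.T         # prefix scores, shape (n, m), assumed always visible

    # Left-hand side: one softmax over the concatenated [prefix; context].
    scores = np.concatenate([s_pre, s_ctx], axis=-1)
    values = np.concatenate([Pv, V], axis=0)
    full = softmax(scores) @ values

    # Right-hand side: gate between two separate attentions.
    row_max = np.maximum(s_ctx.max(-1, keepdims=True), s_pre.max(-1, keepdims=True))
    mass_ctx = np.exp(s_ctx - row_max).sum(-1, keepdims=True)  # masked terms add 0
    mass_pre = np.exp(s_pre - row_max).sum(-1, keepdims=True)
    lam = mass_pre / (mass_pre + mass_ctx)
    gated = (1.0 - lam) * (softmax(s_ctx) @ V) + lam * (softmax(s_pre) @ Pv)
    return full, gated, lam
```

Calling `full, gated, lam = check_masked_prefix_identity()` and then `np.allclose(full, gated)` compares the two sides. The sketch requires every query row to keep at least one visible context position, which a causal mask guarantees through the diagonal.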