@stoprightthere shoot :man_facepalming:
could you see if https://github.com/lucidrains/x-transformers/commit/2a5bbdd83b74b61c944093b5d238b70aba196303 looks better?
@stoprightthere yea, i am keeping with the convention of `True` for attend, `False` for mask out. that should be the case for all my repos
@stoprightthere oh! i forgot i had to change the convention to integrate flash attention for this repository
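For context, a minimal sketch of the convention referred to above (all tensor names and shapes here are illustrative): `F.scaled_dot_product_attention` treats a boolean `attn_mask` with True meaning the position is attended to, which matches the True-for-attend convention.

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 2, 4, 8)   # (batch, heads, seq, dim)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# True = attend, False = mask out (the convention described above,
# and what F.scaled_dot_product_attention expects for boolean masks)
attend_mask = torch.tensor([True, True, True, False])

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attend_mask)
```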
one more review? :pray: https://github.com/lucidrains/x-transformers/commit/d731d82b99fbb3491b6a5f0d2012ea2f523411d8
https://github.com/lucidrains/x-transformers/commit/d2e0700aa0f3648c89885e319636244140d2d602 actually, decided to just redo everything to be consistent
looks good!
thanks for the work on this repo and for quick replies!
@stoprightthere no problem! thanks for identifying this issue and saving a lot of phd students a lot of headache lol
Hi,
Looks like flash_attn requires `mask` to be True whenever a position should NOT be attended to. However, `F.scaled_dot_product_attention` expects the inverse (True means that the position takes part in the attention) if the mask is boolean. Unless there is `attn_bias`, there is no conversion, so essentially the mask is incorrect.

tl;dr: flip `mask` whenever it exists and `attn_bias` does not (and after applying the causal mask).
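A rough sketch of what the suggested flip might look like; the function and argument names here are illustrative and may not match the actual code in the repo. `mask` below follows the convention the issue describes for the flash path (True = do NOT attend), and the causal mask is assumed to have already been merged into it:

```python
import torch
import torch.nn.functional as F

def flash_attn_sketch(q, k, v, mask=None, attn_bias=None):
    # `mask`: True = position should NOT be attended to (per the issue)
    if attn_bias is not None:
        # when an additive bias is present, the boolean mask can be folded
        # into it (masked positions set to a large negative value), so no
        # boolean flip is needed
        if mask is not None:
            attn_bias = attn_bias.masked_fill(mask, torch.finfo(q.dtype).min)
        return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias)

    if mask is not None:
        # F.scaled_dot_product_attention uses the inverse convention for
        # boolean masks (True = attend), so flip before the call
        mask = ~mask

    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```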