Closed Subuday closed 5 months ago
@Subuday oh shoot, yes i think you are right, thanks!
wait, did they introduce the `scale` kwarg recently? i must have done it this way because they didn't have it previously. we'll need to enforce a certain pytorch version if so
@Subuday there is a bug, however, where custom scales are not applied in the absence of qk norm; let me quickly fix that
@Subuday let's go with this for now. they didn't have this `scale` kwarg in previous versions
Thanks!
In the current implementation, the custom scale is not passed to flash attention.
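For readers following the thread, here is a minimal sketch of one way a custom scale can reach an attention kernel that only knows its default scale of `d ** -0.5`: pre-multiply the queries by `custom_scale / default_scale`. This is an assumption about what "done it this way" refers to, not something the thread confirms. The code is plain Python rather than PyTorch, and `attention_scores` and `prescale_queries` are hypothetical names made up for illustration.

```python
import math


def attention_scores(q, k, scale=None):
    """Hypothetical stand-in for a fused attention kernel (pre-softmax part only).

    Computes scale * (q @ k^T). When scale is None it falls back to the
    usual default of d ** -0.5, where d is the head dimension.
    """
    d = len(q[0])
    s = d ** -0.5 if scale is None else scale
    return [[s * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]


def prescale_queries(q, custom_scale):
    """Assumed workaround for kernels without a scale argument.

    Multiplying q by custom_scale / default_scale cancels the default
    scale the kernel applies later, leaving custom_scale as the net factor.
    """
    d = len(q[0])
    default_scale = d ** -0.5
    factor = custom_scale / default_scale
    return [[x * factor for x in qi] for qi in q]


q = [[1.0, 2.0], [0.5, -1.0]]
k = [[0.3, 0.7], [1.5, -0.2]]
custom_scale = 0.3

# Path 1: the kernel accepts the scale directly.
direct = attention_scores(q, k, scale=custom_scale)

# Path 2: the kernel only applies its default scale, so q is rescaled first.
workaround = attention_scores(prescale_queries(q, custom_scale), k)

same = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for row_a, row_b in zip(direct, workaround)
    for a, b in zip(row_a, row_b)
)
```

Both paths give the same pre-softmax scores up to floating-point rounding, so the bug described above would come from skipping the rescaling on some code path, not from the arithmetic itself.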