lucidrains / BS-RoFormer

Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs

Flash attention error in Linear Attention layer #29

Closed · ZFTurbo closed this issue 4 months ago

ZFTurbo commented 5 months ago

I noticed an error in the Linear Attention layer when flash_attn is set to True:

File "attend.py", line 84, in flash_attn
    out = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution. 

In the standard Attention layers (time_transformer and freq_transformer) it works fine.
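
For reference, `F.scaled_dot_product_attention` raises `RuntimeError: No available kernel` when every enabled SDPA backend rejects the given inputs; the flash kernel in particular only supports fp16/bf16 and a limited range of head dimensions. A minimal sketch (not from this repo) of how that class of failure can be reproduced, assuming a CUDA device and plain fp32 tensors:

    import torch
    import torch.nn.functional as F

    # (batch, heads, seq_len, dim_head) in fp32 -- the flash kernel does not support fp32
    q, k, v = (torch.randn(1, 8, 1024, 64, device = 'cuda') for _ in range(3))

    # force the flash backend only; with the math and memory-efficient fallbacks
    # disabled, PyTorch raises "No available kernel. Aborting execution."
    with torch.backends.cuda.sdp_kernel(enable_flash = True, enable_math = False, enable_mem_efficient = False):
        out = F.scaled_dot_product_attention(q, k, v)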

So currently, as a workaround, I made the following change in LinearAttention (set flash to False):

self.attend = Attend(
    scale=scale,
    dropout=dropout,
    flash=False
)
lucidrains commented 4 months ago

@ZFTurbo ohh ok, i've made it configurable using linear_flash_attn kwarg
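
A minimal usage sketch of that kwarg, assuming the constructor and training snippet from the repository README (the dim/depth values and tensor shapes below are illustrative):

    import torch
    from bs_roformer import BSRoformer

    model = BSRoformer(
        dim = 512,
        depth = 12,
        time_transformer_depth = 1,
        freq_transformer_depth = 1,
        flash_attn = True,           # time/freq transformers keep flash attention
        linear_flash_attn = False    # linear attention layers avoid the failing flash kernel
    )

    x = torch.randn(2, 352800)           # (batch, raw audio samples)
    target = torch.randn(2, 2, 352800)   # target waveforms, shaped as in the README example

    loss = model(x, target = target)
    loss.backward()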