sustcsonglin / flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton
MIT License

Use Cache with GLA model raised error #26

Closed: OREYR closed this issue 3 months ago

OREYR commented 3 months ago

Hi, I see that RecurrentCache was renamed to Cache for the GLA model. However, this raises an error because Cache does not have the method "from_legacy_cache".
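
For reference, a minimal sketch of the lookup that fails on my checkout (the surrounding generation code is omitted; only the import path and the method name come from the repo):

# Hedged reproduction sketch: on my checkout this lookup fails.
from fla.models.utils import Cache

# AttributeError: type object 'Cache' has no attribute 'from_legacy_cache'
past_key_values = Cache.from_legacy_cache(None)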

OREYR commented 3 months ago

Hi @sustcsonglin, is there a reason why we needed to change RecurrentCache to Cache here? Does RecurrentCache apply only to RNN models?

yzhangcs commented 3 months ago

@OREYR Thank you for reporting this bug.

is there a reason why we needed to change RecurrentCache to Cache here? Does RecurrentCache apply only to RNN models?

The ongoing plan is to support more cache types, e.g., short-conv, sliding-window, RNN, and token-shift caches, so it would be better to give the class a more general name.
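
To make that concrete, here is an illustrative sketch (not the actual code in the repo) of what such a general Cache could look like: one container holding an arbitrary per-layer state, with legacy-conversion hooks for the tuple format that transformers passes around. The class and method names mirror this thread; the body is hypothetical.

from typing import Any, List, Optional, Tuple

class Cache:
    """Holds one state per layer; the state type can vary
    (RNN state, short-conv buffer, sliding-window KV, token-shift buffer, ...)."""

    def __init__(self, seen_tokens: int = 0):
        self.states: List[Any] = []
        self.seen_tokens = seen_tokens

    def update(self, state: Any, layer_idx: int) -> Any:
        # Append a new layer state or overwrite the existing one.
        if layer_idx < len(self.states):
            self.states[layer_idx] = state
        else:
            self.states.append(state)
        return self.states[layer_idx]

    def to_legacy_cache(self) -> Tuple[Any, ...]:
        # Export as the legacy tuple-of-states format.
        return tuple(self.states)

    @classmethod
    def from_legacy_cache(cls, past_key_values: Optional[Tuple[Any, ...]] = None,
                          seen_tokens: int = 0) -> "Cache":
        # Build a Cache from the legacy tuple-of-states format (or an empty one).
        cache = cls(seen_tokens)
        if past_key_values is not None:
            for layer_idx, state in enumerate(past_key_values):
                cache.update(state, layer_idx)
        return cache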

OREYR commented 3 months ago

Thank you @yzhangcs. I wonder if it's OK to keep using RecurrentCache to train or evaluate GLA and RetNet in this repository for now.

yzhangcs commented 3 months ago

@OREYR Have you pulled the latest code? At which line did you hit this bug? It works fine for me:

>>> from fla.models.utils import Cache
>>> Cache.from_legacy_cache
<bound method Cache.from_legacy_cache of <class 'fla.models.utils.Cache'>>

OREYR commented 3 months ago

I tried the above code and got the error "type object 'Cache' has no attribute 'from_legacy_cache'". There is no Cache class defined in the specified file; I only see the RecurrentCache class.
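
A quick probe of the installed module confirms this:

# Check which cache classes the installed fla.models.utils actually defines.
import fla.models.utils as cache_utils

print(hasattr(cache_utils, "Cache"))           # False on my checkout
print(hasattr(cache_utils, "RecurrentCache"))  # True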

yzhangcs commented 3 months ago

Sorry, forgot to push the commits 🤣