lucidrains / local-attention

An implementation of local windowed attention for language modeling
MIT License

About the performance #17

Closed · ThyrixYang closed this issue 1 year ago

ThyrixYang commented 1 year ago

Are there any benchmarks for this library's performance (memory, speed, accuracy)? Is this library a direct implementation of the Longformer method? Longformer has a chunked version and a CUDA version; which one does this library implement?

> This code has been battletested in multiple repositories already

Would you please provide more details about the existing usage of this library?

lucidrains commented 1 year ago

@ThyrixYang this one is a chunked version with lookback, without the need for CUDA
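For anyone landing here later, a minimal sketch of what "chunked with lookback" means, written in plain PyTorch rather than this library's actual code: the sequence is split into non-overlapping windows, and each window's queries attend to the keys/values of its own window plus the `look_backward` preceding windows. The function name, shapes (single head, `(batch, seq, dim)`), non-causal masking, and divisibility assumption are all my simplifications:

```python
import torch
import torch.nn.functional as F

def chunked_local_attention(q, k, v, window_size=4, look_backward=1):
    # sketch only: single head, non-causal, seq_len divisible by window_size
    b, n, d = q.shape
    assert n % window_size == 0, 'sequence length must be divisible by window size'
    w = n // window_size  # number of windows

    # split into non-overlapping chunks: (b, w, window_size, d)
    q, k, v = map(lambda t: t.reshape(b, w, window_size, d), (q, k, v))

    # for each chunk, gather keys/values of the current chunk plus the
    # `look_backward` preceding chunks (zero-padded at the left boundary)
    def lookback(t):
        padded = F.pad(t, (0, 0, 0, 0, look_backward, 0))  # pad the window axis
        chunks = [padded[:, i:i + w] for i in range(look_backward + 1)]
        return torch.cat(chunks, dim=2)  # (b, w, (look_backward + 1) * window_size, d)

    k, v = lookback(k), lookback(v)

    # scaled dot-product attention restricted to each local context
    sim = torch.einsum('b w i d, b w j d -> b w i j', q, k) * d ** -0.5

    # mask out the zero-padding in front of the first window(s)
    j = torch.arange(k.shape[2], device=q.device)
    window_idx = torch.arange(w, device=q.device)
    pad_mask = j[None, :] < (look_backward - window_idx)[:, None] * window_size
    sim = sim.masked_fill(pad_mask[None, :, None, :], float('-inf'))

    attn = sim.softmax(dim=-1)
    out = torch.einsum('b w i j, b w j d -> b w i d', attn, v)
    return out.reshape(b, n, d)

q = k = v = torch.randn(2, 16, 8)
print(chunked_local_attention(q, k, v).shape)  # torch.Size([2, 16, 8])
```

The point of the chunking is that the attention matrix is only `(window_size, 2 * window_size)` per chunk instead of `(n, n)` for the whole sequence, which is why no custom CUDA kernel is needed.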

longformer is from way back, and i believe it is local attention mixed with dedicated global attention lanes
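For illustration, a naive sketch of that Longformer-style pattern: a sliding local window combined with a few designated global positions that attend to, and are attended by, every token. The function name and parameters here are hypothetical, and Longformer itself uses sparse kernels rather than materializing a full mask like this:

```python
import torch

def local_plus_global_mask(seq_len, window_size, global_idx):
    i = torch.arange(seq_len)[:, None]
    j = torch.arange(seq_len)[None, :]
    local = (i - j).abs() <= window_size // 2  # sliding local window
    glob = torch.zeros(seq_len, dtype=torch.bool)
    glob[global_idx] = True
    # global tokens attend everywhere and are attended from everywhere
    return local | glob[:, None] | glob[None, :]

# e.g. position 0 (a CLS-like token) gets a dedicated global lane
mask = local_plus_global_mask(seq_len=12, window_size=4, global_idx=[0])
print(mask.int())
```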

ThyrixYang commented 1 year ago

@lucidrains Thanks for your explanation.