Open DanTaranis opened 1 year ago
fyi - I did a quick PoC with CIFAR-10 + a small ViT trained with and without kWTA (90% sparsity), and kWTA actually worked a bit like a regularizer (slightly higher max validation accuracy + slower convergence).
So it looks like this definitely has potential. My team and I may look further into this if you want to collaborate on a paper or something.
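For reference, here is a minimal sketch of the kWTA activation described above (a hypothetical NumPy implementation, not the exact code used in the PoC): keep the top-k activations per row and zero out the rest, where 90% sparsity means only the top 10% of units survive.

```python
import numpy as np

def kwta(x, sparsity=0.9):
    """k-winners-take-all: keep the top-k activations along the last
    axis, zero the rest. With sparsity=0.9, only 10% of units survive."""
    k = max(1, int(round(x.shape[-1] * (1.0 - sparsity))))
    # indices of the k largest values along the last axis
    idx = np.argpartition(x, -k, axis=-1)[..., -k:]
    mask = np.zeros(x.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, x, 0.0)

x = np.array([[0.1, 0.9, -0.3, 0.5, 0.2, 0.05, 0.7, -0.1, 0.3, 0.4]])
out = kwta(x, sparsity=0.9)  # only the single largest value (0.9) survives
```

In a ViT this would typically be applied to the MLP or attention activations during both training and inference.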
Hey - first of all, thank you for your inspiring research.
There's a lot of work around making self-attention efficient, especially as sequence length increases. It seems to me that, under the kWTA assumption, you could skip the vast majority of the calculations due to the inherent extreme sparsity.
And the best part is that it would be complementary to many of the linear-complexity attention methods that are coming out.
Are you experimenting with something like that?
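To make the "skip most of the calculations" idea concrete, here is a hypothetical sketch (my own illustration, not from the paper): if kWTA has already zeroed out most key positions, the score matrix only needs to be computed against the surviving subset, reducing the per-query cost from O(n) to O(m) with m << n.

```python
import numpy as np

def sparse_attention(q, k, v, key_mask):
    """Attention restricted to 'active' key positions (key_mask True).

    Illustrative only: assumes a prior kWTA step has marked most keys
    inactive, so we gather the m surviving keys/values and attend over
    those instead of the full length-n sequence.
    """
    active = np.nonzero(key_mask)[0]
    k_s, v_s = k[active], v[active]            # (m, d) with m << n
    scores = q @ k_s.T / np.sqrt(q.shape[-1])  # (nq, m) instead of (nq, n)
    # numerically stable softmax over the surviving keys only
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v_s
```

The result matches dense attention with the inactive positions masked to -inf, which is what makes it composable with the linear-attention methods mentioned above.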
Regards, Dan