mynotwo / A-Fast-Transformer-based-General-Purpose-LosslessCompressor

This repository contains the source code and dataset link for the paper accepted at WWW 2022, "TRACE: A Fast Transformer-based General-Purpose Lossless Compressor".

it looks different from the standard self-attention mechanism #2

Open yhft-lgtms opened 1 year ago

yhft-lgtms commented 1 year ago

Are you applying self-attention in 'numerator_and_denominator.py'? That seems puzzling! Could you explain it?

mynotwo commented 1 year ago

The attention scheme I applied is from the paper "Sub-Linear Memory: How to Make Performers SLiM", which proposes an efficient attention mechanism.
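For context, Performer-style attention replaces the softmax with random-feature maps of the queries and keys and accumulates a separate numerator and denominator with prefix sums, which is what file names like 'numerator_and_denominator.py' usually refer to. Below is a minimal NumPy sketch of that causal numerator/denominator decomposition; the function name, the pre-computed feature maps q_prime/k_prime, and the explicit Python loop are illustrative assumptions, not the actual implementation in this repository.

```python
import numpy as np

def causal_linear_attention(q_prime, k_prime, v):
    """Sketch of Performer/SLiM-style causal attention via prefix sums.

    q_prime, k_prime: random-feature maps of queries/keys, shape (L, M)
    v: values, shape (L, D)
    Returns an output of shape (L, D).
    """
    L, M = q_prime.shape
    D = v.shape[1]
    num_state = np.zeros((M, D))  # running sum of outer(k'_t, v_t) -> numerator state
    den_state = np.zeros(M)       # running sum of k'_t             -> denominator state
    out = np.zeros((L, D))
    for t in range(L):
        num_state += np.outer(k_prime[t], v[t])
        den_state += k_prime[t]
        numerator = q_prime[t] @ num_state    # shape (D,)
        denominator = q_prime[t] @ den_state  # scalar normalizer
        out[t] = numerator / denominator
    return out
```

Because the two running sums can be updated in constant memory per step, this avoids materializing the full L x L attention matrix, which is the source of the efficiency claim in the SLiM Performers paper.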