kyegomez/AttentionIsOFFByOne
Implementation of "Attention Is Off By One" by Evan Miller
MIT License · 179 stars · 9 forks
Issues (newest first)
#7 · docs(README): fix equation formatting · YodaEmbedding · closed · 1 year ago · 1 comment
#6 · Is there any evidence that softmax one has advantages over normal softmax? · ZGCTroy · opened · 1 year ago · 0 comments
#5 · IMPORTANT: The definition of softmax one is wrong · PhilIp-L-Good · opened · 1 year ago · 4 comments
#4 · Is there a test showing the effectiveness of softmax1 at removing outliers? · immars · opened · 1 year ago · 0 comments
#3 · If you want to make it fast, just use nn.softmax() and concatenate a zero · mcourteaux · opened · 1 year ago · 1 comment
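The trick suggested in #3 can be sketched as follows, assuming the paper's softmax1 definition softmax1(x)_i = exp(x_i) / (1 + Σ_j exp(x_j)): appending a zero logit and running an ordinary softmax yields exactly that denominator, since exp(0) = 1. The function name `softmax_one_via_pad` is chosen here for illustration, and NumPy stands in for the framework softmax the issue mentions.

```python
import numpy as np

def softmax_one_via_pad(x: np.ndarray) -> np.ndarray:
    """Compute softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j))
    by appending a zero logit and reusing ordinary softmax.
    (Illustrative sketch; not the repo's implementation.)"""
    padded = np.concatenate([x, [0.0]])   # the extra slot contributes exp(0) = 1
    padded = padded - padded.max()        # standard softmax stability shift
    exps = np.exp(padded)
    probs = exps / exps.sum()
    return probs[:-1]                     # drop the zero-logit slot
```

Note the returned probabilities sum to less than 1, which is the point of softmax1: attention heads are allowed to attend to "nothing".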
#2 · How to solve the issue of overflow? · ZGCTroy · opened · 1 year ago · 0 comments
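One standard answer to the overflow question in #2 is a max-shift that also covers the implicit zero logit: with m = max(0, max_i x_i), the identity softmax1(x)_i = exp(x_i − m) / (exp(−m) + Σ_j exp(x_j − m)) keeps every exponent ≤ 0. A minimal NumPy sketch (the name `softmax_one_stable` is an assumption, not from the repo):

```python
import numpy as np

def softmax_one_stable(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax1: exp(x_i) / (1 + sum_j exp(x_j)).
    Shifting by m = max(0, max(x)) turns the "+1" into exp(-m) and
    bounds every exponent by 0, so np.exp cannot overflow."""
    m = max(0.0, float(np.max(x)))
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())
```

For large logits such as `[1000.0, 1000.0]` this returns finite probabilities near 0.5 each, where a naive `np.exp(x) / (1 + np.exp(x).sum())` would overflow to inf/nan.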
#1 · Discuss: do `softmax_one` and `zero_vector` in QuietAttention conflict? · Kahsolt · opened · 1 year ago · 1 comment