pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License
Naming: n_local_heads -> n_kv_heads #162
Open
ad8e opened 2 months ago
ad8e commented 2 months ago
`n_local_heads` refers to TP sharding, rather than GQA.
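To make the distinction concrete, here is a minimal sketch of how the two head counts could be kept apart. It is not gpt-fast's code: `AttnConfig` and `heads_per_rank` are hypothetical names invented for illustration, and the sketch assumes that tensor parallelism splits both query and key/value heads evenly across ranks.

```python
from dataclasses import dataclass


@dataclass
class AttnConfig:
    """Hypothetical config for illustration; not gpt-fast's ModelArgs."""
    n_head: int      # query heads in the full, unsharded model
    n_kv_heads: int  # key/value heads in the full model (GQA), the name proposed here


def heads_per_rank(cfg: AttnConfig, tp_world_size: int) -> tuple[int, int]:
    """Return (local query heads, local KV heads) held by one TP rank.

    Assumes an even split of heads across ranks, which is an assumption
    of this sketch rather than a statement about gpt-fast.
    """
    if cfg.n_head % cfg.n_kv_heads != 0:
        raise ValueError("n_head must be a multiple of n_kv_heads for GQA")
    if cfg.n_head % tp_world_size or cfg.n_kv_heads % tp_world_size:
        raise ValueError("head counts must divide evenly across TP ranks")
    # "Local" is a property of the sharding, not of the attention variant.
    return cfg.n_head // tp_world_size, cfg.n_kv_heads // tp_world_size


cfg = AttnConfig(n_head=32, n_kv_heads=8)
q_local, kv_local = heads_per_rank(cfg, tp_world_size=2)
print(q_local, kv_local)
```

In this sketch `n_kv_heads` describes the model (GQA), while the "local" counts only exist once a TP world size is chosen, which is the ambiguity the rename would avoid.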