huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.23k stars, 122 forks
#145 (Closed)
zzhhjjj closed 3 months ago
zzhhjjj commented 6 months ago
A bug fix for flash attention