huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0
Removing slow models #27 (Closed)
thomwolf closed 8 months ago
thomwolf commented 8 months ago
- removing slow models: we now always require flash-attention
- also renaming the `fused` folder to `nn` and moving `activations` from the transformers library into it
- removing gpt/falcon models
- making sure all scripts can also run without `transformers` and `datasets`
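Running scripts without `transformers` and `datasets` is usually a matter of treating those packages as optional and only failing when a feature that needs them is actually used. A minimal sketch of that pattern is below; it is not taken from the nanotron code, and the helper names `is_available` and `require` are hypothetical.

```python
import importlib.util


def is_available(package: str) -> bool:
    """Return True if the top-level `package` can be found in this environment.

    Hypothetical helper for illustration; only checks that the package is
    installed, it does not import it.
    """
    return importlib.util.find_spec(package) is not None


def require(package: str, feature: str) -> None:
    """Raise a clear ImportError only when `feature` really needs `package`."""
    if not is_available(package):
        raise ImportError(
            f"{feature} needs the optional package '{package}'; "
            "install it to use this feature."
        )


if __name__ == "__main__":
    # Report which optional dependencies are present without importing them.
    for name in ("transformers", "datasets"):
        status = "available" if is_available(name) else "missing"
        print(f"{name}: {status}")
```

A script would call `require("datasets", "dataset loading")` at the point where the optional feature starts, so that everything else still runs when the package is not installed.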