huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.14k stars, 107 forks
Issues (sorted by newest)
#234 Ci move (by glegendre01, closed 2 days ago, 0 comments)
#233 Learning rate restart broken with Nanoset? (by Pclanglais, opened 3 days ago, 5 comments)
#232 precommit (by xrsrke, closed 2 weeks ago, 0 comments)
#231 Pr/eliebak/220 (by xrsrke, closed 2 weeks ago, 0 comments)
#230 Fix resuming PP > 1 (by TJ-Solergibert, opened 3 weeks ago, 0 comments)
#229 update bench cluster (by 3outeille, closed 3 weeks ago, 0 comments)
#228 change Nanoset args definition to make it compatible with the parser (by eliebak, closed 3 weeks ago, 1 comment)
#227 delete me (by ischlag, closed 3 weeks ago, 0 comments)
#226 [feature] Use Unified Sequence Parallel (USP) instead of Ring attention (by feifeibear, opened 3 weeks ago, 0 comments)
#225 Use different positional embeddings (by aflah02, opened 1 month ago, 2 comments)
#224 log "No checkpoint path provided" only on rank 0 (by eliebak, closed 1 month ago, 0 comments)
#223 remove torch compile (by eliebak, closed 1 month ago, 0 comments)
#222 lighteval support after checkpoint, UX refactor (by eliebak, opened 1 month ago, 4 comments)
#221 error resuming from checkpoint if PP > 1 (by moussaKam, opened 1 month ago, 2 comments)
#220 add support for s3 checkpoint (by eliebak, closed 2 weeks ago, 1 comment)
#219 Refactor pre tokenization tool (by eliebak, opened 1 month ago, 0 comments)
#217 `RuntimeError: First class dim doesn't work with python 3.12` on minimal example using 3.12 conda env (by wilzh40, closed 1 month ago, 2 comments)
#216 Hello, team (by barneylogo, closed 1 month ago, 1 comment)
#215 Hello, team (by barneylogo, closed 1 month ago, 0 comments)
#214 Match Transformers RoPE implementation (by zzhhjjj, closed 3 weeks ago, 3 comments)
#213 Circular import (by Na00s, closed 1 month ago, 8 comments)
#212 Pr/tj solergibert/189 (by xrsrke, closed 1 month ago, 0 comments)
#211 Inquiry about MLflow Logging Integration (by justHungryMan, closed 1 month ago, 1 comment)
#210 Will audio input training be supported in the future? (by zira-wang, opened 2 months ago, 0 comments)
#209 multi-node pp hang when enable gradient accumulation (by yuuxiaooqingg, opened 2 months ago, 4 comments)
#208 Memory optimization in async tp-linear (by AleHD, closed 1 month ago, 1 comment)
#207 Add layer-wise activation recomputation to llama model (by C-TC, closed 2 months ago, 1 comment)
#206 Update README.md (by xrsrke, closed 2 months ago, 0 comments)
#205 Update README.md (by xrsrke, closed 2 months ago, 0 comments)
#204 NCCL collective operation timeout (by heya5, closed 2 months ago, 1 comment)
#203 Fix tp mem cache (by AleHD, closed 1 month ago, 9 comments)
#202 refacto generate + use simpler rotary for inference (by 3outeille, opened 3 months ago, 0 comments)
#201 Request for detailed FineWeb-ablation-models training strategy & hyperparams (by JefferyChen453, closed 1 month ago, 3 comments)
#200 Created interconnect benchmark before the training (by RamenBuddha, opened 3 months ago, 0 comments)
#199 Refacto generate (by 3outeille, closed 3 months ago, 1 comment)
#198 [Bug] Missing `_is_using_mup` when resume checkpoint (by xrsrke, opened 3 months ago, 0 comments)
#197 Training on AMD/TPU GPUs (by jinsong-mao, closed 1 month ago, 1 comment)
#196 how to run benchmark tests (by jinsong-mao, closed 3 months ago, 0 comments)
#195 Fineweb Configuration (by nezhazheng, closed 1 month ago, 1 comment)
#194 Fix: Update wrong typing on the function get_local_ranks (by morgangiraud, opened 3 months ago, 1 comment)
#193 feat(ci): add trufflehog secrets detection (by McPatate, closed 3 months ago, 1 comment)
#192 Move MoE Implementation into src/, add Load Balancing Losses (by haeggee, opened 3 months ago, 0 comments)
#191 Circular import (by xcvil, closed 1 month ago, 2 comments)
#190 Add utility to preview samples used for training. See https://github.com/huggingface/nanotron/issues/184. (by kylematoba, opened 3 months ago, 0 comments)
#189 Supporting datatrove tokenized documents with Nanosets (by TJ-Solergibert, closed 1 month ago, 2 comments)
#188 add rope_theta to hf conversion script (by jquesnelle, closed 4 months ago, 0 comments)
#187 Adding support for training chat models (by TJ-Solergibert, closed 1 month ago, 1 comment)
#186 Migrate internal brrr to nanotron (by NouamaneTazi, opened 4 months ago, 0 comments)
#185 Added 1-sqrt function for cooldown phase (by eliebak, closed 4 months ago, 0 comments)
#184 Add Debug utility to be able to preview first samples used for training (by NouamaneTazi, opened 4 months ago, 0 comments)