huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Apache License 2.0

1.14k stars 107 forks

issues

Ci move

#234 glegendre01 closed 2 days ago
0
Learning rate restart broken with Nanoset?

#233 Pclanglais opened 3 days ago
5
precommit

#232 xrsrke closed 2 weeks ago
0
Pr/eliebak/220

#231 xrsrke closed 2 weeks ago
0
Fix resuming PP > 1

#230 TJ-Solergibert opened 3 weeks ago
0
update bench cluster

#229 3outeille closed 3 weeks ago
0
change nanoset args definition to make it compatible with the parser

#228 eliebak closed 3 weeks ago
1
delete me

#227 ischlag closed 3 weeks ago
0
[feature] Use Unified Sequence Parallel (USP) instead of Ring attention

#226 feifeibear opened 3 weeks ago
0
Use different positional embeddings

#225 aflah02 opened 1 month ago
2
log "No checkpoint path provided" only on rank 0

#224 eliebak closed 1 month ago
0
remove torch compile

#223 eliebak closed 1 month ago
0
lighteval support after checkpoint, UX refactor

#222 eliebak opened 1 month ago
4
error resuming from checkpoint if PP > 1

#221 moussaKam opened 1 month ago
2
add support for s3 checkpoint

#220 eliebak closed 2 weeks ago
1
Refactor pre tokenization tool

#219 eliebak opened 1 month ago
0
`RuntimeError: First class dim doesn't work with python 3.12` on minimal example using 3.12 conda env

#217 wilzh40 closed 1 month ago
2
Hello, team

#216 barneylogo closed 1 month ago
1
Hello, team

#215 barneylogo closed 1 month ago
0
Match Transformers RoPE implementation

#214 zzhhjjj closed 3 weeks ago
3
Circular import

#213 Na00s closed 1 month ago
8
Pr/tj solergibert/189

#212 xrsrke closed 1 month ago
0
Inquiry about MLflow Logging Integration

#211 justHungryMan closed 1 month ago
1
Will audio input training be supported in the future?

#210 zira-wang opened 2 months ago
0
multi-node pp hang when enable gradient accumulation

#209 yuuxiaooqingg opened 2 months ago
4
Memory optimization in async tp-linear

#208 AleHD closed 1 month ago
1
Add layer-wise activation recomputation to llama model

#207 C-TC closed 2 months ago
1
Update README.md

#206 xrsrke closed 2 months ago
0
Update README.md

#205 xrsrke closed 2 months ago
0
NCCL collective operation timeout

#204 heya5 closed 2 months ago
1
Fix tp mem cache

#203 AleHD closed 1 month ago
9
refacto generate + use simpler rotary for inference

#202 3outeille opened 3 months ago
0
Request for detailed FineWeb-ablation-models training strategy & hyperparams

#201 JefferyChen453 closed 1 month ago
3
Created interconnect benchmark before the training

#200 RamenBuddha opened 3 months ago
0
Refacto generate

#199 3outeille closed 3 months ago
1
[Bug] Missing `_is_using_mup` when resume checkpoint

#198 xrsrke opened 3 months ago
0
Training on AMD/TPU GPUs

#197 jinsong-mao closed 1 month ago
1
how to run benchmark tests

#196 jinsong-mao closed 3 months ago
0
Fineweb Configuration

#195 nezhazheng closed 1 month ago
1
Fix: Update wrong typing on the function get_local_ranks

#194 morgangiraud opened 3 months ago
1
feat(ci): add trufflehog secrets detection

#193 McPatate closed 3 months ago
1
Move MoE Implementation into src/, add Load Balancing Losses

#192 haeggee opened 3 months ago
0
Circular import

#191 xcvil closed 1 month ago
2
Add utility to preview samples used for training. See https://github.com/huggingface/nanotron/issues/184.

#190 kylematoba opened 3 months ago
0
Supporting datatrove tokenized documents with Nanosets

#189 TJ-Solergibert closed 1 month ago
2
add rope_theta to hf conversion script

#188 jquesnelle closed 4 months ago
0
Adding support for training chat models

#187 TJ-Solergibert closed 1 month ago
1
Migrate internal brrr to nanotron

#186 NouamaneTazi opened 4 months ago
0
Added 1-sqrt function for cooldown phase

#185 eliebak closed 4 months ago
0
Add Debug utility to be able to preview first samples used for training

#184 NouamaneTazi opened 4 months ago
0