huggingface / nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0 · 1.14k stars · 107 forks
Issues
#133 [Docs] Add basic debugging docs · NouamaneTazi · closed 5 months ago · 0 comments
#132 [Bug] Fix data stages · xrsrke · closed 5 months ago · 0 comments
#131 [Feature] Lighteval to Wandb · xrsrke · closed 5 months ago · 0 comments
#130 Fix converter · AleHD · closed 5 months ago · 1 comment
#129 example bug fix · zzhhjjj · closed 5 months ago · 0 comments
#128 [Bug] Fix DoReMi's DRO Loss · xrsrke · closed 5 months ago · 0 comments
#127 `nanotron/the-pile-for-doremi` is empty · Qinghao-Hu · closed 5 months ago · 1 comment
#126 Question concerning context parallelism. · veritas9872 · closed 5 months ago · 1 comment
#125 nanotron <-> conversion for Llama resolve #124 · yardenas · closed 4 months ago · 15 comments
#124 [Feature] nanotron <-> conversion for Llama · yardenas · closed 4 months ago · 0 comments
#123 [Feature] Spectral µTransfer · xrsrke · closed 5 months ago · 3 comments
#122 Fix test_clip_grads_with_tp · TJ-Solergibert · closed 5 months ago · 0 comments
#121 Example code does not work. · codingchild2424 · closed 6 months ago · 4 comments
#120 [Bug] Add data_stages to the config generation scripts of Llama, MoE and Mamba · xrsrke · closed 5 months ago · 0 comments
#119 [Bug] Fix DoReMi's DRO Loss · xrsrke · closed 5 months ago · 0 comments
#118 Add release CI · NouamaneTazi · closed 6 months ago · 0 comments
#117 fix fan_in computation for bias in mamba · 3outeille · closed 6 months ago · 0 comments
#116 [Feature] Use `uv` instead of `pip` in CI/CD · xrsrke · opened 6 months ago · 1 comment
#115 Multinode minimal example · staghado · closed 5 months ago · 6 comments
#114 FEAT: Support 1.58-bit LLMs training · younesbelkada · opened 6 months ago · 1 comment
#113 [Feature] Add loading different datasets based on training stages · xrsrke · closed 6 months ago · 0 comments
#112 Mamba dependecies · staghado · closed 4 months ago · 2 comments
#111 Add FLOPs equations for Mamba and fix number of parameters · staghado · closed 6 months ago · 1 comment
#110 add contributor guide · 3outeille · closed 6 months ago · 0 comments
#109 bugfix bad condition · jordane95 · closed 6 months ago · 1 comment
#108 PP allocation issue · jordane95 · closed 6 months ago · 3 comments
#107 Adapt topology-agnostic optimizer shard loading to MoE (fixes #106) · nopperl · opened 6 months ago · 4 comments
#106 Resume training with a different Tensor parallel value · 3outeille · closed 4 months ago · 1 comment
#105 enlarge vocab size to avoid triton error · 3outeille · closed 6 months ago · 0 comments
#104 Integrating ScatterMoE · shawntan · opened 6 months ago · 0 comments
#103 Add converter (HF <-> Nanotron) for Mamba · 3outeille · closed 4 months ago · 2 comments
#102 Adding memmap input data pipelines · TJ-Solergibert · closed 5 months ago · 11 comments
#101 AssertionError related to tied parameters during `train_tiny_llama.sh` execution · xffxff · closed 6 months ago · 2 comments
#100 Fix some bugs · jordane95 · closed 6 months ago · 1 comment
#99 [Features] support gradient checkpointing for memory saving · zguo0525 · opened 6 months ago · 1 comment
#98 [Refactor] Refactoring Expert Parallelism · NouamaneTazi · closed 6 months ago · 1 comment
#97 [Quick fix] fix circular import in logging · NouamaneTazi · closed 6 months ago · 0 comments
#96 Bump v0.4 + Quick refactos · NouamaneTazi · closed 6 months ago · 0 comments
#95 [DoReMi] Small refactors · xrsrke · closed 6 months ago · 0 comments
#94 [Feature] Refactor ParallelContext.world_rank_matrix · 0xkerem · closed 6 months ago · 7 comments
#93 [Docs] Add unit tests as a requirement · xrsrke · closed 6 months ago · 1 comment
#92 [Bug] Fix clipping gradients's test · xrsrke · closed 5 months ago · 0 comments
#91 [Feature] All GPUs within the same TP group load training data from shared memory · xrsrke · opened 7 months ago · 0 comments
#90 [Unit Test] Add unit tests for DistributedTrainer · xrsrke · opened 7 months ago · 5 comments
#89 [Unit Test] Add unit test for DoReMi's trainer · xrsrke · opened 7 months ago · 0 comments
#88 [Feature] Use CUDA event for measuring elasped time · xrsrke · opened 7 months ago · 0 comments
#87 [Feature] Asyncronous Serialization · xrsrke · opened 7 months ago · 0 comments
#86 [Feature] Kernel Fusion of Layer Norm and GeLU · xrsrke · opened 7 months ago · 0 comments
#85 [Feature] LAMB optimizer · xrsrke · opened 7 months ago · 0 comments
#84 [Feature] Parallel transformer block · xrsrke · opened 7 months ago · 0 comments