huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0 · 1.14k stars · 107 forks
Issues (newest first)
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #83 | Add Mamba PR | 3outeille | closed 6 months ago | 1 |
| #82 | [Bug] Not saving `lm_head` in checkpoint | xrsrke | closed 7 months ago | 0 |
| #81 | [Fix] Assert the wrong tolerance of FA2's Layer Norm kernel | xrsrke | closed 7 months ago | 0 |
| #80 | [Feature] Add loading different datasets based on training stages | xrsrke | closed 6 months ago | 0 |
| #79 | Continued Pretraining on Llama 7b. | wiseyy | opened 7 months ago | 8 |
| #78 | Continued Pretraining on Llama7b. | wiseyy | closed 7 months ago | 1 |
| #77 | [Feature] Refactor `ParallelContext.world_rank_matrix` | NouamaneTazi | opened 7 months ago | 0 |
| #76 | Deprecate `recompute_granularity` in config | NouamaneTazi | closed 7 months ago | 0 |
| #75 | Refactor dMoE | NouamaneTazi | closed 7 months ago | 0 |
| #74 | [Feature] Fix support for sequence parallelism with MoEs | NouamaneTazi | opened 7 months ago | 0 |
| #73 | Add MoEs support | NouamaneTazi | closed 7 months ago | 0 |
| #72 | Support Expert Parallelism | NouamaneTazi | closed 7 months ago | 0 |
| #71 | Implement pipeline parallel size-agnostic optimizer state loading | nopperl | closed 7 months ago | 0 |
| #70 | [FP8 Training] End-to-end FP8 Training | xrsrke | opened 7 months ago | 3 |
| #69 | Refactor `ParallelContext` and some process groups creation | NouamaneTazi | closed 7 months ago | 0 |
| #68 | Fix topology agnostic loading | nopperl | closed 7 months ago | 1 |
| #67 | fix configs | NouamaneTazi | closed 7 months ago | 0 |
| #66 | quick fix train steps assertion | NouamaneTazi | closed 7 months ago | 1 |
| #65 | quick fix train steps assertion | NouamaneTazi | closed 7 months ago | 0 |
| #64 | Update bench script | NouamaneTazi | closed 7 months ago | 0 |
| #63 | [`Docs`] Fix typos | tolgacangoz | closed 7 months ago | 0 |
| #62 | Refactoring tying mechanism + small fixes | NouamaneTazi | closed 7 months ago | 0 |
| #61 | [Feature request] Performance and accuracy benchmarks | brianyu-nexusflowai | opened 7 months ago | 2 |
| #60 | Lighteval naming | thomwolf | closed 7 months ago | 0 |
| #59 | Making lr schedule more flexible | thomwolf | closed 7 months ago | 0 |
| #58 | Adding eval config | thomwolf | closed 7 months ago | 0 |
| #57 | [Question] Modification for Performing Fine-Tuning | allanj | opened 7 months ago | 3 |
| #56 | [FP8 Training] A single forward and backward pass for a linear in FP8 | xrsrke | closed 7 months ago | 0 |
| #55 | random fixes | NouamaneTazi | closed 7 months ago | 0 |
| #54 | [Feature] DoReMi | xrsrke | closed 7 months ago | 1 |
| #53 | Make custom models a bit easier to use | thomwolf | closed 8 months ago | 0 |
| #52 | Make tests pass | NouamaneTazi | closed 8 months ago | 0 |
| #51 | Fix support for custom modeling | NouamaneTazi | closed 8 months ago | 1 |
| #50 | Add Mamba | 3outeille | closed 7 months ago | 0 |
| #49 | Some quick fixes | NouamaneTazi | closed 8 months ago | 1 |
| #48 | [Question] Async Tensor Parallel | woshiyyya | opened 8 months ago | 2 |
| #47 | Integration with the HuggingFace Ecosystem | woshiyyya | closed 8 months ago | 1 |
| #46 | [Question] Correctness of backward pass of RowLinear | ufotalent | opened 8 months ago | 2 |
| #45 | [Feature Request] Support Data Streaming for faster training of large models | chagri | opened 8 months ago | 2 |
| #44 | Add benchmarking script for Llama-2-7b model | NouamaneTazi | closed 8 months ago | 0 |
| #43 | [Feature Request] Add simple communications benchmarks to the repo | NouamaneTazi | opened 8 months ago | 1 |
| #42 | Small tweaks to make nanotron a bit more flexible | thomwolf | closed 8 months ago | 0 |
| #41 | Add CI/CD for unit tests | xrsrke | closed 7 months ago | 4 |
| #40 | Bump flash-attn to 2.5.0 | NouamaneTazi | closed 8 months ago | 0 |
| #39 | Fixing typos in docs | andylolu2 | closed 8 months ago | 1 |
| #38 | Merging optimizer states from different pipeline parallel size to resume training | xrsrke | closed 7 months ago | 0 |
| #37 | [Bug] Fix Zero-0 Optimizer States Not in Sync When Merging from Different Topologies | xrsrke | closed 8 months ago | 0 |
| #36 | How is it compared with Megatron Deepspeed? | allanj | closed 8 months ago | 1 |
| #35 | [Bug] `TypeError: Config.__init__() [...]` from `examples/config_tiny_llama.py` | saforem2 | closed 8 months ago | 3 |
| #34 | [Feature] DoReMi | xrsrke | closed 8 months ago | 3 |