huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0 · 1.14k stars · 107 forks
Issues (newest first)
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #83 | Add Mamba PR | 3outeille | closed 6 months ago | 1 |
| #82 | [Bug] Not saving `lm_head` in checkpoint | xrsrke | closed 7 months ago | 0 |
| #81 | [Fix] Assert the wrong tolerance of FA2's Layer Norm kernel | xrsrke | closed 7 months ago | 0 |
| #80 | [Feature] Add loading different datasets based on training stages | xrsrke | closed 6 months ago | 0 |
| #79 | Continued Pretraining on Llama 7b. | wiseyy | opened 7 months ago | 8 |
| #78 | Continued Pretraining on Llama7b. | wiseyy | closed 7 months ago | 1 |
| #77 | [Feature] Refactor `ParallelContext.world_rank_matrix` | NouamaneTazi | opened 7 months ago | 0 |
| #76 | Deprecate `recompute_granularity` in config | NouamaneTazi | closed 7 months ago | 0 |
| #75 | Refactor dMoE | NouamaneTazi | closed 7 months ago | 0 |
| #74 | [Feature] Fix support for sequence parallelism with MoEs | NouamaneTazi | opened 7 months ago | 0 |
| #73 | Add MoEs support | NouamaneTazi | closed 7 months ago | 0 |
| #72 | Support Expert Parallelism | NouamaneTazi | closed 7 months ago | 0 |
| #71 | Implement pipeline parallel size-agnostic optimizer state loading | nopperl | closed 7 months ago | 0 |
| #70 | [FP8 Training] End-to-end FP8 Training | xrsrke | opened 7 months ago | 3 |
| #69 | Refactor `ParallelContext` and some process groups creation | NouamaneTazi | closed 7 months ago | 0 |
| #68 | Fix topology agnostic loading | nopperl | closed 7 months ago | 1 |
| #67 | fix configs | NouamaneTazi | closed 7 months ago | 0 |
| #66 | quick fix train steps assertion | NouamaneTazi | closed 7 months ago | 1 |
| #65 | quick fix train steps assertion | NouamaneTazi | closed 7 months ago | 0 |
| #64 | Update bench script | NouamaneTazi | closed 7 months ago | 0 |
| #63 | [`Docs`] Fix typos | tolgacangoz | closed 7 months ago | 0 |
| #62 | Refactoring tying mechanism + small fixes | NouamaneTazi | closed 7 months ago | 0 |
| #61 | [Feature request] Performance and accuracy benchmarks | brianyu-nexusflowai | opened 7 months ago | 2 |
| #60 | Lighteval naming | thomwolf | closed 7 months ago | 0 |
| #59 | Making lr schedule more flexible | thomwolf | closed 7 months ago | 0 |
| #58 | Adding eval config | thomwolf | closed 7 months ago | 0 |
| #57 | [Question] Modification for Performing Fine-Tuning | allanj | opened 7 months ago | 3 |
| #56 | [FP8 Training] A single forward and backward pass for a linear in FP8 | xrsrke | closed 7 months ago | 0 |
| #55 | random fixes | NouamaneTazi | closed 7 months ago | 0 |
| #54 | [Feature] DoReMi | xrsrke | closed 7 months ago | 1 |
| #53 | Make custom models a bit easier to use | thomwolf | closed 8 months ago | 0 |
| #52 | Make tests pass | NouamaneTazi | closed 8 months ago | 0 |
| #51 | Fix support for custom modeling | NouamaneTazi | closed 8 months ago | 1 |
| #50 | Add Mamba | 3outeille | closed 7 months ago | 0 |
| #49 | Some quick fixes | NouamaneTazi | closed 8 months ago | 1 |
| #48 | [Question] Async Tensor Parallel | woshiyyya | opened 8 months ago | 2 |
| #47 | Integration with the HuggingFace Ecosystem | woshiyyya | closed 8 months ago | 1 |
| #46 | [Question] Correctness of backward pass of RowLinear | ufotalent | opened 8 months ago | 2 |
| #45 | [Feature Request] Support Data Streaming for faster training of large models | chagri | opened 8 months ago | 2 |
| #44 | Add benchmarking script for Llama-2-7b model | NouamaneTazi | closed 8 months ago | 0 |
| #43 | [Feature Request] Add simple communications benchmarks to the repo | NouamaneTazi | opened 8 months ago | 1 |
| #42 | Small tweaks to make nanotron a bit more flexible | thomwolf | closed 8 months ago | 0 |
| #41 | Add CI/CD for unit tests | xrsrke | closed 7 months ago | 4 |
| #40 | Bump flash-attn to 2.5.0 | NouamaneTazi | closed 8 months ago | 0 |
| #39 | Fixing typos in docs | andylolu2 | closed 8 months ago | 1 |
| #38 | Merging optimizer states from different pipeline parallel size to resume training | xrsrke | closed 7 months ago | 0 |
| #37 | [Bug] Fix Zero-0 Optimizer States Not in Sync When Merging from Different Topologies | xrsrke | closed 8 months ago | 0 |
| #36 | How is it compared with Megatron Deepspeed? | allanj | closed 8 months ago | 1 |
| #35 | [Bug] `TypeError: Config.__init__() [...]` from `examples/config_tiny_llama.py` | saforem2 | closed 8 months ago | 3 |
| #34 | [Feature] DoReMi | xrsrke | closed 8 months ago | 3 |