huggingface / nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0 · 1.14k stars · 107 forks
Issues
#133 [Docs] Add basic debugging docs · NouamaneTazi · closed 5 months ago · 0 comments
#132 [Bug] Fix data stages · xrsrke · closed 5 months ago · 0 comments
#131 [Feature] Lighteval to Wandb · xrsrke · closed 5 months ago · 0 comments
#130 Fix converter · AleHD · closed 5 months ago · 1 comment
#129 example bug fix · zzhhjjj · closed 5 months ago · 0 comments
#128 [Bug] Fix DoReMi's DRO Loss · xrsrke · closed 5 months ago · 0 comments
#127 `nanotron/the-pile-for-doremi` is empty · Qinghao-Hu · closed 5 months ago · 1 comment
#126 Question concerning context parallelism. · veritas9872 · closed 5 months ago · 1 comment
#125 nanotron <-> conversion for Llama resolve #124 · yardenas · closed 4 months ago · 15 comments
#124 [Feature] nanotron <-> conversion for Llama · yardenas · closed 4 months ago · 0 comments
#123 [Feature] Spectral µTransfer · xrsrke · closed 5 months ago · 3 comments
#122 Fix test_clip_grads_with_tp · TJ-Solergibert · closed 5 months ago · 0 comments
#121 Example code does not work. · codingchild2424 · closed 6 months ago · 4 comments
#120 [Bug] Add data_stages to the config generation scripts of Llama, MoE and Mamba · xrsrke · closed 5 months ago · 0 comments
#119 [Bug] Fix DoReMi's DRO Loss · xrsrke · closed 5 months ago · 0 comments
#118 Add release CI · NouamaneTazi · closed 6 months ago · 0 comments
#117 fix fan_in computation for bias in mamba · 3outeille · closed 6 months ago · 0 comments
#116 [Feature] Use `uv` instead of `pip` in CI/CD · xrsrke · opened 6 months ago · 1 comment
#115 Multinode minimal example · staghado · closed 5 months ago · 6 comments
#114 FEAT: Support 1.58-bit LLMs training · younesbelkada · opened 6 months ago · 1 comment
#113 [Feature] Add loading different datasets based on training stages · xrsrke · closed 6 months ago · 0 comments
#112 Mamba dependecies · staghado · closed 4 months ago · 2 comments
#111 Add FLOPs equations for Mamba and fix number of parameters · staghado · closed 6 months ago · 1 comment
#110 add contributor guide · 3outeille · closed 6 months ago · 0 comments
#109 bugfix bad condition · jordane95 · closed 6 months ago · 1 comment
#108 PP allocation issue · jordane95 · closed 6 months ago · 3 comments
#107 Adapt topology-agnostic optimizer shard loading to MoE (fixes #106) · nopperl · opened 6 months ago · 4 comments
#106 Resume training with a different Tensor parallel value · 3outeille · closed 4 months ago · 1 comment
#105 enlarge vocab size to avoid triton error · 3outeille · closed 6 months ago · 0 comments
#104 Integrating ScatterMoE · shawntan · opened 6 months ago · 0 comments
#103 Add converter (HF <-> Nanotron) for Mamba · 3outeille · closed 4 months ago · 2 comments
#102 Adding memmap input data pipelines · TJ-Solergibert · closed 5 months ago · 11 comments
#101 AssertionError related to tied parameters during `train_tiny_llama.sh` execution · xffxff · closed 6 months ago · 2 comments
#100 Fix some bugs · jordane95 · closed 6 months ago · 1 comment
#99 [Features] support gradient checkpointing for memory saving · zguo0525 · opened 6 months ago · 1 comment
#98 [Refactor] Refactoring Expert Parallelism · NouamaneTazi · closed 6 months ago · 1 comment
#97 [Quick fix] fix circular import in logging · NouamaneTazi · closed 6 months ago · 0 comments
#96 Bump v0.4 + Quick refactos · NouamaneTazi · closed 6 months ago · 0 comments
#95 [DoReMi] Small refactors · xrsrke · closed 6 months ago · 0 comments
#94 [Feature] Refactor ParallelContext.world_rank_matrix · 0xkerem · closed 6 months ago · 7 comments
#93 [Docs] Add unit tests as a requirement · xrsrke · closed 6 months ago · 1 comment
#92 [Bug] Fix clipping gradients's test · xrsrke · closed 5 months ago · 0 comments
#91 [Feature] All GPUs within the same TP group load training data from shared memory · xrsrke · opened 7 months ago · 0 comments
#90 [Unit Test] Add unit tests for DistributedTrainer · xrsrke · opened 7 months ago · 5 comments
#89 [Unit Test] Add unit test for DoReMi's trainer · xrsrke · opened 7 months ago · 0 comments
#88 [Feature] Use CUDA event for measuring elasped time · xrsrke · opened 7 months ago · 0 comments
#87 [Feature] Asyncronous Serialization · xrsrke · opened 7 months ago · 0 comments
#86 [Feature] Kernel Fusion of Layer Norm and GeLU · xrsrke · opened 7 months ago · 0 comments
#85 [Feature] LAMB optimizer · xrsrke · opened 7 months ago · 0 comments
#84 [Feature] Parallel transformer block · xrsrke · opened 7 months ago · 0 comments