huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0 · 1.14k stars · 107 forks
Issues (sorted by: Newest)
#183 [Feature] Monitor model states during training · xrsrke · opened 4 months ago · 0 comments
#182 Fix overflow in nanosets with big datasets · jquesnelle · opened 4 months ago · 0 comments
#181 Ring attention · zzhhjjj · opened 4 months ago · 0 comments
#180 FEAT: Adding 1.58bit LLMs training architecture in nanotron · MekkCyber · opened 4 months ago · 2 comments
#179 Fixes: https://github.com/huggingface/nanotron/issues/114 · MekkCyber · closed 4 months ago · 0 comments
#178 Fixes: https://github.com/huggingface/nanotron/issues/114 · MekkCyber · closed 4 months ago · 0 comments
#177 PyTorch profiler is unable to serialize numpy datatypes sometimes inserted as process group ranks · hatanp · opened 4 months ago · 0 comments
#176 Where is the "nanotron format" defined? · RonanKMcGovern · closed 4 months ago · 2 comments
#175 "datatrove" is missing from the examples folder · RonanKMcGovern · closed 1 month ago · 5 comments
#174 Llama3 conversion scripts 🦙 · TJ-Solergibert · opened 4 months ago · 6 comments
#173 add rope_theta config var for llama · jquesnelle · closed 4 months ago · 1 comment
#172 Fix _RowLinearAsyncCommunication · C-TC · closed 2 months ago · 1 comment
#171 [Feature] Mixture of Depths · xrsrke · opened 4 months ago · 0 comments
#170 Fixed FA2 test · TJ-Solergibert · closed 4 months ago · 0 comments
#169 [Feature] Infini Attention · xrsrke · opened 4 months ago · 0 comments
#168 Core attention · zzhhjjj · opened 4 months ago · 0 comments
#167 `FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'` · NouamaneTazi · closed 4 months ago · 1 comment
#166 README typo, specifies a .sh as a config · staghado · closed 4 months ago · 1 comment
#165 Adding checkpoint after training ends · angegonzalez · closed 1 month ago · 0 comments
#164 Readme revamp · NouamaneTazi · closed 4 months ago · 0 comments
#163 We don't save checkpoint after training ends · NouamaneTazi · opened 4 months ago · 0 comments
#162 Support custom dataloader · NouamaneTazi · closed 4 months ago · 0 comments
#161 out of memory for continuing pretraining llama3-8B · ckzbullbullet · opened 4 months ago · 5 comments
#160 Enable masking when tp=1 · YongjunHe · opened 4 months ago · 0 comments
#159 moe in src and load balancing losses · haeggee · opened 4 months ago · 2 comments
#158 Train more than 1 epoch? · Lauler · closed 4 months ago · 5 comments
#157 llama tests · zzhhjjj · opened 5 months ago · 1 comment
#156 Fix TestContext warning · AleHD · opened 5 months ago · 0 comments
#155 Adding Nanoset dataset · TJ-Solergibert · closed 4 months ago · 2 comments
#154 Add data loading time in log · XinDongol · opened 5 months ago · 0 comments
#153 Make Pipeline Parallelism Optional · XinDongol · closed 1 month ago · 1 comment
#152 Checkpoint 1.3 backwards compatibility · AleHD · opened 5 months ago · 2 comments
#151 Script to fix duplicated ".safetensors" in checkpoints naming · NouamaneTazi · closed 5 months ago · 1 comment
#150 num_samples · zzhhjjj · closed 5 months ago · 0 comments
#149 [BUG] fix arg for save_checkpoint · 3outeille · closed 5 months ago · 0 comments
#148 [Bug] Fix missing `.get_named_params_without_weight_decay()` in llama · xrsrke · closed 5 months ago · 1 comment
#147 [Feature] Infini Attention · xrsrke · closed 4 months ago · 0 comments
#146 'LlamaModel' object has no attribute 'get_named_params_without_weight_decay' in the beginner example · XinDongol · closed 5 months ago · 3 comments
#145 readme · zzhhjjj · closed 2 months ago · 0 comments
#144 [Bug] Resuming training for data stages · xrsrke · closed 5 months ago · 0 comments
#143 Use CUDA Events for measuring elapsed time · staghado · opened 5 months ago · 2 comments
#142 Haojun/inference · zzhhjjj · opened 5 months ago · 0 comments
#141 Resume training from data stages · 3outeille · closed 5 months ago · 0 comments
#140 [Bug] Remove printing of HF dataset in data stages · xrsrke · closed 5 months ago · 0 comments
#139 Add param group weight decay · 3outeille · closed 5 months ago · 0 comments
#138 TritonRMSNorm generates randomized results during inference · zzhhjjj · closed 5 months ago · 0 comments
#137 make mamba config work with data stages · 3outeille · closed 5 months ago · 0 comments
#136 add inference for mamba · 3outeille · closed 5 months ago · 1 comment
#135 small fix mamba · 3outeille · closed 5 months ago · 1 comment
#134 minor: number of GPUs per node is not always 8 · staghado · closed 5 months ago · 0 comments