huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0 · 1.14k stars · 107 forks
Issues (sorted by: Newest)
#183 [Feature] Monitor model states during training · xrsrke · opened 4 months ago · 0 comments
#182 Fix overflow in nanosets with big datasets · jquesnelle · opened 4 months ago · 0 comments
#181 Ring attention · zzhhjjj · opened 4 months ago · 0 comments
#180 FEAT: Adding 1.58bit LLMs training architecture in nanotron · MekkCyber · opened 4 months ago · 2 comments
#179 Fixes: https://github.com/huggingface/nanotron/issues/114 · MekkCyber · closed 4 months ago · 0 comments
#178 Fixes: https://github.com/huggingface/nanotron/issues/114 · MekkCyber · closed 4 months ago · 0 comments
#177 PyTorch profiler is unable to serialize numpy datatypes sometimes inserted as process group ranks · hatanp · opened 4 months ago · 0 comments
#176 Where is the "nanotron format" defined? · RonanKMcGovern · closed 4 months ago · 2 comments
#175 "datatrove" is missing from the examples folder · RonanKMcGovern · closed 1 month ago · 5 comments
#174 Llama3 conversion scripts 🦙 · TJ-Solergibert · opened 4 months ago · 6 comments
#173 add rope_theta config var for llama · jquesnelle · closed 4 months ago · 1 comment
#172 Fix _RowLinearAsyncCommunication · C-TC · closed 2 months ago · 1 comment
#171 [Feature] Mixture of Depths · xrsrke · opened 4 months ago · 0 comments
#170 Fixed FA2 test · TJ-Solergibert · closed 4 months ago · 0 comments
#169 [Feature] Infini Attention · xrsrke · opened 4 months ago · 0 comments
#168 Core attention · zzhhjjj · opened 4 months ago · 0 comments
#167 `FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'` · NouamaneTazi · closed 4 months ago · 1 comment
#166 README typo, specifies a .sh as a config · staghado · closed 4 months ago · 1 comment
#165 Adding checkpoint after training ends · angegonzalez · closed 1 month ago · 0 comments
#164 Readme revamp · NouamaneTazi · closed 4 months ago · 0 comments
#163 We don't save checkpoint after training ends · NouamaneTazi · opened 4 months ago · 0 comments
#162 Support custom dataloader · NouamaneTazi · closed 4 months ago · 0 comments
#161 out of memory for continuing pretraining llama3-8B · ckzbullbullet · opened 4 months ago · 5 comments
#160 Enable masking when tp=1 · YongjunHe · opened 4 months ago · 0 comments
#159 moe in src and load balancing losses · haeggee · opened 4 months ago · 2 comments
#158 Train more than 1 epoch? · Lauler · closed 4 months ago · 5 comments
#157 llama tests · zzhhjjj · opened 5 months ago · 1 comment
#156 Fix TestContext warning · AleHD · opened 5 months ago · 0 comments
#155 Adding Nanoset dataset · TJ-Solergibert · closed 4 months ago · 2 comments
#154 Add data loading time in log · XinDongol · opened 5 months ago · 0 comments
#153 Make Pipeline Parallelism Optional · XinDongol · closed 1 month ago · 1 comment
#152 Checkpoint 1.3 backwards compatibility · AleHD · opened 5 months ago · 2 comments
#151 Script to fix duplicated ".safetensors" in checkpoints naming · NouamaneTazi · closed 5 months ago · 1 comment
#150 num_samples · zzhhjjj · closed 5 months ago · 0 comments
#149 [BUG] fix arg for save_checkpoint · 3outeille · closed 5 months ago · 0 comments
#148 [Bug] Fix missing `.get_named_params_without_weight_decay()` in llama · xrsrke · closed 5 months ago · 1 comment
#147 [Feature] Infini Attention · xrsrke · closed 4 months ago · 0 comments
#146 'LlamaModel' object has no attribute 'get_named_params_without_weight_decay' in the beginner example · XinDongol · closed 5 months ago · 3 comments
#145 readme · zzhhjjj · closed 2 months ago · 0 comments
#144 [Bug] Resuming training for data stages · xrsrke · closed 5 months ago · 0 comments
#143 Use CUDA Events for measuring elapsed time · staghado · opened 5 months ago · 2 comments
#142 Haojun/inference · zzhhjjj · opened 5 months ago · 0 comments
#141 Resume training from data stages · 3outeille · closed 5 months ago · 0 comments
#140 [Bug] Remove printing of HF dataset in data stages · xrsrke · closed 5 months ago · 0 comments
#139 Add param group weight decay · 3outeille · closed 5 months ago · 0 comments
#138 TritonRMSNorm generates randomized results during inference · zzhhjjj · closed 5 months ago · 0 comments
#137 make mamba config work with data stages · 3outeille · closed 5 months ago · 0 comments
#136 add inference for mamba · 3outeille · closed 5 months ago · 1 comment
#135 small fix mamba · 3outeille · closed 5 months ago · 1 comment
#134 minor: number of GPUs per node is not always 8 · staghado · closed 5 months ago · 0 comments