karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License
32.39k stars
4.92k forks
issues
Hyperparameter Tuning
#484
SinanCavusoglu
opened
9 hours ago
0
Index out of range when training on custom dataset
#483
TayTT
opened
2 days ago
0
What is the meaning of nh and hs
#482
Bachstelze
opened
3 days ago
1
Fix: conditional use of GradScaler based on device_type and dtype in train.py
#481
BRAINIAC2677
opened
1 week ago
0
neverMind
#480
Zemulax
closed
1 week ago
0
Implement multi-token prediction option for models
#479
tmostak
opened
2 weeks ago
1
nanoGPT/model.py where `manual implementation of attention`,Is it correct to modify it like I did?
#478
wmx-github
opened
3 weeks ago
1
Training fails on Python 3.12 on either GPU or CPU
#477
tigran123
closed
3 weeks ago
3
Recommendation for something smaller
#476
diamondfishtools
opened
3 weeks ago
0
[Question] Why use `__call__` to do forward.
#475
Felix-Zhenghao
closed
2 weeks ago
2
could nanoGPT be the AI assistant for the development of CAX software?
#474
fengsim
opened
1 month ago
1
[Question] The mask size seems wrong?
#473
Felix-Zhenghao
closed
1 month ago
0
[Question] why bias is init to zero?
#472
michael8090
opened
1 month ago
0
Citing this project in research
#471
davmacario
opened
1 month ago
0
CUDA error: device-side assert triggered
#470
ecsfu
closed
1 month ago
0
How to Set "vocab_size" and "block_size" for Word Embedding?
#469
haibao-yu
opened
1 month ago
1
Is this loss curve normal
#468
banyan-god
opened
1 month ago
15
Resume Training
#467
tiredsoul21
opened
1 month ago
3
MFU too low in custom GPT-2 training
#466
eonurk
closed
1 month ago
2
nano_gpt
#465
Mihir0567
opened
1 month ago
0
fix: h100-mfu-calculation
#464
OrenLeung
opened
1 month ago
0
Fixing eval path in README
#463
goswamig
opened
1 month ago
0
gabe init
#462
jondestoppeleire
closed
1 month ago
0
Training loss converges much earlier compared to max_iters
#461
goswamig
opened
2 months ago
1
no cuda training does not work.
#460
BurkenDev
opened
2 months ago
1
Refactor for easier configuration and overrides
#459
ikeman32
opened
2 months ago
0
Torch >= 2.2.0 inference issues on MPS
#458
davmacario
opened
2 months ago
3
Why do we need further pretrain given the loss is already converged
#457
BiEchi
opened
2 months ago
1
MFU calculation wrong
#456
lxww302
opened
2 months ago
0
dropout is 0.0
#455
dipsivenkatesh
opened
2 months ago
3
PyTorch nn.LayerNorm now takes bias arg - removed custom class
#454
calmitchell617
opened
2 months ago
1
Early stopping
#453
derekehyatt
opened
2 months ago
1
Optimizer type comparisons with grid search
#452
rjbaw
closed
2 months ago
1
Why is there no mask when using flash attention?
#451
bruce2233
closed
2 months ago
2
Implement ROPE positional encodings
#450
devinbot
opened
2 months ago
1
Implement ROPE positional encodings and adjust training parameters
#449
devinbot
closed
2 months ago
0
Would like to contribute FSDP functionality
#448
calmitchell617
opened
2 months ago
2
Why don't we crop attn.weight as well?
#447
muerghq
opened
2 months ago
1
fix: estimate_mfu dt ZeroDivisionError
#446
HildaM
opened
2 months ago
0
Question about causal masking vs full-context auto-regressive masking
#445
pi-tau
opened
2 months ago
0
Custom mha
#444
MarcoMueglich
closed
2 months ago
0
get_lr needs to handle iter_num initialized to 0
#443
yiphei
opened
2 months ago
0
Sample from a subset of the token_embedding_table
#442
PLarsen79
opened
2 months ago
3
Character level: Enwik8
#441
howaboutyu
closed
2 months ago
0
nothing has been written into???
#440
BeimingCharles
opened
2 months ago
1
AssertionError when trying to run sample.py
#439
RexNecross
opened
3 months ago
1
Which Python version can be used
#438
denghuilong-sir
opened
3 months ago
1
To reduce GPU memory usage & found a bug
#436
cooper-him
opened
3 months ago
3
How to train nanoGPT using TPU's?
#435
kathir-ks
opened
3 months ago
1
i am getting encoding errors when i run the sample.py with any start contexts
#434
danyuexiao
opened
3 months ago
1