karpathy/llm.c
LLM training in simple, raw C/CUDA
MIT License
21.23k stars, 2.3k forks
issues
hacky block-level stable adamw
#655
ngc92
opened
15 hours ago
0
Set RNG seed manually with '-rg' parameter
#654
ademeure
opened
1 day ago
0
Matmul refactor using only cuBLASLt + GELU Fusion
#653
ademeure
opened
1 day ago
0
Make cuDNN deterministic for Flash Attention backward
#652
ademeure
closed
1 day ago
0
Async optimizer state and model checkpointing
#651
chinthysl
opened
1 day ago
0
muP (maximum update parametrization) [WIP]
#650
gordicaleksa
opened
2 days ago
0
tune openmpi make a bit more convenient
#649
karpathy
closed
2 days ago
0
[readme] mention opencl fork
#648
krrishnarraj
closed
2 days ago
0
Opencl
#647
krrishnarraj
closed
2 days ago
1
Makefile: Respect the NO_MULTI_GPU
#646
Ricardicus
closed
2 days ago
0
Improve kernel in layerNorm forward: adapt variance estimation method from kernel 4 for use in kernel 6
#645
awayzjj
opened
3 days ago
1
Mixed dtypes
#644
ngc92
closed
1 day ago
0
dev/download_starter_pack.sh: adding SIGINT trap and current download…
#643
Ricardicus
opened
3 days ago
2
Windows issue with Cuda Toolkit 12.5 and latest MSVC compiler 17.10
#642
rosslwheeler
opened
3 days ago
0
Add check versions of functions
#641
gordicaleksa
opened
3 days ago
0
Add missing MULTI_GPU compiler flag
#640
chinthysl
closed
3 days ago
0
Add unistd.h to fix Windows cuDNN build
#639
rosslwheeler
closed
3 days ago
0
add http status check for download_file
#638
BilyZ98
opened
4 days ago
1
add outlier detector, test for it, and start tracking z score of loss
#637
karpathy
closed
3 days ago
1
rolling checkpoints
#636
karpathy
closed
3 days ago
0
On-device reductions
#635
ngc92
closed
3 days ago
0
Optionally specify root location of CUDNN
#634
koparasy
opened
4 days ago
0
Socket server/client interface
#633
chinthysl
closed
4 days ago
0
MPI/TCP/FS for NCCL-init
#632
gordicaleksa
closed
4 days ago
3
Minor refactor
#631
gordicaleksa
closed
4 days ago
0
Speedup CPU training by 10% using Memory Aligned Tensors
#630
iVishalr
opened
6 days ago
0
CI Dataloader test and ptx/sass file generator
#629
rosslwheeler
opened
1 week ago
0
Check for the existence of CUDNN_FRONTEND_PATH before looking in default directories. If it exists, skip additional checks.
#628
koparasy
opened
1 week ago
7
feature/lr_schedulers
#627
karpathy
closed
1 week ago
0
Add explicit HuggingFace cache dir
#626
gordicaleksa
opened
1 week ago
2
Add NCCL instruction to README
#625
gordicaleksa
closed
1 week ago
0
if available, use MPI env vars to initialize multi-gpu configs
#624
ngc92
closed
1 week ago
0
feature/nccl only (delete MPI)
#623
karpathy
closed
4 days ago
3
sel4 + llm.c > path to putting these llms in any mission critical system
#622
torrmal
opened
1 week ago
0
Added A10 to mfu.h
#621
tiehexue
opened
1 week ago
1
small fixes based on clang-tidy
#620
ngc92
opened
1 week ago
3
Cast Get2dNoiseUint computation to uint
#619
gordicaleksa
closed
4 days ago
3
WIP Distribution Visualisation to help with FP8 work & beyond
#618
ademeure
opened
1 week ago
0
WIP Distribution Visualisation to help with FP8 work & beyond
#617
ademeure
closed
1 week ago
0
7-8% speedup: optimize matmul_backward_bias_kernel, reduce cast ops, improve loop unrolling, direct var use
#616
bgorlick
opened
1 week ago
2
Relax grad tensor thresholds in tests
#615
gordicaleksa
closed
1 week ago
1
Stricter FP32 tests
#614
gordicaleksa
closed
1 week ago
2
Fix dataloader & determinism testing
#613
gordicaleksa
closed
1 week ago
4
Bug fixes and warning fixes for test_dataloader.c
#611
rosslwheeler
closed
1 week ago
0
gpt2_forward adding CUDA streams with events for async layered operations, cache prefetching for efficient data access with high temporal locality
#610
bgorlick
opened
1 week ago
0
Enhance gradient norm calc in gpt2_update: reuse variables, clarify first pass logic, improve condition handling
#609
bgorlick
opened
1 week ago
0
Clarified installing CUDNN front-end headers and arch linux install instructs
#608
bgorlick
opened
1 week ago
1
Llama RoPE Forward Kernels
#607
AndreSlavescu
opened
1 week ago
0
Minor refactor of dataloader
#606
gordicaleksa
closed
1 week ago
0
Add learning rate schedulers
#605
gordicaleksa
closed
1 week ago
9