karpathy llm.c issues - Githubissues

karpathy / llm.c

LLM training in simple, raw C/CUDA

MIT License

21.23k stars 2.3k forks

issues


hacky block-level stable adamw

#655 ngc92 opened 15 hours ago
0
Set RNG seed manually with '-rg' parameter

#654 ademeure opened 1 day ago
0
Matmul refactor using only cuBLASLt + GELU Fusion

#653 ademeure opened 1 day ago
0
Make cuDNN deterministic for Flash Attention backward

#652 ademeure closed 1 day ago
0
Async optimizer state and model checkpointing

#651 chinthysl opened 1 day ago
0
muP (maximum update parametrization) [WIP]

#650 gordicaleksa opened 2 days ago
0
tune openmpi make a bit more convenient

#649 karpathy closed 2 days ago
0
[readme] mention opencl fork

#648 krrishnarraj closed 2 days ago
0
Opencl

#647 krrishnarraj closed 2 days ago
1
Makefile: Respect the NO_MULTI_GPU

#646 Ricardicus closed 2 days ago
0
Improve kernel in layerNorm forward: adapt variance estimation method from kernel 4 for use in kernel 6

#645 awayzjj opened 3 days ago
1
Mixed dtypes

#644 ngc92 closed 1 day ago
0
dev/download_starter_pack.sh: adding SIGINT trap and current download…

#643 Ricardicus opened 3 days ago
2
Windows issue with Cuda Toolkit 12.5 and latest MSVC compiler 17.10

#642 rosslwheeler opened 3 days ago
0
Add check versions of functions

#641 gordicaleksa opened 3 days ago
0
Add missing MULTI_GPU compiler flag

#640 chinthysl closed 3 days ago
0
Add unistd.h to fix Windows cuDNN build

#639 rosslwheeler closed 3 days ago
0
add http status check for download_file

#638 BilyZ98 opened 4 days ago
1
add outlier detector, test for it, and start tracking z score of loss

#637 karpathy closed 3 days ago
1
rolling checkpoints

#636 karpathy closed 3 days ago
0
On-device reductions

#635 ngc92 closed 3 days ago
0
Optionally specify root location of CUDNN

#634 koparasy opened 4 days ago
0
Socket server/client interface

#633 chinthysl closed 4 days ago
0
MPI/TCP/FS for NCCL-init

#632 gordicaleksa closed 4 days ago
3
Minor refactor

#631 gordicaleksa closed 4 days ago
0
Speedup CPU training by 10% using Memory Aligned Tensors

#630 iVishalr opened 6 days ago
0
CI Dataloader test and ptx/sass file generator

#629 rosslwheeler opened 1 week ago
0
Check for the existence of CUDNN_FRONTEND_PATH before looking in default directories. If it exists, skip additional checks.

#628 koparasy opened 1 week ago
7
feature/lr_schedulers

#627 karpathy closed 1 week ago
0
Add explicit HuggingFace cache dir

#626 gordicaleksa opened 1 week ago
2
Add NCCL instruction to README

#625 gordicaleksa closed 1 week ago
0
if available, use MPI env vars to initialize multi-gpu configs

#624 ngc92 closed 1 week ago
0
feature/nccl only (delete MPI)

#623 karpathy closed 4 days ago
3
sel4 + llm.c > path to putting these llms in any mission critical system

#622 torrmal opened 1 week ago
0
Added A10 to mfu.h

#621 tiehexue opened 1 week ago
1
small fixes based on clang-tidy

#620 ngc92 opened 1 week ago
3
Cast Get2dNoiseUint computation to uint

#619 gordicaleksa closed 4 days ago
3
WIP Distribution Visualisation to help with FP8 work & beyond

#618 ademeure opened 1 week ago
0
WIP Distribution Visualisation to help with FP8 work & beyond

#617 ademeure closed 1 week ago
0
7-8% speedup: optimize matmul_backward_bias_kernel, reduce cast ops, improve loop unrolling, direct var use

#616 bgorlick opened 1 week ago
2
Relax grad tensor thresholds in tests

#615 gordicaleksa closed 1 week ago
1
Stricter FP32 tests

#614 gordicaleksa closed 1 week ago
2
Fix dataloader & determinism testing

#613 gordicaleksa closed 1 week ago
4
Bug fixes and warning fixes for test_dataloader.c

#611 rosslwheeler closed 1 week ago
0
gpt2_forward adding CUDA streams with events for async layered operations, cache prefetching for efficient data access with high temporal locality

#610 bgorlick opened 1 week ago
0
Enhance gradient norm calc in gpt2_update: reuse variables, clarify first pass logic, improve condition handling

#609 bgorlick opened 1 week ago
0
Clarified installing CUDNN front-end headers and arch linux install instructs

#608 bgorlick opened 1 week ago
1
Llama RoPE Forward Kernels

#607 AndreSlavescu opened 1 week ago
0
Minor refactor of dataloader

#606 gordicaleksa closed 1 week ago
0
Add learning rate schedulers

#605 gordicaleksa closed 1 week ago
9