karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

dev/download_starter_pack.sh: adding SIGINT trap and current download… #643

Open Ricardicus opened 5 days ago

Ricardicus commented 5 days ago

download_starter_pack.sh appeared to hang and I could not tell why: it only lists the files that have finished downloading, not the ones still in progress. I also couldn't stop it cleanly with CTRL-C.

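This adds a SIGINT trap, so that CTRL-C stops the script, and a display of the downloads currently in progress.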

It can look like this (when three files have been downloaded and three are left to go):

Downloading gpt2_124M.bin...
Downloading gpt2_124M_bf16.bin...
Downloading gpt2_124M_debug_state.bin...
Downloaded gpt2_tokenizer.bin to /Users/user/llm.c/dev/../gpt2_tokenizer.bin
Downloaded tiny_shakespeare_train.bin to /Users/user/llm.c/dev/data/tinyshakespeare/tiny_shakespeare_train.bin
Downloaded tiny_shakespeare_val.bin to /Users/user/llm.c/dev/data/tinyshakespeare/tiny_shakespeare_val.bin

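For readers unfamiliar with the pattern, here is a minimal sketch of a SIGINT trap around background downloads. It is not the code from this PR: the names `pids`, `cleanup` and `start_download` are hypothetical, `sleep` stands in for the real download command, and it assumes the downloads run as background jobs. In a non-interactive shell, background jobs started with `&` ignore SIGINT, which is one plausible reason CTRL-C alone does not stop them, so the trap kills them explicitly.

```shell
#!/bin/sh
# Sketch only: names are hypothetical, not taken from
# dev/download_starter_pack.sh.

pids=""   # space-separated PIDs of the background download jobs

# Stop every tracked background job, then exit with 130, the
# conventional status for "terminated by SIGINT" (128 + 2).
cleanup() {
    echo "Interrupted, stopping downloads..."
    for pid in $pids; do
        kill "$pid" 2>/dev/null || true
    done
    exit 130
}
trap cleanup INT

# Announce the download, then run the given command in the background
# and remember its PID so that cleanup can reach it.
start_download() {
    name="$1"
    shift
    echo "Downloading ${name}..."
    "$@" &
    pids="$pids $!"
}

# Stand-in jobs: 'sleep' takes the place of a real curl invocation.
start_download gpt2_124M.bin sleep 1
start_download gpt2_tokenizer.bin sleep 1

# 'wait' returns early when a trapped signal arrives, so the trap runs
# promptly instead of after all jobs have finished.
wait
echo "All downloads finished"
```

Printing the "Downloading ..." line before the job starts is what makes in-progress files visible; the PID list is only needed so the trap has something to kill.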
Ricardicus commented 4 days ago

I think this looks pretty good. I have now removed the clear screen at the beginning.

Is it a little bit over-engineered? Yes. But does it look good? Yes.

Ricardicus commented 4 days ago

I also added a small sanity check for the curl requirement. I ran this script in the Docker image "nvidia/cuda:12.4.1-devel-ubuntu22.04" from Docker Hub, which apparently does not ship with curl, so the script failed and the output looked messy. It looks better now.
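
A check of this kind can be sketched as follows. This is an illustration, not the code from this PR: `require_cmd` is a hypothetical helper name, and the error wording is made up. It relies only on `command -v`, which POSIX shells provide for looking up a command.

```shell
# Sketch only: require_cmd is a hypothetical helper, not taken from
# dev/download_starter_pack.sh.

# Return 0 if the named command is available on PATH (or as a builtin),
# otherwise print an error to stderr and return 1.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Error: '$1' is required but was not found." >&2
        return 1
    fi
}
```

Calling `require_cmd curl || exit 1` near the top of the script, before any download starts, means a missing curl produces one clear message instead of a failure per file.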