karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License
34.49k stars 5.31k forks

Running train.py on 2060 GPU #8

Open lzeladam opened 1 year ago

lzeladam commented 1 year ago

Hello! I've been trying to run train.py on a 2060 GPU, but this device does not support dtype=torch.bfloat16. What changes would I have to make to achieve my goal? Or can I only train on an Ampere-architecture GPU for now? Thank you very much for sharing this project!

karpathy commented 1 year ago

Two options:
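For reference, a minimal sketch of the fallback the rest of the thread converges on: use bfloat16 only when the GPU supports it, and drop to float16 otherwise. This assumes PyTorch is installed; the variable name matches the `dtype` setting in train.py:

```python
import torch

# bfloat16 needs Ampere or newer; pre-Ampere cards like the RTX 2060
# only support float16, so fall back when bf16 isn't available
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = 'bfloat16'
else:
    dtype = 'float16'  # pair this with torch.cuda.amp.GradScaler during training

print(dtype)
```

Note that float16 training, unlike bfloat16, generally needs a gradient scaler to avoid underflow.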

lzeladam commented 1 year ago

Hi @karpathy,

Thank you for your help. I made the change, and now I have a problem with CUDA detection in my WSL environment:

debug_wrapper raised RuntimeError: CUDA: Error- no device

I don't know why, because the GPU is detected by the nvidia-smi command:

(screenshot: nvidia-smi output showing the GPU)

So I'll try to solve it.
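When nvidia-smi sees the GPU but PyTorch doesn't, a first step is to check what PyTorch itself reports; a small diagnostic sketch (assumes PyTorch is installed):

```python
import torch

# If torch.version.cuda is None, a CPU-only build of PyTorch is installed;
# if it's set but is_available() is False, the CUDA driver isn't visible
# from inside WSL.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
```

A CPU-only wheel is a common cause of this error under WSL, since it fails only at runtime when a CUDA device is requested.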

jcherrera commented 1 year ago

What are the min requirements to run nanoGPT?

lzeladam commented 1 year ago

@jcherrera try changing these parameters: batch_size from 12 to 16, and block_size from 1024 to 512.

Note: this project doesn't work on Windows because PyTorch 2.0 currently only supports Linux. Another alternative is to pay for an A10 or A100 instance on lambdalabs.com ...maybe I could do a post 🤔
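To make those overrides concrete: nanoGPT reads plain-Python config files, so the suggested values could go in a small file (hypothetical name `config/train_2060.py`) passed as the first argument to train.py:

```python
# hypothetical config/train_2060.py -- overrides for a smaller GPU
# run as: python train.py config/train_2060.py
batch_size = 16   # suggested above (the default in train.py is 12)
block_size = 512  # halve the context length from the default 1024
```

The same keys can also be passed as `--batch_size=16 --block_size=512` flags on the command line.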

adammarples commented 1 year ago

@jcherrera set

compile = False # use PyTorch 2.0 to compile the model to be faster

in train.py
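Alternatively, the same override works without editing train.py, since the configurator accepts config files and key=value flags; a sketch (the variable name matches the one in train.py above):

```python
# hypothetical config/no_compile.py
# run as: python train.py config/no_compile.py
# (or pass directly on the command line: python train.py --compile=False)
compile = False  # skip torch.compile; sidesteps the PyTorch 2.0 requirement
```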

jorahn commented 1 year ago

To add one data point: I'm running an unmodified python train.py with --batch_size=8 on ~22 GB of VRAM.