JasonLLLLLLLLLLL opened this issue 1 year ago
The models were trained on 4x A40 GPUs with 48GB of VRAM each, but it should be possible to train them on less hardware, especially the 7B models. You could try playing around with the training hyperparameters here; the easiest change to try is enabling gradient checkpointing.
Enabling gradient checkpointing and setting batch_size to a small value really helped. Thanks!!!
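For anyone hitting the same limit, a minimal sketch of what these two changes might look like with a Hugging Face TrainingArguments setup (illustrative values only; examples/train.py may use different option names or a different configuration mechanism):

```python
from transformers import TrainingArguments

# Illustrative values, not the repo's actual defaults.
args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,     # recompute activations to save memory
    per_device_train_batch_size=1,   # small per-GPU batch to fit in VRAM
    gradient_accumulation_steps=16,  # preserve the effective batch size
)
```

Gradient checkpointing trades extra forward-pass compute for a large reduction in activation memory, which is usually the biggest lever on a single consumer GPU.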
Could you tell me which file I should run to get tikz-llama-7b? It doesn't seem to be examples/train.py. Thanks!
You can either use infer.py for a CLI interface or your own instance of the webui (note that you would need to add your own model to the model dict to be able to use it). Check the --help flag for both for usage instructions.
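A hypothetical sketch of what registering a custom checkpoint might look like, assuming the webui keeps a simple name-to-checkpoint mapping (the actual dict name and structure in the webui code may differ):

```python
# Hypothetical: "MODELS" and its layout are assumptions, not the webui's
# actual code. Map a display name to a local path or Hugging Face repo id.
MODELS = {
    "tikz-llama-7b": "path/to/your/tikz-llama-7b",
}
```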
When I run examples/train.py with a 4090 (24GB), it shows CUDA out of memory. May I ask what GPU you use when running train.py? Thanks!
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 23.64 GiB of which 27.44 MiB is free. Process 171543 has 77.35 MiB memory in use. Including non-PyTorch memory, this process has 22.85 GiB memory in use. Of the allocated memory 22.24 GiB is allocated by PyTorch, and 155.06 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/3900 [00:00<?, ?it/s]
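Besides the smaller batch size and gradient checkpointing mentioned above, the fragmentation hint at the end of that message can be acted on directly. This is a generic PyTorch allocator setting, not anything specific to this repo, and the value 128 below is just an example:

```python
import os

# Must be set before torch initializes the CUDA allocator, so set it
# before the first import of torch (or export it in the shell instead).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var on purpose
```

On a 24GB card this only helps with fragmentation, though; if the model simply does not fit, the batch size and checkpointing changes above are the main levers.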