MetiCodePrivateLimited closed this issue 3 years ago
You ran out of memory on your GPU. Reduce your batch size until the error goes away, or use a GPU with more memory.
What is the recommended GPU for training the model?
We use a P40 GPU with 22GB of memory. If you want to recreate the paper's results you'll need at least 20GB. If you reduce the batch size you could get by with less.
I tried reducing batch_size in the command but still got a CUDA out of memory error. Below is the command for your review:
```
python3 scripts/train.py \
  --dataset_type=ffhq_aging \
  --exp_dir=/path/to/experiment \
  --workers=2 \
  --batch_size=2 \
  --test_batch_size=2 \
  --test_workers=2 \
  --val_interval=2500 \
  --save_interval=10000 \
  --start_from_encoded_w_plus \
  --id_lambda=0.1 \
  --lpips_lambda=0.1 \
  --lpips_lambda_aging=0.1 \
  --lpips_lambda_crop=0.6 \
  --l2_lambda=0.25 \
  --l2_lambda_aging=0.25 \
  --l2_lambda_crop=1 \
  --w_norm_lambda=0.005 \
  --aging_lambda=2 \
  --cycle_lambda=1 \
  --input_nc=4 \
  --max_steps=5000 \
  --output_size=1024 \
  --target_age=uniform_random \
  --use_weighted_id_loss \
  --checkpoint_path=/usr/local/SAM/trained/sam_ffhq_aging.pt
```
Currently I have an 8GB GTX 1080. Do I need to replace my GPU? I am planning to replace it with an RTX 3090 (24GB). What is your opinion on that?
A single GPU with 8GB of memory is probably not enough for a meaningful batch size. We used a GPU with 22GB, which allowed us to train with a batch size of 8.
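A rough back-of-the-envelope check of that advice (my own sketch, assuming memory scales roughly linearly with batch size, which is optimistic since the model and optimizer state have a fixed cost):

```python
# Estimate the largest batch an 8GB card could fit, using the numbers
# from this thread. Assumed-linear, so treat the result as an upper bound.
reference_mem_gb = 22   # memory used on the authors' P40
reference_batch = 8     # batch size that fit in it
my_mem_gb = 8           # e.g. a GTX 1080

per_sample_gb = reference_mem_gb / reference_batch  # ~2.75 GB per sample
max_batch = int(my_mem_gb // per_sample_gb)
print(per_sample_gb, max_batch)
```

By this estimate even a batch size of 2 is borderline on an 8GB card, which matches the OOM errors reported here.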
ok
I am getting the following error when I run train.py:
```
RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 7.93 GiB total capacity; 6.86 GiB already allocated; 18.44 MiB free; 7.31 GiB reserved in total by PyTorch)
```
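Reading the numbers in that message explains the failure: PyTorch has reserved 7.31 of 7.93 GiB, and although roughly 460 MiB of that is reserved but not allocated to tensors, only 18.44 MiB is available as a free block, so a 54 MiB request cannot be served. A quick check of that arithmetic (values copied from the error above):

```python
# Figures taken from the CUDA OOM message in this thread.
total_gib = 7.93
allocated_gib = 6.86
reserved_gib = 7.31
free_mib = 18.44
request_mib = 54.00

# Reserved by PyTorch but not handed out to tensors (fragmented slack).
slack_mib = (reserved_gib - allocated_gib) * 1024
print(round(slack_mib))  # roughly 461 MiB of slack, yet the request fails

# The allocation fails because the request exceeds the free amount,
# not because the GPU is 100% full.
assert request_mib > free_mib
```

So the card is effectively full for this model; shrinking the batch further only buys a small margin.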
Also, CUDA itself seems to work; if I run the following:

```python
import torch
print(torch.rand(1, device="cuda"))
```

I get the output:

```
tensor([0.4547], device='cuda:0')
```
Could you please help me to fix it?
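One thing worth trying before swapping hardware (my own suggestion, not something the repo authors confirmed): recent PyTorch releases (1.10+) read a `PYTORCH_CUDA_ALLOC_CONF` environment variable whose documented `max_split_size_mb` option limits caching-allocator block splitting, which can help with fragmentation like the "18.44 MiB free" case above. Combine it with the smallest batch size:

```shell
# max_split_size_mb is a documented allocator option in recent PyTorch;
# it mitigates fragmentation but cannot help if the model simply does not fit.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then rerun training with the smallest batch, e.g.:
# python3 scripts/train.py --batch_size=1 --test_batch_size=1 ... (other flags as above)
```

If it still OOMs at batch size 1, an 8GB card is simply too small for this model and the 24GB RTX 3090 is the practical fix.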