MetiCodePrivateLimited closed this issue 3 years ago
You ran out of memory on your GPU. Reduce your batch size until the error goes away, or use a GPU with more memory.
What is the recommended GPU for training the model?
We use a P40 GPU with 22GB of memory. If you want to recreate the paper's results you'll need at least 20GB. If you reduce the batch size you could get by with less.
I tried reducing batch_size in the command but still got a CUDA out of memory error. Below is the command for your review:
```
python3 scripts/train.py \
  --dataset_type=ffhq_aging \
  --exp_dir=/path/to/experiment \
  --workers=2 \
  --batch_size=2 \
  --test_batch_size=2 \
  --test_workers=2 \
  --val_interval=2500 \
  --save_interval=10000 \
  --start_from_encoded_w_plus \
  --id_lambda=0.1 \
  --lpips_lambda=0.1 \
  --lpips_lambda_aging=0.1 \
  --lpips_lambda_crop=0.6 \
  --l2_lambda=0.25 \
  --l2_lambda_aging=0.25 \
  --l2_lambda_crop=1 \
  --w_norm_lambda=0.005 \
  --aging_lambda=2 \
  --cycle_lambda=1 \
  --input_nc=4 \
  --max_steps=5000 \
  --output_size=1024 \
  --target_age=uniform_random \
  --use_weighted_id_loss \
  --checkpoint_path=/usr/local/SAM/trained/sam_ffhq_aging.pt
```
Currently I have an 8GB GTX 1080. Do I need to replace my GPU? I am planning to replace it with an RTX 3090 (24GB). What is your opinion on that?
A single GPU with 8GB of memory is probably not enough for a meaningful batch size. We used a GPU with 22GB, which allowed us to train with a batch size of 8.
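A rough back-of-the-envelope check of that advice (my own sketch, assuming memory scales roughly linearly with batch size, which is optimistic since the model and optimizer state have a fixed cost):

```python
# Estimate the largest batch an 8GB card could fit, using the numbers
# from this thread. Assumed-linear, so treat the result as an upper bound.
reference_mem_gb = 22   # memory used on the authors' P40
reference_batch = 8     # batch size that fit in it
my_mem_gb = 8           # e.g. a GTX 1080

per_sample_gb = reference_mem_gb / reference_batch  # ~2.75 GB per sample
max_batch = int(my_mem_gb // per_sample_gb)
print(per_sample_gb, max_batch)
```

By this estimate even a batch size of 2 is borderline on an 8GB card, which matches the OOM errors reported here.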
ok
I am getting the following error when I run train.py:
```
RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 7.93 GiB total capacity; 6.86 GiB already allocated; 18.44 MiB free; 7.31 GiB reserved in total by PyTorch)
```
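Reading the numbers in that message explains the failure: PyTorch has reserved 7.31 of 7.93 GiB, and although roughly 460 MiB of that is reserved but not allocated to tensors, only 18.44 MiB is available as a free block, so a 54 MiB request cannot be served. A quick check of that arithmetic (values copied from the error above):

```python
# Figures taken from the CUDA OOM message in this thread.
total_gib = 7.93
allocated_gib = 6.86
reserved_gib = 7.31
free_mib = 18.44
request_mib = 54.00

# Reserved by PyTorch but not handed out to tensors (fragmented slack).
slack_mib = (reserved_gib - allocated_gib) * 1024
print(round(slack_mib))  # roughly 461 MiB of slack, yet the request fails

# The allocation fails because the request exceeds the free amount,
# not because the GPU is 100% full.
assert request_mib > free_mib
```

So the card is effectively full for this model; shrinking the batch further only buys a small margin.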
Also, CUDA itself seems to work; if I run the following:

```python
import torch
print(torch.rand(1, device="cuda"))
```

I get the output:

```
tensor([0.4547], device='cuda:0')
```
Could you please help me to fix it?
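One thing worth trying before swapping hardware (my own suggestion, not something the repo authors confirmed): recent PyTorch releases (1.10+) read a `PYTORCH_CUDA_ALLOC_CONF` environment variable whose documented `max_split_size_mb` option limits caching-allocator block splitting, which can help with fragmentation like the "18.44 MiB free" case above. Combine it with the smallest batch size:

```shell
# max_split_size_mb is a documented allocator option in recent PyTorch;
# it mitigates fragmentation but cannot help if the model simply does not fit.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then rerun training with the smallest batch, e.g.:
# python3 scripts/train.py --batch_size=1 --test_batch_size=1 ... (other flags as above)
```

If it still OOMs at batch size 1, an 8GB card is simply too small for this model and the 24GB RTX 3090 is the practical fix.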