williamyang1991 / DualStyleGAN

[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

CUDA out of memory #60

Closed bit2map closed 1 year ago

bit2map commented 1 year ago

When I run 'pretrain_dualstylegan.py', I get a 'CUDA out of memory' error. I tried setting --batch to 1 and --path_batch_shrink to 10, but the error still occurs.

I am using 8 x Tesla V100 (16 GB VRAM each) on an AWS EC2 p3.16xlarge instance. (I have already spent a lot of the budget, almost $500.)

What else can I try?

bit2map commented 1 year ago

Here is my nvidia-smi log.

ubuntu@:~$ nvidia-smi
Sun Jan 29 01:46:48 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:17.0 Off |                    0 |
| N/A   27C    P0    40W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:00:18.0 Off |                    0 |
| N/A   29C    P0    41W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:00:19.0 Off |                    0 |
| N/A   26C    P0    40W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:00:1A.0 Off |                    0 |
| N/A   28C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:00:1B.0 Off |                    0 |
| N/A   27C    P0    39W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                    0 |
| N/A   28C    P0    40W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                    0 |
| N/A   28C    P0    40W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   28C    P0    41W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
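The snapshot above was taken while the GPUs were idle, so every card reports 0MiB. A minimal sketch for polling per-GPU memory while the eight ranks start up, assuming `nvidia-smi` is on the PATH. The helper names `parse_gpu_memory` and `read_gpu_memory` are hypothetical and not part of DualStyleGAN.

```python
import subprocess

# Query only the fields needed, as plain comma-separated numbers in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (values in MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({"gpu": index, "used_mib": used, "total_mib": total})
    return rows


def read_gpu_memory():
    """Run nvidia-smi once and return the parsed rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `read_gpu_memory()` every second or so from a second terminal during launch would show whether GPU 0 fills up before the others.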

bit2map commented 1 year ago

Here is the error log.

(dualstylegen_env) ubuntu@:~/DualStyleGAN$ python -m torch.distributed.launch --nproc_per_node=8 --master_port=8765 finetune_dualstylegan.py --iter 1300 --size 1024 --path_batch_shrink 100 --batch 1 --ckpt ./checkpoint/generator-pretrain.pt --style_loss 0.25 --CX_loss 0.25 --perc_loss 1 --id_loss 1 --L2_reg_loss 0.015 --augment gogh
Load options
CX_loss: 0.25
L2_reg_loss: 0.015
ada_every: 256
ada_length: 500000
ada_target: 0.6
augment: True
augment_p: 0
batch: 1
channel_multiplier: 2
ckpt: ./checkpoint/generator-pretrain.pt
d_reg_every: 16
encoder_path: ./checkpoint/encoder.pt
exstyle_path: ./checkpoint/gogh/exstyle_code.npy
g_reg_every: 4
id_loss: 1.0
identity_path: ./checkpoint/model_ir_se50.pth
image_path: ./data/gogh/images/train/
instyle_path: ./checkpoint/gogh/instyle_code.npy
iter: 1300
lmdb_path: ./data/gogh/lmdb/
local_rank: 0
lr: 0.002
mixing: 0.9
model_name: generator
model_path: ./checkpoint/
n_sample: 9
path_batch_shrink: 100
path_regularize: 2
perc_loss: 1.0
r1: 10
save_begin: 1000
save_every: 100
size: 1024
start_iter: 0
style: gogh
style_loss: 0.25
subspace_freq: 2
wandb: False


load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading ResNet ArcFace
Encoder model successfully loaded!
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Data successfully loaded!
  0%|          | 0/1300 [00:00<?, ?it/s]
Encoder model successfully loaded!
Loading ResNet ArcFace
Traceback (most recent call last):
  File "finetune_dualstylegan.py", line 511, in <module>
    id_loss = id_loss.IDLoss(args.identity_path).to(device).eval()
  File "/home/ubuntu/DualStyleGAN/model/encoder/criteria/id_loss.py", line 11, in __init__
    self.facenet.load_state_dict(torch.load(model_paths))
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 774, in _legacy_load
    result = unpickler.load()
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 730, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 155, in _cuda_deserialize
    return storage_type(obj.size())
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/cuda/__init__.py", line 462, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 75.48 MiB already allocated; 20.44 MiB free; 76.00 MiB reserved in total by PyTorch)
  0%|          | 0/1300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "finetune_dualstylegan.py", line 539, in <module>
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, instyles, Simgs, exstyles, vggloss, id_loss, device)
  File "finetune_dualstylegan.py", line 200, in train
    real_pred = discriminator(real_img_aug)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/DualStyleGAN/model/stylegan/model.py", line 695, in forward
    out = self.convs(input)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/DualStyleGAN/model/stylegan/model.py", line 648, in forward
    skip = self.skip(input)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/DualStyleGAN/model/stylegan/model.py", line 88, in forward
    out = upfirdn2d(input, self.kernel, pad=self.pad)
  File "/home/ubuntu/DualStyleGAN/model/stylegan/op/upfirdn2d.py", line 163, in upfirdn2d
    out = UpFirDn2d.apply(input, kernel, up, down, pad)
  File "/home/ubuntu/DualStyleGAN/model/stylegan/op/upfirdn2d.py", line 119, in forward
    out = upfirdn2d_op.upfirdn2d(
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 15.78 GiB total capacity; 4.66 GiB already allocated; 94.44 MiB free; 4.84 GiB reserved in total by PyTorch)
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Traceback (most recent call last):
  File "finetune_dualstylegan.py", line 511, in <module>
    id_loss = id_loss.IDLoss(args.identity_path).to(device).eval()
  File "/home/ubuntu/DualStyleGAN/model/encoder/criteria/id_loss.py", line 11, in __init__
    self.facenet.load_state_dict(torch.load(model_paths))
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 774, in _legacy_load
    result = unpickler.load()
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 730, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/serialization.py", line 155, in _cuda_deserialize
    return storage_type(obj.size())
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/cuda/__init__.py", line 462, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 55.32 MiB already allocated; 18.44 MiB free; 56.00 MiB reserved in total by PyTorch)
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/dualstylegen_env/bin/python', '-u', 'finetune_dualstylegan.py', '--local_rank=7', '--iter', '1300', '--size', '1024', '--path_batch_shrink', '100', '--batch', '1', '--ckpt', './checkpoint/generator-pretrain.pt', '--style_loss', '0.25', '--CX_loss', '0.25', '--perc_loss', '1', '--id_loss', '1', '--L2_reg_loss', '0.015', '--augment', 'gogh']' returned non-zero exit status 1.

echo1993in commented 1 year ago

All pSp encoder models are loaded onto GPU 0. You should rewrite some of the pSp code: in psp.py, line 121, delete ".to(self.opts.device)". In finetune_dualstylegan.py, move the pSp model and psp.latent_avg to CUDA. You should change line 11 of id_loss.py too. The out-of-memory error occurs because the pSp model and the id_loss model are both loaded onto GPU 0 by default.
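The numbers in the first OOM message of the log are consistent with this diagnosis: the failing process had reserved only 76 MiB, yet only about 20 MiB of GPU 0 was free. A small sketch that does the arithmetic on the message text. The function name is hypothetical, and the regular expression assumes the message format shown in the log.

```python
import re

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def memory_held_elsewhere(message):
    """Return the MiB of the GPU that is neither free nor reserved by this process."""

    def grab(label):
        # Find '<number> <unit> <label>' and convert the number to MiB.
        value, unit = re.search(r"([\d.]+) (MiB|GiB) " + label, message).groups()
        return float(value) * UNIT_TO_MIB[unit]

    return grab("total capacity") - grab("free") - grab("reserved")


# Message copied from the first traceback in the log.
msg = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total "
    "capacity; 75.48 MiB already allocated; 20.44 MiB free; 76.00 MiB reserved "
    "in total by PyTorch)"
)
print(f"{memory_held_elsewhere(msg) / 1024:.2f} GiB not accounted for by this process")
```

For that message the result is roughly 15.7 GiB that this process's PyTorch allocator neither reserved nor could use, which points to other processes (plus some CUDA context overhead) occupying GPU 0.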

bit2map commented 1 year ago

> All pSp encoder models are loaded onto GPU 0. You should rewrite some of the pSp code: in psp.py, line 121, delete ".to(self.opts.device)". In finetune_dualstylegan.py, move the pSp model and psp.latent_avg to CUDA. You should change line 11 of id_loss.py too. The out-of-memory error occurs because the pSp model and the id_loss model are both loaded onto GPU 0 by default.

Thank you very much for your response. I am trying this based on your advice. Is it correct to modify line 11 of id_loss.py so that the model is loaded onto CUDA? For example: self.facenet.load_state_dict(torch.load(model_paths, map_location='cuda'))

I'm sorry for asking such a beginner question for such a great project.

williamyang1991 commented 1 year ago

I think @echo1993in's idea is to load all models onto the CPU first. Only in finetune_dualstylegan.py, once the multi-GPU setup is ready, are they moved to the corresponding GPU.

So you need to use self.facenet.load_state_dict(torch.load(model_paths, map_location='cpu'))
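A minimal sketch of that pattern, using a stand-in `nn.Linear` rather than the real `IDLoss` or pSp modules. The checkpoint file name and the `local_rank` value are placeholders chosen for illustration. The checkpoint is deserialized on the CPU, and each process then moves its own copy to its own GPU.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for a pretrained network; the real code would build IDLoss or pSp here.
model = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "stand_in.pth")
torch.save(model.state_dict(), ckpt_path)

# Without map_location, tensors are restored to the device they were saved from.
# map_location="cpu" keeps deserialization in host memory instead.
state = torch.load(ckpt_path, map_location="cpu")
restored = nn.Linear(4, 2)
restored.load_state_dict(state)

# Placeholder for the rank that torch.distributed.launch passes as --local_rank.
local_rank = 0
device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")
restored = restored.to(device).eval()
```

The fallback to `"cpu"` is only there so the sketch also runs on a machine without a GPU.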

bit2map commented 1 year ago

Thank you for the kind response and explanation. I modified two files.

1) psp.py (line 121): changed
self.latent_avg = ckpt['latent_avg'].to(self.opts.device)
to
self.latent_avg = ckpt['latent_avg']

2) id_loss.py (line 11): changed
self.facenet.load_state_dict(torch.load(model_paths))
to
self.facenet.load_state_dict(torch.load(model_paths, map_location='cpu'))

After these changes, a different error occurs: RuntimeError: mat1 dim 1 must match mat2 dim 0

---------------- LOG ------------
(dualstylegen_env) ubuntu@:~/DualStyleGAN$ python -m torch.distributed.launch --nproc_per_node=8 --master_port=8765 finetune_dualstylegan.py --iter 1500 --size 1024 --batch 4 --ckpt ./checkpoint/generator-pretrain.pt --style_loss 0.25 --CX_loss 0.25 --perc_loss 1 --id_loss 1 --L2_reg_loss 0.015 --augment gogh
Load options
CX_loss: 0.25
L2_reg_loss: 0.015
ada_every: 256
ada_length: 500000
ada_target: 0.6
augment: True
augment_p: 0
batch: 4
channel_multiplier: 2
ckpt: ./checkpoint/generator-pretrain.pt
d_reg_every: 16
encoder_path: ./checkpoint/encoder.pt
exstyle_path: ./checkpoint/gogh/exstyle_code.npy
g_reg_every: 4
id_loss: 1.0
identity_path: ./checkpoint/model_ir_se50.pth
image_path: ./data/gogh/images/train/
instyle_path: ./checkpoint/gogh/instyle_code.npy
iter: 1500
lmdb_path: ./data/gogh/lmdb/
local_rank: 0
lr: 0.002
mixing: 0.9
model_name: generator
model_path: ./checkpoint/
n_sample: 9
path_batch_shrink: 2
path_regularize: 2
perc_loss: 1.0
r1: 10
save_begin: 1000
save_every: 100
size: 1024
start_iter: 0
style: gogh
style_loss: 0.25
subspace_freq: 2
wandb: False


load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
load model: ./checkpoint/generator-pretrain.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading pSp from checkpoint: ./checkpoint/encoder.pt
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Loading ResNet ArcFace
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Encoder model successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
Data successfully loaded!
iter: 0; d: 0.748; g: 0.058; gr: 0.000; sty: 1.179; l2: 130.559; id: 0.750; r1: 0.007; path: 0.677; mean path: 0.008; augment: 0.0000;: 0%| | 1/1500 [00:04<1:47:00, 4.28s/it]
Traceback (most recent call last):
  File "finetune_dualstylegan.py", line 539, in <module>
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, instyles, Simgs, exstyles, vggloss, id_loss, device)
  File "finetune_dualstylegan.py", line 200, in train
    real_pred = discriminator(real_img_aug)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/DualStyleGAN/model/stylegan/model.py", line 710, in forward
    out = self.final_linear(out)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/DualStyleGAN/model/stylegan/model.py", line 152, in forward
    out = F.linear(input, self.weight * self.scale)
  File "/home/ubuntu/anaconda3/envs/dualstylegen_env/lib/python3.8/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0


I think the error is occurring in the input.shape of the Discriminator. I have tried several things, but the results are not good. What part do I need to modify?
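For reference, `F.linear(input, weight)` computes `input.matmul(weight.t())`, as the last traceback frame shows, so the message means that the last dimension of `input` differs from `weight.shape[1]`. A small sketch of that rule in plain Python. The function name is hypothetical and the shapes are illustrative only, not values taken from this run.

```python
def linear_output_shape(input_shape, weight_shape):
    """Shape rule for input.matmul(weight.t()) with a 2-D weight of shape (out, in)."""
    out_features, in_features = weight_shape
    if input_shape[-1] != in_features:
        raise ValueError(
            f"input has {input_shape[-1]} features but weight expects {in_features}"
        )
    return (*input_shape[:-1], out_features)


# Compatible: 8 input features on both sides.
print(linear_output_shape((3, 8), (5, 8)))

# Incompatible: the flattened input has 7 features, the weight expects 8.
try:
    linear_output_shape((3, 7), (5, 8))
except ValueError as err:
    print("mismatch:", err)
```

Comparing the shape of the tensor passed to `final_linear` with the shape of its weight would show which side is unexpected.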

bit2map commented 1 year ago

In finetune_dualstylegan.py (lines 508, 509, and 511):

encoder = pSp(opts).to(device).eval()
encoder.latent_avg = encoder.latent_avg.to(device)
id_loss = id_loss.IDLoss(args.identity_path).to(device).eval()

Do these three lines of code need to be moved or modified?

williamyang1991 commented 1 year ago

> In finetune_dualstylegan.py (lines 508, 509, and 511):
>
> encoder = pSp(opts).to(device).eval()
> encoder.latent_avg = encoder.latent_avg.to(device)
> id_loss = id_loss.IDLoss(args.identity_path).to(device).eval()
>
> Do these three lines of code need to be moved or modified?

No

williamyang1991 commented 1 year ago

> I think the error is occurring in the input.shape of the Discriminator. I have tried several things, but the results are not good. What part do I need to modify?

The modification only loads the model onto the CPU first and then moves it to the GPU. It doesn't change any of the computation, so it should not cause an error like yours. I don't think your error comes from loading the model onto the CPU first. I cannot debug this for you, but you can print the tensor sizes at each step before the error occurs to find the cause.
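One way to print tensor sizes without editing every `forward` is to register forward hooks. This is a sketch on a small stand-in network, not the DualStyleGAN discriminator, and `log_shapes` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def log_shapes(model, records):
    """Attach hooks that record (module name, input shape, output shape)."""
    handles = []
    for name, module in model.named_modules():
        if name == "":  # skip the root container itself
            continue

        def hook(mod, inputs, output, name=name):
            records.append((name, tuple(inputs[0].shape), tuple(output.shape)))

        handles.append(module.register_forward_hook(hook))
    return handles


# Stand-in model: conv -> flatten -> linear.
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 1),
)
records = []
handles = log_shapes(net, records)
net(torch.randn(2, 3, 4, 4))
for name, in_shape, out_shape in records:
    print(f"{name}: {in_shape} -> {out_shape}")

# Remove the hooks once the shapes have been inspected.
for handle in handles:
    handle.remove()
```

A forward hook fires only after a module's `forward` returns, so the module that raises is the first one missing from the output. `register_forward_pre_hook` can be used instead to see the input of the failing module itself.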

bit2map commented 1 year ago

Understood. I will try again based on your answers. How much VRAM per GPU do you recommend for running finetune_dualstylegan.py? Would 24 GB x 8 be enough? As a last resort, I plan to move to GPUs with more VRAM.

williamyang1991 commented 1 year ago

I use 32GB*8 to train the model.