williamyang1991 / VToonify

[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer

Problem during training #28

Closed dongyun-kim-arch closed 1 year ago

dongyun-kim-arch commented 1 year ago

Hello! I ran into a problem when training VToonify. I finished training a DualStyleGAN model on my own style (thank you for the author's help!) and would now like to build my own VToonify model.

I was able to pre-train the encoder, but when training VToonify-D the error below occurred. It seems the problem is related to my GPU, but my GPU is working, and there was no issue during the encoder pre-training. Could you have a look at my command and the log below and point out what is wrong?

(vtoonify_env) donghyun@kr-03:~/Desktop/training/VToonify$ python -m torch.distributed.launch --nproc_per_node=1 --master_port=8765 train_vtoonify_d.py --iter 2000 --stylegan_path ./checkpoint/mystyle/generator.pt --exstyle_path ./checkpoint/mystyle/refined_exstyle_code.npy --batch 4 --name vtoonify_d_mystyle --fix_color

Load options
adv_loss: 0.01
batch: 4
direction_path: ./checkpoint/directions.npy
encoder_path: ./checkpoint/vtoonify_d_mystyle/pretrain.pt
exstyle_path: ./checkpoint/mystyle/refined_exstyle_code.npy
faceparsing_path: ./checkpoint/faceparsing.pth
fix_color: True
fix_degree: False
fix_style: False
grec_loss: 0.1
iter: 2000
local_rank: 0
log_every: 200
lr: 0.0001
msk_loss: 0.0005
name: vtoonify_d_mystyle
perc_loss: 0.01
pretrain: False
save_begin: 30000
save_every: 30000
start_iter: 0
style_degree: 0.5
style_encoder_path: ./checkpoint/encoder.pt
style_id: 26
stylegan_path: ./checkpoint/mystyle/generator.pt
tmp_loss: 1.0


Setting up Perceptual loss...
Loading model from: /home/donghyun/Desktop/training/VToonify/model/stylegan/lpips/weights/v0.1/vgg.pth
...[net-lin [vgg]] initialized
...Done
Load models and data successfully loaded!
  0%|          | 0/2000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_vtoonify_d.py", line 515, in <module>
    train(args, generator, discriminator, g_optim, d_optim, g_ema, percept, parsingpredictor, down, pspencoder, directions, styles, device)
  File "train_vtoonify_d.py", line 286, in train
    fake_pred = discriminator(F.adaptive_avg_pool2d(fake_output, 256), degree_label, style_ind)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/donghyun/Desktop/training/VToonify/model/vtoonify.py", line 84, in forward
    condition = torch.cat((self.label_mapper(degree_label), self.style_mapper(style_ind)), dim=1)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 145, in forward
    return F.embedding(
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/functional.py", line 1913, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device

williamyang1991 commented 1 year ago

RuntimeError: Input, output and indices must be on the current device

This is because some of your tensors are on one GPU while others are on the CPU (or on other GPUs). You can print degree_label and style_ind to see which device they are on just before this line: condition = torch.cat((self.label_mapper(degree_label), self.style_mapper(style_ind)), dim=1)
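
For example, a quick check could look like this (a sketch only: the print lines are extra debugging code, not part of model/vtoonify.py, and it assumes style_mapper is an nn.Embedding, as the sparse.py frame in your traceback suggests):

# Hypothetical debug prints added just above the failing line in the
# discriminator's forward() in model/vtoonify.py
print('degree_label:', degree_label.device)                      # e.g. cuda:0
print('style_ind:', style_ind.device)                            # e.g. cpu -> mismatch
print('style_mapper weight:', self.style_mapper.weight.device)   # device the embedding lives on
condition = torch.cat((self.label_mapper(degree_label), self.style_mapper(style_ind)), dim=1)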

If some of them are on the CPU, you just need to call degree_label.cuda() or style_ind.cuda(). If they are on different GPUs, you can use the non-distributed version:

python train_vtoonify_d.py --iter 2000 --stylegan_path ./checkpoint/mystyle/generator.pt --exstyle_path ./checkpoint/mystyle/refined_exstyle_code.npy --batch 4 --name vtoonify_d_mystyle --fix_color
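
Alternatively, a similar fix can be applied at the call site in train_vtoonify_d.py (a sketch only, using device, which your traceback shows is already passed into train()):

# Sketch of a possible change to the discriminator call (line 286 in your traceback):
# move both condition tensors onto the training device first.
fake_pred = discriminator(F.adaptive_avg_pool2d(fake_output, 256),
                          degree_label.to(device), style_ind.to(device))
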
dongyun-kim-arch commented 1 year ago

Thank you for your quick reply! The problem was indeed a tensor on the wrong device: style_ind was not on the GPU, as you mentioned. I am now able to run my code after changing the line like this:

condition = torch.cat((self.label_mapper(degree_label), self.style_mapper(style_ind.cuda())), dim=1)
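
A slightly more device-agnostic variant of that line (just a sketch, assuming style_mapper is an nn.Embedding with a .weight tensor) would move the indices to whatever device the embedding weights are on, so the change also works when running on CPU:

# Sketch: send the style indices to the embedding's own device instead of hard-coding .cuda()
condition = torch.cat((self.label_mapper(degree_label),
                       self.style_mapper(style_ind.to(self.style_mapper.weight.device))), dim=1)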