harlanhong / CVPR2022-DaGAN

Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
https://harlanhong.github.io/publications/dagan.html

Error when training on my own dataset, did anyone have this problem before? #22

Closed · twilight0718 closed this 2 years ago

twilight0718 commented 2 years ago

[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank, device, opt, writer)
  File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 66, in train
    losses_generator, generated = generator_full(x)

Meanwhile there's another problem as well:

Traceback (most recent call last):
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank, device, opt, writer)
  File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 74, in train
    loss.backward()
  File "/home/anaconda3/envs/DaGAN/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

It seems an in-place operation is the problem, but I couldn't find any in-place code anywhere.
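For context, this error generally means a tensor that autograd saved during the forward pass was updated in place before backward ran. A minimal sketch that reproduces the same RuntimeError with a size-32 BatchNorm weight (matching the [torch.cuda.FloatTensor [32]] in the message); this is hypothetical illustration code, not the actual DaGAN training loop:

```python
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # prints the forward trace of the op that later fails,
                                         # which is what produced the CudnnBatchNormBackward warning

bn = nn.BatchNorm1d(32)                  # bn.weight is a [32] tensor, like the one in the error
optim = torch.optim.SGD(bn.parameters(), lr=0.1)

x = torch.randn(8, 32)
loss_a = bn(x).sum()
loss_b = bn(x).sum()                     # second forward records bn.weight at its current version

loss_a.backward(retain_graph=True)
optim.step()                             # in-place parameter update bumps bn.weight's version counter

# BatchNorm's backward needs bn.weight, which is now one version newer than recorded:
# RuntimeError: one of the variables needed for gradient computation has been modified
# by an inplace operation ... is at version N; expected version N-1 instead.
loss_b.backward()
```

In this sketch, moving both backward() calls before optim.step() (or not reusing the module across steps) makes it run cleanly; in a GAN training loop the same pattern can arise when generator and discriminator updates interleave through shared BatchNorm layers.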

harlanhong commented 2 years ago

Hi, please use multiple GPUs to train the network. This problem happens for some unknown reason when you train with only one GPU. I cannot solve this problem; maybe it is caused by the PyTorch version.
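For anyone landing here later: the traceback passes opt.local_rank into train(), which suggests run.py is meant to be started through PyTorch's distributed launcher (the launcher supplies --local_rank to the script). A sketch of a two-GPU launch under that assumption; the config path and any other run.py flags are placeholders, so check the repo README for the exact invocation:

```bash
# Hypothetical two-GPU launch; run.py's own flags may differ, see the README.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
    --nproc_per_node=2 run.py --config config/vox-adv-256.yaml
```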

twilight0718 commented 2 years ago

Thanks a lot! The problem has been solved! I truly recommend noting this problem in the README file.