boduan1 opened this issue 2 years ago
Hi @boduan1,
That might be related to the pytorch version you're using. It might be an issue with updates to the torch.distributed package. Are you using pytorch 1.9.0 like we mention in the readme?
I ran into the same issue when training the full pipeline with multiple GPUs. You are right that the cause is the PyTorch version. However, I am unable to install PyTorch 1.9.0 because I cannot downgrade the CUDA version (11.6) in the cluster environment. Could you please give me some suggestions for running the full pipeline with CUDA 11.6?
Hi @wosecz,
That issue lies with PyTorch's distributed training package. My suggestion would be to re-implement that part of the training code so that it works on the latest PyTorch version.
Thank you for your reply! I checked the training code and found that the problem comes from the `autograd.grad` call in `g_path_regularize` in `losses.py`. The full pipeline training code runs successfully with `g_reg_every=0` in the PyTorch 1.9.0 env. However, omitting the G regularization leads to performance degradation. I'm still trying to fix the version problem with the autograd functions. Thank you again for your help!
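For readers hitting the same error: the path length regularizer in question follows the StyleGAN2 formulation, and the failure point is the `autograd.grad` call, which requires its `inputs` (the latents) to be part of the autograd graph with gradients enabled. A minimal sketch of the regularizer is below; the exact function signature and variable names in this repo's `losses.py` may differ, so treat this as an illustration rather than the repo's actual code.

```python
import math

import torch
from torch import autograd


def g_path_regularize(fake_img, latents, mean_path_length, decay=0.01):
    # Random per-pixel weights turn the Jacobian into a cheap
    # Jacobian-vector product, normalized by image resolution.
    noise = torch.randn_like(fake_img) / math.sqrt(
        fake_img.shape[2] * fake_img.shape[3]
    )
    # This is the call that breaks on newer PyTorch when `latents`
    # is not a leaf tensor that requires grad (e.g. under DDP).
    (grad,) = autograd.grad(
        outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
    )
    path_lengths = torch.sqrt(grad.pow(2).sum(2).mean(1))
    # Exponential moving average of the path length target.
    path_mean = mean_path_length + decay * (path_lengths.mean() - mean_path_length)
    path_penalty = (path_lengths - path_mean).pow(2).mean()
    return path_penalty, path_mean.detach(), path_lengths
```

The key constraint is that `latents` must be connected to `fake_img` in the graph and must require grad, which is exactly what the leaf-node workaround further down this thread restores.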
Thanks for the interesting work! I also met the same problem during training. I solved it by turning the `latent` variable into a leaf node, so that it can receive gradients. You just need to add

```python
if return_latents:
    latent = latent.detach()
    latent.requires_grad_(True)
```

in the `forward` of the `Decoder` class in `model.py` (line 618), before synthesizing the fake image. Then it works.
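To see why this workaround helps: `detach()` cuts a tensor out of the autograd graph, and re-enabling `requires_grad` on the detached copy makes it a leaf tensor, which is what `autograd.grad` needs as an `inputs` argument. A small standalone illustration (not code from this repo):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                 # result of an op -> non-leaf, grad flows through it
print(y.is_leaf)          # False

z = y.detach()            # cut from the graph; z shares storage with y
z.requires_grad_(True)    # z is now a leaf that can accumulate z.grad
print(z.is_leaf)          # True
```

The trade-off is that gradients no longer flow from `latent` back into the mapping network; they stop at the new leaf, which is apparently acceptable for the path length regularization step here.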
It works for me 👍🏻
Hello royorel! First, thanks for your previous suggestion on the volume rendering part; it works for me now.
But I then hit a problem with the full pipeline: with 1 GPU everything works fine, but with 2 or 4 GPUs it throws an error right at the beginning.
Do you know what the problem might be? (There is no problem when I use 2 or 4 GPUs for the volume rendering part.)
Also, could you give some instructions on how to reproduce the evaluation in the paper? Thanks!