rosinality / stylegan2-pytorch

Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch
MIT License

lpips Data Parallel error #130

Open seungjunlee96 opened 3 years ago

seungjunlee96 commented 3 years ago

Hi,

I'm trying to use `torch.nn.DataParallel` on the LPIPS network, but it raises an error.

However, when I change line 100 of `lpips/dist_model.py` in stylegan2-pytorch from `self.net = torch.nn.DataParallel(self.net, device_ids=gpu_ids)` to `self.net = torch.nn.DataParallel(self.net)`, the error goes away.

Is this the right solution?

Here is the error I get with the original code:

```
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/nas125/SeungjunLee/Projects/stylegan2/lpips/networks_basic.py", line 78, in forward
    res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
  File "/mnt/nas125/SeungjunLee/Projects/stylegan2/lpips/networks_basic.py", line 78, in <listcomp>
    res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 415, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
```

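
The failing call in the traceback is `self.lins[kk].model(...)` in `networks_basic.py`. One plausible cause, assumed here rather than confirmed in this thread: if `self.lins` is a plain Python list instead of an `nn.ModuleList`, `DataParallel` does not rewrite it when it builds replicas, so the replica on device 1 still calls the original layers whose weights live on device 0, which matches the "device 1 does not equal 0" message. The sketch below uses hypothetical classes (`PlainListHead`, `ModuleListHead`) and a hypothetical helper `find_plain_module_lists` to detect that pattern and to show the `nn.ModuleList` alternative; it runs on CPU.

```python
from typing import List

import torch
import torch.nn as nn


def find_plain_module_lists(root: nn.Module) -> List[str]:
    """Hypothetical helper: names of attributes holding nn.Modules in a plain list/tuple.

    Such containers are not registered submodules, so DataParallel replicas
    keep referring to the original layers (and therefore their original device).
    """
    found = []
    for prefix, module in root.named_modules():
        for attr, value in vars(module).items():
            if isinstance(value, (list, tuple)) and any(isinstance(v, nn.Module) for v in value):
                found.append(f"{prefix}.{attr}" if prefix else attr)
    return found


class PlainListHead(nn.Module):
    """Pattern suspected in the traceback: layers indexed through a plain list."""

    def __init__(self, chns=(64, 128)):
        super().__init__()
        self.lin0 = nn.Conv2d(chns[0], 1, kernel_size=1, bias=False)
        self.lin1 = nn.Conv2d(chns[1], 1, kernel_size=1, bias=False)
        self.lins = [self.lin0, self.lin1]  # plain list: not rewritten on replication


class ModuleListHead(nn.Module):
    """Same idea with nn.ModuleList, which replication does rewrite per device."""

    def __init__(self, chns=(64, 128)):
        super().__init__()
        self.lins = nn.ModuleList(nn.Conv2d(c, 1, kernel_size=1, bias=False) for c in chns)

    def forward(self, feats):
        # 1x1 conv per feature map, spatial mean, then sum over layers
        return sum(lin(f).mean(dim=(2, 3), keepdim=True) for lin, f in zip(self.lins, feats))


print(find_plain_module_lists(PlainListHead()))   # ['lins']
print(find_plain_module_lists(ModuleListHead()))  # []
```

Whether the copy of `lpips` bundled with this repo actually stores `lins` this way should be checked against its source before relying on this explanation.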
rosinality commented 3 years ago

Yes, it seems like just a device assignment error. If your change makes it work, it should not be a problem.
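
A minimal sketch of the two ways of wrapping discussed in this thread, using a hypothetical helper `wrap_for_data_parallel` (not part of stylegan2-pytorch and not the repo's `dist_model.py`). It relies on two documented `torch.nn.DataParallel` behaviours: with `device_ids=None` it uses all visible GPUs, and the module's parameters must already be on `device_ids[0]` before the first forward pass.

```python
import torch
import torch.nn as nn


def wrap_for_data_parallel(net: nn.Module, gpu_ids=None) -> nn.Module:
    """Hypothetical helper mirroring the two variants of the wrapping line.

    gpu_ids=None matches the modified call: DataParallel then uses every
    visible GPU. A list such as [0, 1] matches the original device_ids=gpu_ids.
    """
    if not torch.cuda.is_available():
        return net  # no GPUs: run the module as-is
    # DataParallel requires the parameters to sit on device_ids[0];
    # with device_ids=None that is the first visible GPU, cuda:0.
    first = gpu_ids[0] if gpu_ids else 0
    net = net.to(torch.device("cuda", first))
    return nn.DataParallel(net, device_ids=gpu_ids)
```

For example, `wrap_for_data_parallel(model)` corresponds to the modified line, while `wrap_for_data_parallel(model, gpu_ids=[0])` corresponds to the original call with a single-GPU list.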