Problem in modifying the code to multi GPU process

hellozgm commented 5 years ago

Hi, thanks for your awesome work. I want to modify the code to support multi-GPU training, so I changed your main code as follows:

    if args.cuda:
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        model = nn.DataParallel(model)
        model.to(device)

But I got this error:

    Traceback (most recent call last):
      File "core/train.py", line 361, in <module>
        main()
      File "core/train.py", line 353, in main
        train(args, model, optimizer, train_loader, epoch)
      File "core/train.py", line 200, in train
        pred_mattes, pred_alpha = model(input_img)
      File "/home/chaofan/lib/anaconda2/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/chaofan/lib/anaconda2/envs/python36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in forward
        return self.gather(outputs, self.output_device)
      File "/home/chaofan/lib/anaconda2/envs/python36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 136, in gather
        return gather(outputs, output_device, dim=self.dim)
      File "/home/chaofan/lib/anaconda2/envs/python36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 67, in gather
        return gather_map(outputs)
      File "/home/chaofan/lib/anaconda2/envs/python36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
        return type(out)(map(gather_map, zip(*outputs)))
      File "/home/chaofan/lib/anaconda2/envs/python36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
        return type(out)(map(gather_map, zip(*outputs)))
    TypeError: zip argument #1 must support iteration

I have no idea what causes this error. Do you have any suggestions? Thank you!

Neptuneer commented 5 years ago

@hellozgm Same problem! Have you found a solution?

huochaitiantang commented 5 years ago

Multi-GPU training is not currently supported. If you want to speed up training, increase the batchSize hyperparameter while training. Note, however, that the results show batchSize=1 can perform better.
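
For readers who hit the same traceback, here is one possible reading of it, not confirmed by the maintainer. The repeated line `return type(out)(map(gather_map, zip(*outputs)))` suggests that the gather step recursed into the tuple returned by the model and then met an element that is neither a tensor nor iterable (for example a plain Python int), so `zip()` failed on it. The sketch below is a simplified stand-in for that recursion and is not PyTorch's actual code: lists play the role of tensors, and the replica outputs are hypothetical.

    ```python
    # Simplified stand-in for the recursive gather step seen in the traceback.
    # Assumption: lists stand in for tensors; this is NOT PyTorch's real code.

    def gather_map(outputs):
        out = outputs[0]
        if isinstance(out, list):
            # "Tensor" case: concatenate the per-replica results.
            return [x for part in outputs for x in part]
        # Container case (e.g. a tuple of outputs): gather element by element,
        # mirroring the line that appears twice in the traceback.
        return type(out)(map(gather_map, zip(*outputs)))

    # Hypothetical outputs of two replicas, each returning (prediction, plain int).
    bad_outputs = [([1.0, 2.0], 0), ([3.0, 4.0], 0)]
    try:
        gather_map(bad_outputs)
        failed = False
    except TypeError:
        # zip() cannot iterate over the int, which raises the TypeError.
        failed = True

    # If every returned element is tensor-like, gathering succeeds.
    good_outputs = [([1.0, 2.0], [0.0]), ([3.0, 4.0], [0.0])]
    gathered = gather_map(good_outputs)
    ```

If this reading applies to your model, a possible workaround is to make sure `forward` returns only tensors (or containers of tensors) before wrapping the model in `nn.DataParallel`. This is untested against this repository.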

huochaitiantang / pytorch-deep-image-matting

Problem in modifying the code to multi GPU process #15