Closed harshbafna closed 4 years ago
Hi,
I just tried using your script to reproduce the issue, but I was unable to reproduce the problem.
Everything worked as expected on my side. The only differences compared to your script were the model checkpoint (I used torchvision pretrained weights) and the image (I used the grace hopper image from https://github.com/pytorch/vision/tree/master/test/assets)
Here is the output of two runs of the script, changing the device between invocations (the results are identical apart from the device):
(segmentation) fmassa@devfair0163:~/github/vision/test$ python multi_device.py
/opt/conda/conda-bld/pytorch_1584602279795/work/torch/csrc/utils/python_arg_parser.cpp:749: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, bool as_tuple)
[{'boxes': tensor([[ 12.4496, 41.6337, 515.4937, 597.8145],
[232.3642, 441.4839, 289.5784, 539.5891],
[223.9480, 414.2102, 293.1623, 487.2314],
[359.8482, 494.2206, 415.2643, 531.1694],
[324.3438, 494.3085, 452.1467, 534.7974],
[ 92.4788, 74.8787, 134.9529, 121.4470],
[ 6.7415, 7.3106, 180.6865, 444.4046],
[ 8.2881, 159.6651, 141.0517, 427.5335],
[368.1360, 492.3096, 413.3743, 506.5931],
[ 59.8016, 7.6531, 292.8199, 359.9379],
[ 1.5268, 51.6872, 40.3637, 91.3684],
[ 28.8533, 126.1654, 259.5476, 426.8148],
[ 2.1129, 139.0286, 75.4144, 207.1743]], device='cuda:0',
grad_fn=<StackBackward>), 'labels': tensor([ 1, 32, 32, 84, 84, 16, 1, 1, 84, 1, 16, 1, 38], device='cuda:0'), 'scores': tensor([0.9994, 0.9377, 0.3584, 0.3254, 0.2454, 0.2307, 0.2070, 0.1550, 0.1476,
0.1409, 0.1122, 0.0855, 0.0637], device='cuda:0',
grad_fn=<IndexBackward>)}]
(segmentation) fmassa@devfair0163:~/github/vision/test$ python multi_device.py
/opt/conda/conda-bld/pytorch_1584602279795/work/torch/csrc/utils/python_arg_parser.cpp:749: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, bool as_tuple)
[{'boxes': tensor([[ 12.4496, 41.6337, 515.4937, 597.8145],
[232.3642, 441.4839, 289.5784, 539.5891],
[223.9480, 414.2102, 293.1623, 487.2314],
[359.8482, 494.2206, 415.2643, 531.1694],
[324.3438, 494.3085, 452.1467, 534.7974],
[ 92.4788, 74.8787, 134.9529, 121.4470],
[ 6.7415, 7.3106, 180.6865, 444.4046],
[ 8.2881, 159.6651, 141.0517, 427.5335],
[368.1360, 492.3096, 413.3743, 506.5931],
[ 59.8016, 7.6531, 292.8199, 359.9379],
[ 1.5268, 51.6872, 40.3637, 91.3684],
[ 28.8533, 126.1654, 259.5476, 426.8148],
[ 2.1129, 139.0286, 75.4144, 207.1743]], device='cuda:1',
grad_fn=<StackBackward>), 'labels': tensor([ 1, 32, 32, 84, 84, 16, 1, 1, 84, 1, 16, 1, 38], device='cuda:1'), 'scores': tensor([0.9994, 0.9377, 0.3584, 0.3254, 0.2454, 0.2307, 0.2070, 0.1550, 0.1476,
0.1409, 0.1122, 0.0855, 0.0637], device='cuda:1',
grad_fn=<IndexBackward>)}]
What are your PyTorch / torchvision versions? I used PyTorch and torchvision from today's nightly.
@chauhang this is pending a reproduction and further information from @harshbafna. I couldn't reproduce it with the latest PyTorch / torchvision.
@fmassa: I can confirm that this works fine with the nightly build, but it can be reproduced with the current stable build (0.5.0), installed using the following command:
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
Ok, thanks for confirming that this works fine with the nightly build.
We will be releasing a new version of PyTorch / torchvision in the coming weeks, so the problem will disappear in the stable builds.
🐛 Bug
TorchVision's pre-trained object detection models, such as FasterRCNN and MaskRCNN, return different outputs on different CUDA devices in a multi-GPU environment.
To Reproduce
Execute the following Python script with a different CUDA device each time, e.g. "cuda:0", "cuda:1", etc.
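The original script is not included in this thread. As a stand-in, here is a minimal sketch of what such a reproduction could look like. It is not the reporter's code: the function name `run_detection`, the choice of `fasterrcnn_resnet50_fpn`, and the use of Pillow to load the image are all assumptions, and `pretrained=True` is the argument form from the torchvision 0.5 era.

```python
def run_detection(image_path, device_name):
    """Run a pretrained Faster R-CNN on one image on the given device.

    Hypothetical helper for illustration; not the script from the report.
    """
    # Imported inside the function so the file can be loaded even where
    # PyTorch / torchvision / Pillow are not installed.
    import torch
    import torchvision
    from PIL import Image
    from torchvision.transforms import functional as F

    device = torch.device(device_name)

    # `pretrained=True` matches torchvision 0.5; newer releases prefer `weights=`.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.to(device)
    model.eval()  # inference mode: the model returns a list of detection dicts

    # Detection models take a list of CxHxW float tensors scaled to [0, 1].
    image = F.to_tensor(Image.open(image_path).convert("RGB")).to(device)

    with torch.no_grad():
        return model([image])
```

Calling `run_detection(path, "cuda:0")` and `run_detection(path, "cuda:1")` on the same image and printing both results would show whether the boxes, labels and scores differ between devices.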
Expected behavior
These object detection models should return similar bounding boxes for the objects detected in the input image, regardless of the CUDA device used.
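One way to make "similar" concrete is to compare two results within a tolerance. The sketch below is only an illustration: the helper name `detections_match` and the tolerance values are assumptions, and it expects each result to have been converted to plain Python lists first (for example with `.detach().cpu().tolist()` on each tensor).

```python
import math


def detections_match(a, b, box_tol=1e-3, score_tol=1e-4):
    """Return True if two detection results agree within the given tolerances.

    `a` and `b` are dicts with 'boxes', 'labels' and 'scores' as plain lists.
    Hypothetical helper; the tolerances are arbitrary illustrative defaults.
    """
    # Labels are integers, so they must match exactly (this also checks count).
    if a["labels"] != b["labels"]:
        return False
    if len(a["boxes"]) != len(b["boxes"]) or len(a["scores"]) != len(b["scores"]):
        return False

    # Every box coordinate must be within box_tol of its counterpart.
    for box_a, box_b in zip(a["boxes"], b["boxes"]):
        if len(box_a) != len(box_b):
            return False
        for x, y in zip(box_a, box_b):
            if not math.isclose(x, y, abs_tol=box_tol):
                return False

    # Every score must be within score_tol of its counterpart.
    return all(
        math.isclose(x, y, abs_tol=score_tol)
        for x, y in zip(a["scores"], b["scores"])
    )
```

With the behavior reported here, where a second device returns only a single detection, the label lists differ in length and the check fails immediately.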
Environment
[pip3] numpy==1.15.4
[conda] blas 1.0 mkl
[conda] torchserve 0.0.1b20200318
[conda] torchtext 0.5.0 py_1 pytorch
[conda] torchvision 0.5.0 py36_cu101 pytorch
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.0.15 py36ha843d7b_0
[conda] mkl_random 1.1.0 py36hd6b4f25_0
[conda] pytorch 1.4.0 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torch-model-archiver 0.1.0b20200318
Additional context
These models return similar output when executed on the CPU and on "cuda:0", but return only a single label/tensor/score when executed on any other CUDA device on the same machine, such as "cuda:1".
Output on cuda:0 :
Output on cuda:1 :
Reference topic on the PyTorch forum: https://discuss.pytorch.org/t/pytorch-different-output-on-different-cuda-device-for-fasterrcnn-maskrcnn/71867/3