sniklaus / softmax-splatting

an implementation of softmax splatting for differentiable forward warping using PyTorch

How to run the code with a second GPU device ('cuda:1') #47

Closed ShrisudhanG closed 2 years ago

ShrisudhanG commented 2 years ago

The forward warping functions in softsplat.py produce a warped output only when the device id is 'cuda:0'. With other GPUs, the forward-warped output is identical to the zero-initialized tensor. Is there a way to perform the forward warp on GPU devices other than 'cuda:0'?

sniklaus commented 2 years ago

Should work just fine, @JasonSheng-atp seems to be using it successfully on multiple GPUs: https://github.com/sniklaus/softmax-splatting/issues/46#issuecomment-984303071

ShrisudhanG commented 2 years ago

Thanks for your reply. @JasonSheng-atp uses multiple GPUs for code execution, but I want to run solely on a secondary GPU ('cuda:1'). Being new to PyTorch and CuPy, I just want to know whether any changes are needed when moving variables to the GPU device with .cuda().
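
For context, a minimal sketch of the difference (the tensor here is a placeholder): .cuda() without arguments moves a tensor to the current CUDA device, which defaults to cuda:0, so the target device has to be named explicitly.

import torch

ten = torch.zeros(1, 3, 256, 448)

tenA = ten.cuda()                      # current CUDA device, cuda:0 by default
tenB = ten.to(torch.device('cuda:1'))  # explicitly on the second GPU
tenC = ten.cuda(1)                     # equivalent shorthand for cuda:1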

JasonSheng-atp commented 2 years ago

A similar error happened to me a long time ago. It may help if you print the related tensors' devices, e.g. print(a.device), or provide a simple Python script that reproduces the issue.

ShrisudhanG commented 2 years ago

Thanks for the reply @JasonSheng-atp. I printed the devices for each tensor as you suggested. I got the following output and error:

Tensor 1 device: cuda:3
Tensor 2 device: cuda:3
Flow device: cuda:3
Metric device: cuda:3
Traceback (most recent call last):
  File "run.py", line 64, in <module>
    tenAverage = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='average')
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 362, in FunctionSoftsplat
    tenOutput = _FunctionSoftsplat.apply(tenInput, tenFlow)
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 267, in forward
    cupy_launch('kernel_Softsplat_updateOutput', cupy_kernel('kernel_Softsplat_updateOutput', {
  File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 246, in cupy_launch
    return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 373, in compile_with_cache
    return _compile_with_cache_cuda(
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 484, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 222, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 224, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

If you know how to resolve this, please let me know...

JasonSheng-atp commented 2 years ago

I am not sure, since I don't know CuPy, but my guess is that some tensors are created on cuda:0 somewhere in the softmax splatting process, which is why there is always an illegal memory access error. I suggest running one net solely on a single GPU, and using DDP if you want to run on multiple GPUs. Please also check whether the flow contains infinite values; a quick sanity check is sketched below.
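
For illustration, a sanity check along these lines (a sketch; the tensor names follow the script later in the thread):

import torch

# Every tensor feeding the splatting call should report the same device.
assert tenOne.device == tenFlow.device

# A flow field containing NaN or infinite values is worth ruling out as well,
# since the kernel computes output indices directly from the flow.
assert torch.isfinite(tenFlow).all()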

ShrisudhanG commented 2 years ago

I am running the code solely on cuda:3. I guess the softmax splatting function creates some tensor on cuda:0; I will take a look at that. Also, what do you mean by DDP?

JasonSheng-atp commented 2 years ago

from torch.nn.parallel import DistributedDataParallel as DDP

DDP uses a separate process per GPU, which isolates the devices from each other. For more information, you can check the official docs. A minimal setup is sketched below.
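
For reference, a minimal sketch of the usual one-process-per-GPU setup (the master address, port, and model are placeholders, not from this repo):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(intRank, intWorldSize):
    # One process per GPU; each process only ever touches its own device.
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    dist.init_process_group(backend='nccl', rank=intRank, world_size=intWorldSize)
    torch.cuda.set_device(intRank)

# inside each worker process, after setup(intRank, intWorldSize):
# netModel = DDP(netModel.to(torch.device('cuda', intRank)), device_ids=[intRank])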

sniklaus commented 2 years ago

Thanks @JasonSheng-atp for chiming in!

I added:

tenOne = tenOne.to(torch.device('cuda:3'))
tenTwo = tenTwo.to(torch.device('cuda:3'))
tenFlow = tenFlow.to(torch.device('cuda:3'))
tenMetric = tenMetric.to(torch.device('cuda:3'))

To this line: https://github.com/sniklaus/softmax-splatting/blob/88892249e6016309b2df358b514f1fdb2bf22b3e/run.py#L53

That worked just fine for me. If it doesn't work for you, @ShrisudhanG, then please provide the output of cupy.show_config().

ShrisudhanG commented 2 years ago

I have already shifted the tensors to the GPU device that is being used currently. I have also printed the device to which each tensor is assigned in the previous message. When I tried to print the output of cupy.show_config(), I got the following error:

Traceback (most recent call last):
  File "run.py", line 60, in <module>
    print(cupy.show_config())
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/__init__.py", line 866, in show_config
    _sys.stdout.write(str(_cupyx.get_runtime_info()))
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupyx/_runtime.py", line 225, in __str__
    props['name'].decode('utf-8')),
AttributeError: 'str' object has no attribute 'decode'

sniklaus commented 2 years ago

I have already shifted the tensors to the GPU device that is being used currently.

What happens if you run the provided run.py with the changes outlined in my previous reply?

I tried to print the output of cupy.show_config()

cupy.show_config() does the printing for you, and it probably returns None, hence the error.

ShrisudhanG commented 2 years ago

I have made the changes you suggested. This is the code that I am running now:

#!/usr/bin/env python

import torch

import cv2
import numpy
import cupy

import softsplat

##########################################################

assert(int(str('').join(torch.__version__.split('.')[0:2])) >= 13) # requires at least pytorch version 1.3.0

##########################################################

def read_flo(strFile):
    with open(strFile, 'rb') as objFile:
        strFlow = objFile.read()
    # end

    assert(numpy.frombuffer(buffer=strFlow, dtype=numpy.float32, count=1, offset=0) == 202021.25)

    intWidth = numpy.frombuffer(buffer=strFlow, dtype=numpy.int32, count=1, offset=4)[0]
    intHeight = numpy.frombuffer(buffer=strFlow, dtype=numpy.int32, count=1, offset=8)[0]

    return numpy.frombuffer(buffer=strFlow, dtype=numpy.float32, count=intHeight * intWidth * 2, offset=12).reshape(intHeight, intWidth, 2)
# end

##########################################################

backwarp_tenGrid = {}

def backwarp(tenInput, tenFlow):
    if str(tenFlow.shape) not in backwarp_tenGrid:
        tenHor = torch.linspace(-1.0 + (1.0 / tenFlow.shape[3]), 1.0 - (1.0 / tenFlow.shape[3]), tenFlow.shape[3]).view(1, 1, 1, -1).expand(-1, -1, tenFlow.shape[2], -1)
        tenVer = torch.linspace(-1.0 + (1.0 / tenFlow.shape[2]), 1.0 - (1.0 / tenFlow.shape[2]), tenFlow.shape[2]).view(1, 1, -1, 1).expand(-1, -1, -1, tenFlow.shape[3])

        backwarp_tenGrid[str(tenFlow.shape)] = torch.cat([ tenHor, tenVer ], 1).to(device)#.cuda()
    # end

    tenFlow = torch.cat([ tenFlow[:, 0:1, :, :] / ((tenInput.shape[3] - 1.0) / 2.0), tenFlow[:, 1:2, :, :] / ((tenInput.shape[2] - 1.0) / 2.0) ], 1)

    return torch.nn.functional.grid_sample(input=tenInput, grid=(backwarp_tenGrid[str(tenFlow.shape)] + tenFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros', align_corners=False)
# end

##########################################################

device = torch.device('cuda:3')
tenOne = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./images/one.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).to(device)
tenTwo = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./images/two.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).to(device)
tenFlow = torch.FloatTensor(numpy.ascontiguousarray(read_flo('./images/flow.flo').transpose(2, 0, 1)[None, :, :, :])).to(device)

tenMetric = torch.nn.functional.l1_loss(input=tenOne, target=backwarp(tenInput=tenTwo, tenFlow=tenFlow), reduction='none').mean(1, True)

tenOne = tenOne.to(torch.device('cuda:3'))
tenTwo = tenTwo.to(torch.device('cuda:3'))
tenFlow = tenFlow.to(torch.device('cuda:3'))
tenMetric = tenMetric.to(torch.device('cuda:3'))

print('Tensor 1 device:', tenOne.device)
print('Tensor 2 device:', tenTwo.device)
print('Flow device:', tenFlow.device)
print('Metric device:', tenMetric.device)
cupy.show_config()

intTime = 1
fltTime = 1.0
tenSummation = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='summation')
tenAverage = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='average')
tenLinear = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=(0.3 - tenMetric).clip(0.0000001, 1.0), strType='linear') # finding a good linearly metric is difficult, and it is not invariant to translations
tenSoftmax = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=-20.0 * tenMetric, strType='softmax') # -20.0 is a hyperparameter, called 'alpha' in the paper, that could be learned using a torch.Parameter

print('Forward warp summation:', tenSummation.device)
print('Forward warp average:', tenAverage.device)
print('Forward warp linear:', tenLinear.device)
print('Forward warp softmax:', tenSoftmax.device)

And this is the error I get on running this:

Tensor 1 device: cuda:3
Tensor 2 device: cuda:3
Flow device: cuda:3
Metric device: cuda:3
Traceback (most recent call last):
  File "run.py", line 65, in <module>
    cupy.show_config()
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/__init__.py", line 866, in show_config
    _sys.stdout.write(str(_cupyx.get_runtime_info()))
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupyx/_runtime.py", line 225, in __str__
    props['name'].decode('utf-8')),
AttributeError: 'str' object has no attribute 'decode'

Please let me know if I am doing something wrong...

sniklaus commented 2 years ago

What does nvidia-smi return?

ShrisudhanG commented 2 years ago

Output of nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   32C    P8     9W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   36C    P8    10W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   33C    P8     9W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    Off  | 00000000:84:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

sniklaus commented 2 years ago

You are using a mix of different GPU models; the CuPy kernels probably get compiled for one of them and fail when used for the other(s). Try only using devices 1 through 3, for example by running CUDA_VISIBLE_DEVICES="1,2,3" python yourscript.py.

ShrisudhanG commented 2 years ago

Sorry for the delayed response. I tried what you suggested and still got the illegal memory access error.

(lf) prasan@jarvis:/media/data/prasan/shrisudhan/softmax-splatting$ CUDA_VISIBLE_DEVICES="1, 2, 3" python run.py
Tensor 1 device: cuda:1
Tensor 2 device: cuda:1
Flow device: cuda:1
Metric device: cuda:1
Traceback (most recent call last):
  File "run.py", line 70, in <module>
    tenAverage = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='average')
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 362, in FunctionSoftsplat
    tenOutput = _FunctionSoftsplat.apply(tenInput, tenFlow)
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 267, in forward
    cupy_launch('kernel_Softsplat_updateOutput', cupy_kernel('kernel_Softsplat_updateOutput', {
  File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 246, in cupy_launch
    return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 373, in compile_with_cache
    return _compile_with_cache_cuda(
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 431, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 222, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 224, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Another interesting thing I observed: when I use the command you suggested and run the script with device='cuda:3', I get the following error:

Traceback (most recent call last):
  File "run.py", line 56, in <module>
    tenOne = tenOne.to(device)
RuntimeError: CUDA error: invalid device ordinal

I don't know why PyTorch fails to recognize the cuda:3 device here. Without setting CUDA_VISIBLE_DEVICES, the device is recognized.

sniklaus commented 2 years ago

Can you delete the ~/.cupy/ folder and try again?

I don't know why PyTorch fails to recognize the cuda:3 device here. Without setting CUDA_VISIBLE_DEVICES, the device is recognized.

That is expected. If you set CUDA_VISIBLE_DEVICES="1,2,3", then you only have access to three CUDA devices, which are indexed starting from 0 again: cuda:0, cuda:1, and cuda:2. The remapping can be verified as sketched below.
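
For illustration, a quick way to confirm the remapping (assuming the same four-GPU machine as above):

import torch

# Launched with CUDA_VISIBLE_DEVICES="1,2,3", PyTorch only sees three devices:
# physical GPU 1 becomes cuda:0, GPU 2 becomes cuda:1, and GPU 3 becomes cuda:2.
print(torch.cuda.device_count())      # 3
print(torch.cuda.get_device_name(0))  # reports the name of physical GPU 1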

ShrisudhanG commented 2 years ago

I deleted the ~/.cupy/ folder and ran the code again with CUDA_VISIBLE_DEVICES="1,2,3" and with device='cuda:2'. Still encountering the same Illegal memory access error.

sniklaus commented 2 years ago

I am afraid that I have no idea then. My best guess is that something with the mixed-GPU setup is causing issues since it works just fine in the multi-GPU environments that I have encountered so far (but they all had homogeneous GPU configurations).

ShrisudhanG commented 2 years ago

Okay. Thanks for helping out anyways.

sniklaus commented 2 years ago

Please share your findings if you end up making it work in your environment, thanks!

sniklaus commented 2 years ago

I just updated the repo, maybe you will have more luck with the new version.

ShrisudhanG commented 2 years ago

Hi, sorry for the late response. I will try this and let you know if this works. Thanks!

sniklaus commented 2 years ago

Any updates by chance? Thanks!

sniklaus commented 2 years ago

Closing due to inactivity, seems like this is no longer an issue? Feel free to reopen if it is though.