Hi @yutoe05
- If a GPU works with device_idx > 0, it also works with a different non-zero device_idx obtained by shifting the visible devices with CUDA_VISIBLE_DEVICES.
- If a GPU does not work with device_idx > 0, it also does not work with a different non-zero device_idx obtained by shifting the visible devices with CUDA_VISIBLE_DEVICES.
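(By "shifting" I mean remapping which physical GPUs are visible; a minimal sketch with hypothetical indices:)

```python
import os

# Hypothetical illustration: with this setting, physical GPUs 2 and 3 appear
# as logical devices 0 and 1, so device_idx = 1 now refers to physical GPU 3.
# It must be set before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
```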
This does make it sound more like there is an issue with your environment than with Dr.Jit, especially in examples 1 and 3 where all GPUs are identical.
You could try some other software/library to see whether you get similar issues. Please report back if you still believe that this is a bug in our implementation.
Thanks for your answer, @njroussel.
I tried CuPy, Numba, and PyTorch on multiple GPUs. I checked them in some of the settings/systems where mitsuba3 fails to render, and they all work well.
environments:
I ran the following snippets:
```python
import cupy
import numpy as np

device_idx = 1
with cupy.cuda.Device(device_idx):
    cx = cupy.arange(20)
    cx = cx + 1

# Check that every element computed on the selected device matches
print((np.arange(1, 21) == cupy.asnumpy(cx)).all())
breakpoint()
```
```python
from numba import cuda
import numpy as np

@cuda.jit
def multiplication(input, times, result):
    x, y = cuda.grid(2)
    result[x, y] = input[x, y]
    for _ in range(times - 1):
        result[x, y] += input[x, y]

device_idx = 1
cuda.select_device(device_idx)
order = 7
input = np.random.randn(2**order, 2**order)
input_d = cuda.to_device(input)
result = np.zeros_like(input)
result_d = cuda.to_device(result)
times = 10
multiplication[(2 ** (order - 4), 2 ** (order - 4)), (2**4, 2**4)](input_d, times, result_d)
gt = input * times
result = result_d.copy_to_host()

# Check that every element is close to the reference result
print((np.abs((result - gt) / gt) < 1e-3).all())
breakpoint()
```
```python
import sys
import torch

device_idx = 1
torch.cuda.set_device(device_idx)
cx = torch.randn(20).cuda()
cx = cx + 1
breakpoint()
sys.exit()
```
Using breakpoint(), I confirmed via nvidia-smi that the specified device was actually being used.
Certainly the behavior of mitsuba3 seems to be environment-dependent, but I don't get similar issues with the other libraries, so it may be a bug.
@yutoe05
Thank you for this, it does seem to point to an issue on our end. We don't have any multi-GPU setups, so it's a bit hard to debug this further on our side.
I understand that it's not the most elegant, but is there anything stopping you from always using the CUDA_VISIBLE_DEVICES workaround?
Thanks for your response.
I would like to use mitsuba3 while training a PyTorch network with data parallelism on multiple GPUs. CUDA_VISIBLE_DEVICES affects not only mitsuba3 but also PyTorch, so I can't use it.
I'll look for another workaround. Thank you very much.
Thank you for the update. Indeed, I hadn't thought of that :sweat_smile:. Technically, I think you could just do something like os.environ['CUDA_VISIBLE_DEVICES'] = ...; import drjit as dr; del os.environ['CUDA_VISIBLE_DEVICES'].
Granted, this is more of a temporary workaround if you're truly stuck.
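Spelled out, the idea would look something like this (a sketch; the device index is a placeholder):

```python
import os

# Hide all but the desired GPU while Dr.Jit/Mitsuba initialize, then restore
# the environment so that libraries imported later see all GPUs again.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # hypothetical target device
import drjit as dr
import mitsuba as mi
del os.environ['CUDA_VISIBLE_DEVICES']
```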
I might have an idea. Could you try re-ordering your imports and setup as follows:

```python
import drjit as dr
device_idx = 4
dr.set_device(device_idx)

import mitsuba as mi
mi.set_variant("cuda_ad_rgb")
```
My best guess is that there is some global device memory allocated on the default device (device 0) when importing mitsuba or drjit, and it is not moved after the set_device call. If the suggestion above works, then it's something allocated in mitsuba, which seems plausible.
(Your initial example imports and sets the variant for mitsuba before setting the device.)
Thank you for the great ideas!
First, I tried the following:

```python
import os
import torch

device_idx = 1  # hypothetical non-zero device index
os.environ["CUDA_VISIBLE_DEVICES"] = str(device_idx)
import drjit as dr
import mitsuba as mi
del os.environ["CUDA_VISIBLE_DEVICES"]

print("num_gpus:", torch.cuda.device_count())
```

The result is num_gpus: 1.
I think CUDA_VISIBLE_DEVICES is only referenced the first time CUDA is initialized, and the CUDA settings are shared among libraries.
For example,

```python
import os
import torch

print("num_gpus:", torch.cuda.device_count())  # initializes CUDA first

device_idx = 1  # hypothetical non-zero device index
os.environ["CUDA_VISIBLE_DEVICES"] = str(device_idx)
import drjit as dr
import mitsuba as mi
del os.environ["CUDA_VISIBLE_DEVICES"]
```

results in num_gpus > 0 and drjit using the device with device_idx == 0.
Second, I tried setting the device via drjit before importing mitsuba3, but the same error occurred.
Third, when adding CUDA_LAUNCH_BLOCKING=1, I found that

Critical Dr.Jit compiler failure: jit_optix_check(): API error 7050 (OPTIX_ERROR_LAUNCH_FAILURE): "Launch failure" in ../ext/drjit/ext/drjit-core/src/optix_core.cpp:489.

or CUDA_ERROR_INVALID_VALUE is printed, instead of CUDA_ERROR_ILLEGAL_ADDRESS or CUDA_ERROR_INVALID_VALUE.
Therefore, I think this issue may involve OptiX, which is not used by the other libraries I tried.
I also found that some memory is allocated on all visible GPUs when importing drjit. (I'm not sure whether this information is helpful, though.)
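(One way to isolate the OptiX layer might be to exercise Dr.Jit's plain CUDA backend without Mitsuba; a minimal sketch, assuming the drjit.cuda array types:)

```python
import drjit as dr
from drjit.cuda import Float  # plain CUDA backend, no OptiX involved

dr.set_device(1)  # hypothetical non-zero device index

x = dr.arange(Float, 20) + 1
print(x)  # forces the kernel to run on the selected device
```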
I'm sorry, but I'm going to be busy for a few weeks and may not be able to respond immediately.
I apologize for the delayed response.
I still don't know how to use torch and drjit while switching between multiple GPU devices.
I have found, however, that by splitting a single host into multiple nodes with torch's DistributedDataParallel, along with the CUDA_VISIBLE_DEVICES environment variable, I can achieve the initial goal; see the sketch below.
Perhaps due to bandwidth constraints, the desired processing speed could not be achieved, but I will close this issue for now.
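(A minimal sketch of that setup, not my exact code; the worker body, port, and GPU count are placeholders:)

```python
import os
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Restrict this process to a single GPU *before* CUDA is initialized,
    # so torch, drjit, and mitsuba all see it as device 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)

    import torch
    import torch.distributed as dist
    import mitsuba as mi
    mi.set_variant("cuda_ad_rgb")

    dist.init_process_group(
        "nccl", init_method="tcp://127.0.0.1:29500",  # hypothetical port
        rank=rank, world_size=world_size)

    # ... build the model, wrap it in DistributedDataParallel,
    # and call mi.render() inside the training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # hypothetical number of GPUs
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```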
Thank you very much, @njroussel, for taking the time to help me.
Thanks for this great tool! I would like to use mitsuba3 with PyTorch on multiple GPUs. But switching the device with drjit.set_device(device_idx) (device_idx > 0) causes a crash in some settings/systems and shows this error: CUDA_ERROR_ILLEGAL_ADDRESS or CUDA_ERROR_INVALID_VALUE.
These are examples of settings where a crash occurs.
common information:
- example 1: CUDA_ERROR_INVALID_VALUE
- example 2: CUDA_ERROR_ILLEGAL_ADDRESS or CUDA_ERROR_INVALID_VALUE
- example 3: CUDA_ERROR_INVALID_VALUE
I ran this simple code:
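(The original snippet is not preserved here; a hypothetical reconstruction, following the import order noted later in the thread, with a placeholder scene and device index:)

```python
import mitsuba as mi
mi.set_variant("cuda_ad_rgb")
import drjit as dr

device_idx = 1  # any index > 0 crashes on the affected systems
dr.set_device(device_idx)

scene = mi.load_dict(mi.cornell_box())  # placeholder scene
img = mi.render(scene)
```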
The error doesn't occur when I use CUDA_VISIBLE_DEVICES=device_idx and drjit.set_device(0).
I also found the following two cases:
- If a GPU works with device_idx > 0, it also works with a different non-zero device_idx shifted by using CUDA_VISIBLE_DEVICES.
- If a GPU does not work with device_idx > 0, it also does not work with a different non-zero device_idx shifted by using CUDA_VISIBLE_DEVICES.
So, these errors may be due to my environment.
Do you have any advice? Thanks in advance!