ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[core][init] RuntimeError: No CUDA GPUs are available. Ray cannot use the GPU on Windows and WSL with CUDA >= 11.8 (cu118 wheels). #33798

Closed xiezhipeng-git closed 9 months ago

xiezhipeng-git commented 1 year ago

What happened + What you expected to happen

I set the GPU count for Ray, but Ray cannot use the GPU. ray.get_gpu_ids() always returns an empty list when called from the driver, because Ray does not manage GPU allocations for the driver process. If I call tensor.to("cuda"), it raises RuntimeError: No CUDA GPUs are available:

  File "/home/xzpwsl2/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

Versions / Dependencies

Ray 2.3.1, torch 2.0, on Windows and Windows WSL.

Reproduction script

import ray
import time
from logger import logger
from multiprocessing import cpu_count
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
localGpu_num = torch.cuda.device_count()
localGpu_str = str(list(range(localGpu_num))).strip('[]')
os.environ['CUDA_VISIBLE_DEVICES']=localGpu_str
print("CUDA_VISIBLE_DEVICES",os.getenv("CUDA_VISIBLE_DEVICES"))
ray.init(num_cpus=cpu_count(), num_gpus=1)
print('ray.init()',ray.get_gpu_ids())

@ray.remote
def f(i):
    # print("@ray.remote","f",i)
    torch.rand(1, 1).to(device)
    time.sleep(1)
    # print("@ray.remote","f",2,i)
    return i*i
bt = time.time()
futures = [f.remote(i) for i in range(100)]
logger.info(ray.get(futures))
et = time.time()
logger.info(f"用时{et-bt}")

# Expected output:
# CUDA_VISIBLE_DEVICES 0
# ray.init() []   <- and then the error is raised

Issue Severity

High: It blocks me from completing my task.

cadedaniel commented 1 year ago

Does this work without Ray? How does PyTorch find GPUs in your WSL setup when Ray isn't involved?

rkooo567 commented 1 year ago

Yeah, I think it is unrelated to Ray. Your driver process probably has no visibility of the GPU (you may need to set the env var CUDA_VISIBLE_DEVICES yourself).

xiezhipeng-git commented 1 year ago

@rkooo567

Yeah, I think it is unrelated to Ray. Your driver process probably has no visibility of the GPU (you may need to set the env var CUDA_VISIBLE_DEVICES yourself).

I already set the env var; look at my code: os.environ['CUDA_VISIBLE_DEVICES'] = localGpu_str, and print("CUDA_VISIBLE_DEVICES", os.getenv("CUDA_VISIBLE_DEVICES")) prints CUDA_VISIBLE_DEVICES 0, which proves the env var is set and that CUDA device 0 is visible.

xiezhipeng-git commented 1 year ago

Does this work without Ray? How does PyTorch find GPUs in your WSL setup when Ray isn't involved?

You mean something like os.getenv("CUDA_VISIBLE_DEVICES")?

rkooo567 commented 1 year ago

To be clear,

ray.get_gpu_ids() is not an API for the driver. It is expected to return nothing if you call it outside a Ray task or actor (i.e., outside a remote function or class).

Also, tasks and actors run in different processes, so it is not ideal to declare the device variable outside the task definition. The driver (your Python script) runs in a different process from the task (f in your script).

# Move this inside the task f
@ray.remote
def f():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Lastly, if you specify num_gpus on a Ray task or actor, it will automatically set the CUDA_VISIBLE_DEVICES env var for you.

@ray.remote(num_gpus=1)
def f(i):
    # This task should have `CUDA_VISIBLE_DEVICES` env var set
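
A complete, minimal version of that check might look like this (a sketch; gpu_env is just an illustrative name):

import os
import ray

ray.init()  # GPUs are auto-detected on the local machine

@ray.remote(num_gpus=1)
def gpu_env():
    # Ray sets CUDA_VISIBLE_DEVICES for this worker because num_gpus=1 was requested.
    return os.environ.get("CUDA_VISIBLE_DEVICES"), ray.get_gpu_ids()

# Expected to show the GPU assigned to this task, e.g. device 0.
print(ray.get(gpu_env.remote()))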
xiezhipeng-git commented 1 year ago

To be clear,

ray.get_gpu_ids() is not an API for the driver. It is expected to return nothing if you call it outside a Ray task or actor (i.e., outside a remote function or class).

Also, tasks and actors run in different processes, so it is not ideal to declare the device variable outside the task definition. The driver (your Python script) runs in a different process from the task (f in your script).

# Move this inside the task f
@ray.remote
def f():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Lastly, if you specify num_gpus on a Ray task or actor, it will automatically set the CUDA_VISIBLE_DEVICES env var for you.

@ray.remote(num_gpus=1)
def f(i):
    # This task should have `CUDA_VISIBLE_DEVICES` env var set

Thanks, I see the device needs to be created inside the Ray function. If I try @ray.remote(num_cpus=cpu_count(), num_gpus=1), I get:

(raylet) /home/xzpwsl2/.local/lib/python3.10/site-packages/ray/dashboard/agent.py:51: DeprecationWarning: There is no current event loop
(raylet)   aiogrpc.init_grpc_aio()
(autoscaler +28s) Tip: use ray status to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +28s) Warning: The following resource request cannot be scheduled right now: {'CPU': 32.0, 'GPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
(autoscaler +1m38s) Warning: The following resource request cannot be scheduled right now: {'CPU': 32.0, 'GPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
[2023-03-29 14:10:49,629] [INFO] [pid=3124][][/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py:44] ([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 9 ...
[2023-03-29 14:10:49,632] [INFO] [pid=3124][][/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py:46] ('用时259.88020491600037',)

Running the program this way takes about 52x as long as plain @ray.remote, which uses only the CPU and finishes in 5.5 s ('用时5.508474111557007'). The code was:

@ray.remote(num_cpus = cpu_count(),num_gpus=1)
def f(i):
    print("@ray.remote","f",i)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    torch.rand(1, 1).to(device)
    time.sleep(1)
    # print("@ray.remote","f",2,i)
    return i*i

With @ray.remote(num_cpus = cpu_count(),num_gpus=1/cpu_count()) it takes 275.5 s ('用时275.5370864868164'). And if I create the device inside the Ray function but use plain @ray.remote, it only uses the CPU. Is that normal?

xiezhipeng-git commented 1 year ago

If I use @ray.remote(num_cpus = 1,num_gpus=1/cpu_count()), it raises this error:

ray::f() (pid=31112, ip=172.28.246.219)
  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 39, in f
    torch.rand(1, 1).to(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 45, in
    print(ray.get(futures))
ray.exceptions.RayTaskError(RuntimeError): ray::f() (pid=31112, ip=172.28.246.219)
  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 39, in f
    torch.rand(1, 1).to(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

together with this warning:

/home/xzpwsl2/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
(f pid=14629)   warnings.warn("Can't initialize NVML")

But I have already installed CUDA toolkit 11.8 and 12.0; PyTorch only uses 11.8.

xiezhipeng-git commented 1 year ago

@rkooo567 @cadedaniel How can I use the GPU with Ray? Is this a Ray bug, or is my usage incorrect?

cadedaniel commented 1 year ago

Yes -- but the get_gpu_ids function doesn't work from drivers (only in Ray tasks/actors). You need to find some other function or move it into a task/actor.

https://github.com/ray-project/ray/blob/ef31bc6aa80fa0ce5b08594fb9d1b7d0a117e5cc/python/ray/_private/worker.py#L900-L905
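
If the goal is only to confirm from the driver that Ray detected a GPU at all, one option (a sketch, not necessarily the API referenced above) is the cluster resource view:

import ray

ray.init()

# Both calls work from the driver; the "GPU" key reflects what Ray detected.
print(ray.cluster_resources())     # e.g. {'CPU': 32.0, 'GPU': 1.0, ...}
print(ray.available_resources())   # what is currently unclaimed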

xiezhipeng-git commented 1 year ago

ray.get_gpu_ids()

@cadedaniel Now I see why ray.get_gpu_ids() == []. Then how can I use the GPU in Ray? I have tried four kinds of Ray init/remote configurations; whenever I set a GPU count, the program either errors out or becomes very slow.

cadedaniel commented 1 year ago
import ray
@ray.remote(num_gpus=1)
def f():
    import os
    print(os.environ.get("CUDA_VISIBLE_DEVICES"))
ray.get(f.remote())

What does this print?

xiezhipeng-git commented 1 year ago
import ray
@ray.remote(num_gpus=1)
def f():
    import os
    print(os.environ.get("CUDA_VISIBLE_DEVICES"))
ray.get(f.remote())

What does this print?

It prints 0. If I set num_gpus, it can use the GPU, but it is very slow: 315.99 s. If I only use the CPU, it takes 5 s.

hora-anyscale commented 1 year ago

@rkooo567 - please assign a priority and remove the triage label.

jjyao commented 1 year ago

@xiezhipeng-git what's the time you are measuring?

xiezhipeng-git commented 1 year ago

@xiezhipeng-git what's the time you are measuring?

I am measuring the wall-clock time of the whole batch:

@ray.remote
def f(i):
    print("@ray.remote","f",i)
    torch.rand(1, 1).to(device)
    time.sleep(1)
    # print("@ray.remote","f",2,i)
    return i*i

bt = time.time()
futures = [f.remote(i) for i in range(100)]
logger.info(ray.get(futures))
et = time.time()
print(f"用时{et-bt}")

Adding a wait time to the computation is an easy way to simulate the parallel capabilities of GPUs and CPUs.

xiezhipeng-git commented 1 year ago

Can we prioritize solving this problem? Because of this issue, my single-machine training is faster than training on a large cluster. I can solve Atari Pong in 60 seconds on a single 24 GB GPU. When using Ray, multiple processes either report errors or slow down. This is a very serious problem: without working GPU support, Ray is not even as good as a single machine.

rkooo567 commented 1 year ago

I don't think this is a bug. Can you read https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html. There are 2 things you should know here.

  1. You should specify num_gpus to ray.remote. Then "inside the actor/task", we automatically set the CUDA_VISIBLE_DEVICES.
  2. get_gpu_ids doesn't work inside a driver (the python script). It is only available from workers.
import ray
import time
from logger import logger
from multiprocessing import cpu_count
import os
import torch

# GPUs and CPUs are auto-detected by ray when you call ray.init()
ray.init()
# -----> this doesn't work because get_gpu_ids cannot be used in a driver.
# print('ray.init()',ray.get_gpu_ids())

@ray.remote(num_gpus=1)
def f(i):
    print(ray.get_gpu_ids())
    # Here, CUDA_VISIBLE_DEVICES should be correctly set
    print(os.getenv("CUDA_VISIBLE_DEVICES"))
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    torch.rand(1, 1).to(device)
    time.sleep(1)
    # print("@ray.remote","f",2,i)
    return i*i
bt = time.time()
futures = [f.remote(i) for i in range(100)]
logger.info(ray.get(futures))
et = time.time()
logger.info(f"用时{et-bt}")
xiezhipeng-git commented 1 year ago

I don't think this is a bug. Can you read https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html. There are 2 things you should know here.

  1. You should specify num_gpus to ray.remote. Then "inside the actor/task", we automatically set the CUDA_VISIBLE_DEVICES.
  2. get_gpu_ids doesn't work inside a driver (the python script). It is only available from workers.
import ray
import time
from logger import logger
from multiprocessing import cpu_count
import os
import torch

# GPUs and CPUs are auto-detected by ray when you call ray.init()
ray.init()
# -----> this doesn't work because get_gpu_ids cannot be used in a driver.
# print('ray.init()',ray.get_gpu_ids())

@ray.remote(num_gpus=1)
def f(i):
    print(ray.get_gpu_ids())
    # Here, CUDA_VISIBLE_DEVICES should be correctly set
    print(os.getenv("CUDA_VISIBLE_DEVICES"))
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    torch.rand(1, 1).to(device)
    time.sleep(1)
    # print("@ray.remote","f",2,i)
    return i*i
bt = time.time()
futures = [f.remote(i) for i in range(100)]
logger.info(ray.get(futures))
et = time.time()
logger.info(f"用时{et-bt}")

Look at the history above. I have tried multiple GPU configurations, including num_gpus=1. The result is only a slowdown or an error. Isn't that a bug?

Also, must futures = [f.remote(i) for i in range(100)] be changed to futures = [f.remote(i) for i in range(cpus)] if I use @ray.remote(num_gpus=1/cpus)?
It is the user’s responsibility to make sure that the individual tasks don’t use more than their share of the GPU memory. TensorFlow can be configured to limit its memory usage.
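
For PyTorch, a comparable per-task cap can be sketched with torch.cuda.set_per_process_memory_fraction (the 0.25 values below are illustrative only, and this does not reduce the fixed per-worker CUDA context overhead):

import ray
import torch

ray.init()

@ray.remote(num_gpus=0.25)
def f(i):
    # Limit this worker's CUDA caching allocator to roughly its share of the GPU.
    torch.cuda.set_per_process_memory_fraction(0.25, device=0)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    return (torch.rand(1, 1).to(device) * i).sum().item()

print(ray.get([f.remote(i) for i in range(4)]))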

xiezhipeng-git commented 1 year ago

If I try

@ray.remote(num_cpus = 1,num_gpus=1/cpus)
futures = [f.remote(i) for i in range(cpus)]

it finishes in 4.76 s (用时4.756626605987549).

This suggests that the actual problem is that Ray does not automatically release GPU memory between tasks, while the CPU-only path either handles this automatically or uses so little memory that the problem never appears. I have finally found the specific cause and a workaround, thanks. Note that this speed was achieved by reducing the computational load by 68% (range(cpus) instead of range(100)), but it roughly meets expectations, because the GPU clock frequency is 2.6 and the CPU is 5.8 on my machine (the comparison may not be meaningful, because of the time.sleep(1)). And there is another bug if I try

@ray.remote(num_cpus = 1,num_gpus=1/100)
futures = [f.remote(i) for i in range(100)]

 File "python\ray\_raylet.pyx", line 807, in ray._raylet.execute_task 
  File "python\ray\_raylet.pyx", line 841, in ray._raylet.execute_task 
  File "python\ray\_raylet.pyx", line 528, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: Unable to allocate internal buffer.
traceback: Traceback (most recent call last):
  File "D:\my\env\python3.10.10\lib\site-packages\ray\_private\serialization.py", line 369, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File "D:\my\env\python3.10.10\lib\site-packages\ray\_private\serialization.py", line 252, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)       
  File "D:\my\env\python3.10.10\lib\site-packages\ray\_private\serialization.py", line 204, in _deserialize_msgpack_data
    msgpack_data, pickle5_data = split_buffer(data)
  File "python\ray\includes/serialization.pxi", line 206, in ray._raylet.split_buffer
  File "msgpack\_unpacker.pyx", line 372, in msgpack._cmsgpack.Unpacker.__init__
MemoryError: Unable to allocate internal buffer.

Why can 1/32 be used but not 1/100? This program uses almost zero GPU memory. Is it limited by the number of CPUs, since num_cpus=1? After testing, the minimum allowed value of num_gpus cannot be less than 1/cpus.

If only the CPU is used, none of these problems exist.
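
One way to bound how many workers hold a CUDA context at the same time (a sketch, not a confirmed fix; max_in_flight and the num_gpus fraction are arbitrary illustrative values) is to throttle the number of in-flight tasks with ray.wait:

import ray

ray.init()

@ray.remote(num_cpus=1, num_gpus=1 / 8)
def f(i):
    # GPU work would go here; kept trivial for the sketch.
    return i * i

max_in_flight = 8  # arbitrary cap for illustration
refs, results = [], []
for i in range(100):
    if len(refs) >= max_in_flight:
        # Wait for one task to finish before submitting the next.
        done, refs = ray.wait(refs, num_returns=1)
        results.extend(ray.get(done))
    refs.append(f.remote(i))
results.extend(ray.get(refs))
# Note: results can arrive out of submission order; this only caps concurrency.
print(results)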

rkooo567 commented 1 year ago

Hmm, probably a Windows-specific issue. I couldn't reproduce it on Linux. cc @mattip to follow up.

mattip commented 1 year ago

Just to be clear: this code raises a MemoryError using ray 2.3.1 with Python 3.10 on Windows 10, but if you replace the 1/100 with 1/32 it works?

import ray
import time
from logger import logger
from multiprocessing import cpu_count
import os
import torch

# GPUs and CPUs are auto-detected by ray when you call ray.init()
ray.init()
# -----> this doesn't work because get_gpu_ids cannot be used in a driver.
# print('ray.init()',ray.get_gpu_ids())

@ray.remote(num_cpus = 1,num_gpus=1/100)
def f(i):
    print(ray.get_gpu_ids())
    # Here, CUDA_VISIBLE_DEVICES should be correctly set
    print(os.getenv("CUDA_VISIBLE_DEVICES"))
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    torch.rand(1, 1).to(device)
    time.sleep(1)
    # print("@ray.remote","f",2,i)
    return i*i
bt = time.time()
futures = [f.remote(i) for i in range(100)]
logger.info(ray.get(futures))
et = time.time()
logger.info(f"time {et-bt}")

May I ask for more information: What GPU and cuda version you are using? What version of pytorch? How much system memory do you have?
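
For reference, a short script along these lines (just a sketch using torch and the standard library) collects most of that information:

import platform
import torch

print("python:", platform.python_version(), platform.platform())
print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("gpu:", props.name, f"{props.total_memory / 1024**3:.1f} GiB")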

xiezhipeng-git commented 1 year ago

With num_gpus=1/100 and range(100), the program and VS Code crash; VS Code restarts and stops at a RayTaskError. With num_gpus=1/cpus and range(cpus) it runs in 4.756 s, but only because the computational load is reduced by 68%, and that is still slow compared with the CPU-only run of range(100), which takes about 5 s.

Environment: NVIDIA GeForce RTX 4090 (24 GB), CUDA Version 12.1, torch 2.0.0+cu118, 64 GB system memory, Windows 11, Python 3.10.10. WSL2 (Ubuntu 22.04, Python 3.10.6) has the same problem, and so does Python 3.9.13 with cu118 on an NVIDIA GeForce RTX 1070 under Windows 10.

Exception raised: RayTaskError(RuntimeError)

ray::f() (pid=9443, ip=172.28.246.219)
  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 47, in f
    torch.rand(1, 1).to(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 53, in
    print(ray.get(futures))
ray.exceptions.RayTaskError(RuntimeError): ray::f() (pid=9443, ip=172.28.246.219)
  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 47, in f
    torch.rand(1, 1).to(device)
RuntimeError: CUDA error: out of memory

xiezhipeng-git commented 1 year ago

After upgrading Ray to 2.5.0, this issue has become even more serious: 1/32 no longer works either. Running the above program with the GPU directly causes a blue screen of death.

This is the simplest possible example of Ray using a GPU. Can we prioritize solving it? On Windows, the Ray GPU functionality cannot be used at all. The priority for this should be very high, right?

richardliaw commented 1 year ago

@mattip are you able to reproduce this?

@xiezhipeng-git can you post a reproducible script for us to try?

xiezhipeng-git commented 1 year ago

are you able to reproduce this?

can you post a reproducible script for us to try?

Please look at the history above; there is a reproduction script in it. The blue screen of death happens with some probability; the RuntimeError: CUDA error: out of memory happens 100% of the time.


import ray
import time
# from logger import logger
from multiprocessing import cpu_count
import os
import torch

# device = torch.device('cuda' if torch.cuda.
#                       is_available() else 'cpu')
localGpu_num = torch.cuda.device_count()
localGpu_str = str(list(range(localGpu_num))).strip('[]')
os.environ['CUDA_VISIBLE_DEVICES']=localGpu_str
print("CUDA_VISIBLE_DEVICES",os.getenv("CUDA_VISIBLE_DEVICES"))
ray.init(num_cpus=cpu_count(), num_gpus=1)

# need=1.0/cpu_count()
# @ray.remote(num_gpus=need)
# @ray.remote
@ray.remote(num_cpus = 1,num_gpus=1/24)
# @ray.remote(num_gpus=1)
# @ray.remote(num_gpus=1/cpu_count())
def f(i):
     device = torch.device('cuda' if torch.cuda.
                      is_available() else 'cpu')
    if i==1 or i==2:
      print("@ray.remote里:",device,i,torch.rand(1, 1).to(device))
    torch.rand(1, 1).to(device)
    time.sleep(1)
    return i*i
bt = time.time()
futures = [f.remote(i) for i in range(100)]
print(ray.get(futures))
et = time.time()
print(f"用时{et-bt}")
mattip commented 1 year ago

The priority for this should be very high, right?

Yes, the priority should be high, but only if we can reproduce it. Many other people are using Ray on Windows and not seeing this error. Perhaps you have a hardware or OS problem: ray + torch should not cause a BSOD even if the task does not complete properly.

I cannot reproduce this. Here is what I did (where test_ray.py is the script just above, with the syntax errors fixed and changed to use only ascii). Note, as stated above, that CUDA_VISIBLE_DEVICES 0 is expected outside of a ray.remote call. The loop runs for the expected 100 iterations from 0 to 99:

>python310.exe -mvenv \temp\ray_throwaway
>\temp\ray_throwaway\Scripts\activate
(ray_throwaway) python -m pip install torch --index-url https://download.pytorch.org/whl/cu117
(ray_throwaway) python -m pip install ray==2.5.1
(ray_throwaway) python test.py
(ray_throwaway) d:\pypy_stuff>python \temp\test_ray.py
  File "d:\temp\test_ray.py", line 25
    if i==1 or i==2:
                    ^
IndentationError: unindent does not match any outer indentation level

(ray_throwaway) d:\pypy_stuff>python \temp\test_ray.py
CUDA_VISIBLE_DEVICES 0
2023-06-29 05:38:01,149 INFO worker.py:1636 -- Started a local Ray instance.
(f pid=15708) @ray.remote: cuda 2
(f pid=15708) tensor([[0.8174]], device='cuda:0')
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801]
time 27.490281343460083
(f pid=15672) @ray.remote: cuda 1
(f pid=15672) tensor([[0.9247]], device='cuda:0')

@xiezhipeng-git you need to give us more information in order to help you. Please carefully answer all of the following:

xiezhipeng-git commented 1 year ago

I tried this code on several machines and several Python versions on Windows; the error is the same whenever I try to use the GPU with any of these decorators:

@ray.remote(num_gpus=need)
@ray.remote
@ray.remote(num_cpus = 1,num_gpus=1/24)
@ray.remote(num_gpus=1)
@ray.remote(num_gpus=1/cpu_count())

Only the CPU-only configuration runs successfully, and that takes 5 s.

  • what is the complete output of nvidia-smi?
    
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce RTX 4090         On | 00000000:01:00.0  On |                  Off |
    |  0%   35C    P5               27W / 490W|   2040MiB / 24564MiB |      4%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
    +---------------------------------------------------------------------------------------+


* what operating system are you using?
Windows 11 and Windows 11 WSL2 (Ubuntu 22.04); the same error occurs on Windows 10 and Windows 10 WSL2 (Ubuntu 20.04).
* what version of python are you using?
Windows 10: 3.9.13; Windows 10 WSL2: 3.9.13; Windows 11 WSL2: 3.10.6; Windows 11: 3.10.10.
* what does `pip list` output?

Package Version


absl-py 1.4.0 aiosignal 1.3.1 albumentations 1.3.0 ale-py 0.8.1 anyio 3.6.2 appdirs 1.4.4 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 arrow 1.2.3 astroid 2.15.0 asttokens 2.2.1 asyncio 3.4.3 attrs 22.2.0 backcall 0.2.0 beautifulsoup4 4.12.2 bitmath 1.3.3.1 bleach 6.0.0 blinker 1.6.2 box2d-py 2.3.5 brax 0.9.1 cached-property 1.5.2 cachetools 5.3.0 certifi 2022.12.7 cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.1.1 chex 0.1.7 click 8.1.3 cloudpickle 2.2.1 cmake 3.25.0 colorlog 6.7.0 comm 0.1.3 command-not-found 0.3 concurrent-log-handler 0.9.20 contourpy 1.0.7 cryptography 3.4.8 cupy-cuda12x 12.0.0 cycler 0.11.0 Cython 0.29.33 dbus-python 1.2.18 debugpy 1.6.6 decorator 4.4.2 defusedxml 0.7.1 dill 0.3.6 distlib 0.3.6 distrax 0.1.3 distro 1.7.0 distro-info 1.1build1 dm-env 1.6 dm-tree 0.1.8 docker-pycreds 0.4.0 efficientnet-pytorch 0.7.1 enum-tools 0.9.0.post1 envpool 0.8.2 etils 1.3.0 evdev 1.6.1 evosax 0.1.4 executing 1.2.0 fasteners 0.15 fastjsonschema 2.16.3 fastrlock 0.8.1 filelock 3.10.4 Flask 2.3.2 Flask-Cors 3.0.10 flax 0.6.10 fonttools 4.39.2 fqdn 1.5.1 frozenlist 1.3.3 fsspec 2023.4.0 gast 0.5.4 gitdb 4.0.10 GitPython 3.1.31 glcontext 2.3.7 glfw 1.12.0 google-auth 2.16.3 google-auth-oauthlib 0.4.6 graphviz 0.20.1 grpcio 1.51.3 gym 0.26.2 gym-notices 0.0.8 gym-super-mario-bros 7.4.0 gym3 0.3.3 gymnasium 0.27.1 gymnasium-notices 0.0.1 gymnax 0.0.6 hbutils 0.8.2 httplib2 0.20.2 huggingface-hub 0.13.2 idna 3.4 imageio 2.26.1 imageio-ffmpeg 0.3.0 importlib-metadata 4.6.4 importlib-resources 5.12.0 iniconfig 2.0.0 ipykernel 6.22.0 ipython 8.11.0 ipython-genutils 0.2.0 ipywidgets 8.0.4 isoduration 20.11.0 isort 5.12.0 itsdangerous 2.1.2 jax 0.4.11 jax-jumpy 1.0.0 jaxlib 0.4.11+cuda12.cudnn88 jaxopt 0.7 jedi 0.18.2 jeepney 0.7.1 Jinja2 3.1.2 joblib 1.2.0 jsonpointer 2.3 jsonschema 4.17.3 jupyter 1.0.0 jupyter_client 8.1.0 jupyter-console 6.6.3 jupyter_core 5.3.0 jupyter-events 0.6.3 jupyter_server 2.5.0 jupyter_server_terminals 0.4.4 jupyterlab-pygments 0.2.2 jupyterlab-widgets 3.0.5 kaggle 1.5.13 keyring 23.5.0 kiwisolver 1.4.4 launchpadlib 1.10.16 lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lazy_loader 0.1 lazy-object-proxy 1.9.0 lit 15.0.7 lz4 4.3.2 Markdown 3.4.3 markdown-it-py 2.2.0 MarkupSafe 2.1.2 matplotlib 3.7.1 matplotlib-inline 0.1.6 mccabe 0.7.0 mdurl 0.1.2 mistune 2.0.5 ml-dtypes 0.2.0 mlagents-envs 0.30.0 moderngl 5.8.1 monotonic 1.6 more-itertools 8.10.0 moviepy 1.0.3 mpmath 1.2.1 msgpack 1.0.5 mujoco 2.3.5 mujoco-py 2.1.2.14 munch 2.5.0 nbclassic 0.5.6 nbclient 0.7.4 nbconvert 7.3.1 nbformat 5.8.0 nes-py 8.2.1 nest-asyncio 1.5.6 netifaces 0.11.0 networkx 3.0 notebook 6.5.4 notebook_shim 0.2.3 numpy 1.22.4 nvidia-cublas-cu11 11.11.3.6 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvcc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu11 8.8.1.3 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nvjitlink-cu12 12.1.105 oauthlib 3.2.0 opencv-contrib-python 4.7.0.72 opencv-python 4.7.0.72 opencv-python-headless 4.7.0.72 opt-einsum 3.3.0 optax 0.1.5 optree 0.9.1 orbax-checkpoint 0.2.6 packaging 23.0 pandas 2.0.0 pandocfilters 1.5.0 parallel-execute 0.1.1 parso 0.8.3 pathtools 0.1.2 PettingZoo 1.22.3 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.3.0 pip 23.1.2 platformdirs 3.1.1 pluggy 1.0.0 portalocker 2.7.0 pretrainedmodels 0.7.4 procgen 0.10.7 proglog 0.1.10 prometheus-client 0.16.0 prompt-toolkit 3.0.38 protobuf 3.20.3 psutil 5.9.4 ptyprocess 0.7.0 pure-eval 0.2.2 py 
1.11.0 pyasn1 0.4.8 pyasn1-modules 0.2.8 pycparser 2.21 pygame 2.3.0 pygifsicle 1.0.7 pyglet 1.5.21 Pygments 2.14.0 PyGObject 3.42.1 PyJWT 2.3.0 pynput 1.7.6 PyOpenGL 3.1.6 pyparsing 2.4.7 pyrsistent 0.19.3 pytest 7.0.1 python-apt 2.4.0+ubuntu1 python-dateutil 2.8.2 python-json-logger 2.0.7 python-slugify 8.0.1 python-xlib 0.33 pytimeparse 1.1.8 pytinyrenderer 0.0.14 pytorch-triton 2.1.0+440fd1bf20 pytz 2022.7.1 PyWavelets 1.4.1 PyYAML 5.4.1 pyzmq 25.0.2 qtconsole 5.4.2 QtPy 2.3.1 qudida 0.0.4 ray 2.5.0 requests 2.28.1 requests-oauthlib 1.3.1 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.3.2 rsa 4.9 scikit-image 0.20.0 scikit-learn 1.2.2 scipy 1.10.1 SecretStorage 3.3.1 segmentation-models-pytorch 0.3.2 Send2Trash 1.8.2 sentry-sdk 1.17.0 setproctitle 1.3.2 setuptools 59.6.0 six 1.16.0 sklearn 0.0.post1 smmap 5.0.0 sniffio 1.3.0 soupsieve 2.4.1 stack-data 0.6.2 support-developer 1.0.5 swig 4.1.1 sympy 1.11.1 systemd-python 234 tensorboard 2.12.0 tensorboard-data-server 0.7.0 tensorboard-plugin-wit 1.8.1 tensorboardX 2.6 tensorflow-probability 0.20.1 tensorstore 0.1.38 termcolor 2.2.0 terminado 0.17.1 text-unidecode 1.3 threadpoolctl 3.1.0 tifffile 2023.3.15 timm 0.6.12 tinycss2 1.2.1 tomli 2.0.1 tomlkit 0.11.6 toolz 0.12.0 torch 2.1.0.dev20230619+cu121 torch-tb-profiler 0.4.1 torchaudio 2.1.0.dev20230619+cu121 torchvision 0.16.0.dev20230619+cu121 tornado 6.2 tqdm 4.65.0 traitlets 5.9.0 treevalue 1.4.10 trimesh 3.9.35 triton 2.0.0 types-protobuf 4.22.0.0 typing_extensions 4.4.0 tzdata 2023.3 ubuntu-advantage-tools 8001 ufw 0.36.1 unattended-upgrades 0.1 uri-template 1.2.0 urllib3 1.26.13 vec-noise 1.1.4 virtualenv 20.21.0 wadllib 1.3.6 wandb 0.14.0 wcwidth 0.2.6 webcolors 1.13 webencodings 0.5.1 websocket-client 1.5.1 Werkzeug 2.3.6 wheel 0.37.1 widgetsnbextension 4.0.5 wrapt 1.15.0 zipp 1.0.0

Edit: (mattip) formatting

mattip commented 1 year ago

You are using a NVIDIA GeForce RTX 4090 which demands a lot of power. Perhaps your power supply or a connection cable is not up to the task and fails when the GPU is fully loaded. Can you run other GPU intensive programs successfully?

xiezhipeng-git commented 1 year ago

You are using a NVIDIA GeForce RTX 4090 which demands a lot of power. Perhaps your power supply or a connection cable is not up to the task and fails when the GPU is fully loaded. Can you run other GPU intensive programs successfully?

I tried it on the RTX 1070 machine; the error is the same. And I can run https://github.com/sail-sg/envpool/blob/aacf06f694ead2eb75331f085f00dad71eec1a08/examples/cleanrl_examples/ppo_atari_envpool.py#L211 after changing some code; with it I can solve Pong in 59 s (20-episode average score greater than 17). Does that count as a GPU-intensive program?

mattip commented 1 year ago

I am not sure I understand. You ran the script above, without any changes (using @ray.remote(num_cpus = 1,num_gpus=1/24)), on two different machines, using Windows, Ubuntu 22.04, and Windows WSL, and it crashed on all of them? It did not run successfully on any machine/OS you tried?

xiezhipeng-git commented 1 year ago

Exception raised: RayTaskError(RuntimeError)

ray::f() (pid=3686, ip=172.28.246.219)
  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 48, in f
    torch.rand(1, 1).to(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 54, in
    print(ray.get(futures))
ray.exceptions.RayTaskError(RuntimeError): ray::f() (pid=3686, ip=172.28.246.219)
  File "/home/xzpwsl2/my/work/rlFrame/rl_frame/jorldy/raytest.py", line 48, in f
    torch.rand(1, 1).to(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I have two machines in total: a Windows 10 machine with WSL2 and an RTX 1070, and a Windows 11 machine with WSL2 and an RTX 4090. I ran this program with both the Windows Python and the WSL2 Python on each. So far there has been only one successful attempt at using the GPU, with 1/24 on Ray 2.3.1; all other attempts have failed. I am currently trying Ray 2.5.1 again and it has failed again.

I described that one successful run earlier in this issue.

mattip commented 1 year ago

My successful run has a much smaller list of packages installed. Could you try building a new virtualenv as I did at the start of the report above installing only what is needed to run your script (ray and torch)? Perhaps one of the additional packages is messing up the environment. Here is my pip list:

(ray_throwaway) d:\pypy_stuff>pip list
Package            Version
------------------ -----------
aiosignal          1.3.1
attrs              23.1.0
certifi            2023.5.7
charset-normalizer 3.1.0
click              8.1.3
colorama           0.4.6
filelock           3.9.0
frozenlist         1.3.3
grpcio             1.51.3
idna               3.4
Jinja2             3.1.2
jsonschema         4.17.3
MarkupSafe         2.1.2
mpmath             1.2.1
msgpack            1.0.5
networkx           3.0
numpy              1.25.0
packaging          23.1
pip                22.2.2
protobuf           4.23.3
pyrsistent         0.19.3
PyYAML             6.0
ray                2.5.1
requests           2.31.0
setuptools         63.2.0
sympy              1.11.1
torch              2.0.1+cu117
typing_extensions  4.4.0
urllib3            2.0.3
xiezhipeng-git commented 1 year ago

My successful run has a much smaller list of packages installed. Could you try building a new virtualenv as I did at the start of the report above installing only what is needed to run your script (ray and torch)? Perhaps one of the additional packages is messing up the environment. Here is my pip list:

(ray_throwaway) d:\pypy_stuff>pip list
Package            Version
------------------ -----------
aiosignal          1.3.1
attrs              23.1.0
certifi            2023.5.7
charset-normalizer 3.1.0
click              8.1.3
colorama           0.4.6
filelock           3.9.0
frozenlist         1.3.3
grpcio             1.51.3
idna               3.4
Jinja2             3.1.2
jsonschema         4.17.3
MarkupSafe         2.1.2
mpmath             1.2.1
msgpack            1.0.5
networkx           3.0
numpy              1.25.0
packaging          23.1
pip                22.2.2
protobuf           4.23.3
pyrsistent         0.19.3
PyYAML             6.0
ray                2.5.1
requests           2.31.0
setuptools         63.2.0
sympy              1.11.1
torch              2.0.1+cu117
typing_extensions  4.4.0
urllib3            2.0.3

I don't think it is an interaction between packages in the environment: my computer is brand new, and the first project on it was this Ray program. I only wrote this test script by hand after discovering the memory overflow. I think you should focus on testing compatibility between Ray and the new CUDA/Python versions.

mattip commented 1 year ago

I think you should focus on testing compatibility between Ray and the new CUDA/Python versions.

I need your help, since only on your machines does the crash happen. Can you tell me what happens if you do these steps:

>python -m venv \temp\ray_throwaway
>\temp\ray_throwaway\Scripts\activate
(ray_throwaway) python -m pip install torch --index-url https://download.pytorch.org/whl/cu117
(ray_throwaway) python -m pip install ray==2.5.1
(ray_throwaway) python test_ray.py
xiezhipeng-git commented 1 year ago

>python -m venv \temp\ray_throwaway
>\temp\ray_throwaway\Scripts\activate
(ray_throwaway) python -m pip install torch --index-url https://download.pytorch.org/whl/cu117
(ray_throwaway) python -m pip install ray==2.5.1
(ray_throwaway) python test_ray.py

CUDA_VISIBLE_DEVICES 0
2023-06-29 20:07:35,238 INFO worker.py:1636 -- Started a local Ray instance.
(f pid=29984) @ray.remote里: cuda 2
(f pid=29984) tensor([[0.9769]], device='cuda:0')
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801]
用时16.433708429336548

On Windows 11 with the 4090 it succeeds with cu117, and on Windows 10 with the 1070 and cu117 it also succeeds. The error seems to be with the torch cu118 and cu121 wheels.

mattip commented 1 year ago

The error seems to be with the torch cu118 and cu121 wheels.

Thanks, that is progress!

Edit: I think I can try the newer wheels from https://pytorch.org/get-started/locally/

xiezhipeng-git commented 1 year ago

The error seems to be with the torch cu118 and cu121 wheels.

Thanks, that is progress!

Edit: I think I can try the newer wheels from https://pytorch.org/get-started/locally/

On Windows, the stable Python builds with cu118 and cu117 succeed, but on WSL2, cu117, cu118 and cu121 all give the CUDA memory error. So what can I do to solve it on WSL2?

xiezhipeng-git commented 1 year ago

I made a new Python .venv on WSL2; it still gives the CUDA memory error. Besides that, the running speed of https://github.com/sail-sg/envpool/blob/aacf06f694ead2eb75331f085f00dad71eec1a08/examples/cleanrl_examples/ppo_atari_envpool.py#L211 has decreased significantly, because until today I was using the nightly build of torch, which is much faster than the standard release. I work in WSL2 in the vast majority of cases, because envpool currently only has a Linux version.

xiezhipeng-git commented 1 year ago

Some additional information: my machine has 2 x 32 GB of memory, but Ray only seems to recognize 32 GB?

> ray status
Resources
---------------------------------------------------------------
Usage:
 32.0/32.0 CPU
 0.9984/1.0 GPU
 0B/32.10GiB memory
 2.70MiB/16.05GiB object_store_memory
xiezhipeng-git commented 1 year ago

After updating Ray to 2.6.0, this problem still exists on WSL2.

I also found that in Windows Python, ray.init(num_gpus=1) works but ray.init(num_gpus=-1) does not, with @ray.remote(num_cpus=1, num_gpus=oneGpuNeed*1), torch.cuda.device_count() = 1, str(list(range(localGpu_num))).strip('[]') = 0, and os.environ['CUDA_VISIBLE_DEVICES'] = 0. Can this problem be solved urgently? It is a hard-crash bug and affects every Windows computer using WSL.

mattip commented 10 months ago

This is happening in WSL2? If you use Windows (not WSL2) it does not happen?

jbohnslav commented 9 months ago

Hi, I'm not sure if this is fixed or was just closed due to inactivity. I'm also running in WSL2.

My code:

>>> import ray
>>> ray.init()
2023-12-20 03:25:04,767 INFO worker.py:1673 -- Started a local Ray instance.
RayContext(dashboard_url='', python_version='3.10.12', ray_version='2.8.1', ray_commit='82a8df138fe7fcc5c42536ebf26e8c3665704fee', protocol_version=None)
>>> ray.get_gpu_ids()
[]

Your test script:

import ray
@ray.remote(num_gpus=1)
def f():
    import os
    print(os.environ.get("CUDA_VISIBLE_DEVICES"))
print(ray.get(f.remote()))

Result of test script:

2023-12-20 03:30:24,028 INFO worker.py:1673 -- Started a local Ray instance.
(autoscaler +6s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +6s) Error: No available node types can fulfill resource request {'CPU': 1.0, 'GPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
(autoscaler +41s) Error: No available node types can fulfill resource request {'CPU': 1.0, 'GPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
(autoscaler +1m16s) Error: No available node types can fulfill resource request {'CPU': 1.0, 'GPU': 1.0}. Add suitable node types to this cluster to resolve this issue.

Nvidia-smi

Wed Dec 20 03:37:22 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.92       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:0A:00.0  On |                  N/A |
|  0%   43C    P8              35W / 350W |   1202MiB / 24576MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  | 00000000:0B:00.0 Off |                  N/A |
|  0%   30C    P8               8W / 350W |     47MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    0   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    0   N/A  N/A        34      G   /Xwayland                                 N/A      |
|    1   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    1   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    1   N/A  N/A        34      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+