kexuedaishu closed this issue 1 year ago
Could you provide the full traceback details?
Traceback (most recent call last):
File "/mnt/nfs/yijing.zhou/venv_38/lib/python3.8/site-packages/mars/services/cluster/uploader.py", line 114, in upload_node_info
self._info.env = await asyncio.to_thread(gather_node_env)
File "/mnt/nfs/yijing.zhou/venv_38/lib/python3.8/site-packages/mars/lib/aio/_threads.py", line 36, in to_thread
return await loop.run_in_executor(None, func_call)
File "/mnt/nfs/zhihui/anaconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/mnt/nfs/yijing.zhou/venv_38/lib/python3.8/site-packages/mars/services/cluster/gather.py", line 73, in gather_node_env
cuda_info = mars_resource.cuda_info()
File "/mnt/nfs/yijing.zhou/venv_38/lib/python3.8/site-packages/mars/resource.py", line 352, in cuda_info
products=[nvutils.get_device_info(idx).name for idx in range(gpu_count)],
File "/mnt/nfs/yijing.zhou/venv_38/lib/python3.8/site-packages/mars/resource.py", line 352, in
Which Mars version did you use?
0.9.0a1
Can you try the latest version?
Yes. I was previously using the older version 0.6.1, but I had to upgrade to the newer version and then ran into these errors...
The latest version, v0.10.0, may address your issue. Could you try it, please?
I just tried it. The latest version, v0.10.0, doesn't work either.
It looks like something went wrong when getting info from the CUDA device. Can you give more details about your CUDA setup?
CUDA Version 9.0.176
CUDA Patch Version 9.0.176.1
CUDA Patch Version 9.0.176.2
CUDA Patch Version 9.0.176.3
CUDA Patch Version 9.0.176.4
I tried running the function on my machine with 2 GPU cards, and no error occurred.
>>> from mars.lib.nvutils import get_device_info
>>> get_device_info(0)
_cu_device_info(index=0, uuid=UUID('096adb53-fc18-1075-1097-5792e93051b7'), name='NVIDIA Graphics Device', multiprocessors=84, cuda_cores=16128, threads=129024)
Could you try this one?
>>> from mars.lib.nvutils import get_device_info
>>> get_device_info(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/***/pkgs/mars-0.10.0/mars/lib/nvutils.py", line 343, in get_device_info
uuid=uuid.UUID(bytes=uuid_t.bytes),
File "/****/anaconda3/lib/python3.8/uuid.py", line 180, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
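The `ValueError` above is raised by the standard library, not by CUDA itself: `uuid.UUID(bytes=...)` rejects any input that is not exactly 16 bytes long. The sketch below reproduces the message without a GPU and shows one possible guard. `safe_uuid_from_bytes` is a hypothetical helper, not part of Mars, and the thread does not show what the driver actually returned for the device UUID, so the empty buffer used here is only an assumption for illustration.

```python
import uuid
from typing import Optional


def safe_uuid_from_bytes(raw: bytes) -> Optional[uuid.UUID]:
    """Hypothetical helper: build a UUID only when raw is exactly 16 bytes."""
    if len(raw) != 16:
        # Assumption: a malformed device UUID is treated as "unknown"
        # instead of aborting the whole device query.
        return None
    return uuid.UUID(bytes=raw)


# Reproduce the error from the traceback without touching any GPU.
try:
    uuid.UUID(bytes=b"")
except ValueError as exc:
    print(exc)

# The guarded version returns None for a malformed buffer
# and a regular UUID for a well-formed one.
print(safe_uuid_from_bytes(b""))
print(safe_uuid_from_bytes(bytes(16)))
```

Whether returning `None` is acceptable depends on how the UUID is used later. It is shown here only to make the failure mode concrete.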
Code:
from mars.session import new_session
new_session().as_default()
This raises the error:
File "anaconda3/lib/python3.8/uuid.py", line 180, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
The error occurs on Linux version 3.10.0-957.10.1.el7.x86_64.