ufoym / deepo

Set up and customize a deep learning environment in seconds.
http://ufoym.com/deepo
MIT License
6.32k stars, 750 forks

ufoym/deepo:pytorch-py36-cu101 cuda.max_memory_allocated crash #136

Closed (qrsforever closed 2 years ago)

qrsforever commented 4 years ago
>>> import torch
>>> torch.__version__
'1.6.0.dev20200607+cu101'
>>> from torch.cuda import max_memory_allocated
>>> max_memory_allocated(0)
Segmentation fault (core dumped)

qrsforever commented 4 years ago

It's my fault: I ran docker without --runtime nvidia.
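
For anyone wanting to rule out the same cause, here is a minimal sketch of checking whether the GPU is visible from inside the container before querying memory statistics. It assumes PyTorch is installed in the image; the helper name `cuda_visible` is hypothetical and not part of torch or deepo. A container started without the NVIDIA runtime will typically report no CUDA devices here.

```python
def cuda_visible():
    """Return (available, device_count); (False, 0) if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return False, 0
    available = torch.cuda.is_available()
    # Only ask for the device count when CUDA reports as usable
    count = torch.cuda.device_count() if available else 0
    return available, count


available, count = cuda_visible()
print(f"CUDA available: {available}, devices: {count}")
```

This only confirms that the runtime exposes a GPU. It says nothing about whether a particular torch build will crash.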

qrsforever commented 4 years ago

Reopening: this is not caused by the missing "--runtime nvidia" flag, but by torch version 1.6.

torch 1.5:

>>> import torch
>>> torch.__version__
'1.5.0.dev20200319'
>>> torch.version.cuda
'10.1'
>>> torch.cuda.max_memory_reserved(0)
0
>>> 

torch 1.6:

>>> import torch
>>> torch.__version__
'1.6.0.dev20200609+cu101'
>>> torch.version.cuda
'10.1'
>>> torch.cuda.max_memory_reserved(0)
Segmentation fault (core dumped)

ufoym commented 2 years ago

This should be OK with the latest deepo images:

>>> import torch
>>> torch.__version__
'1.11.0.dev20211224+cu111'
>>> torch.version.cuda
'11.1'
>>> torch.cuda.max_memory_reserved(0)
0
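
Since a segmentation fault kills the interpreter and cannot be caught with try/except, one way to compare images is to run the failing call in a child process and inspect its exit status. The following is a rough sketch under that assumption. The names `PROBE` and `run_probe` are hypothetical, and the probe simply repeats the calls from the sessions above.

```python
import subprocess
import sys

# Hypothetical probe: the same calls as in the interactive sessions above.
PROBE = (
    "import torch; "
    "print(torch.__version__); "
    "print(torch.cuda.max_memory_reserved(0))"
)


def run_probe(code=PROBE, timeout=120):
    """Run `code` in a child interpreter and report how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means the child was terminated
    # by signal N (a segmentation fault is SIGSEGV).
    crashed = proc.returncode < 0
    return proc.returncode, crashed, proc.stdout.strip()


if __name__ == "__main__":
    returncode, crashed, output = run_probe()
    print(f"return code: {returncode}, killed by signal: {crashed}")
    print(output)
```

Running the same script in each image gives a quick pass/fail comparison without losing the parent shell. A nonzero positive return code means an ordinary Python error (for example, torch not installed) rather than a crash.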