tusen-ai / simpledet

A Simple and Versatile Framework for Object Detection and Instance Recognition
Apache License 2.0

Error when training with two GPUs #291

Closed dongzhenguo2016 closed 4 years ago

dongzhenguo2016 commented 4 years ago

I am training Tridentnet_r50v1c4_c5_1x on my own dataset with two 2080 Ti GPUs. The CUDA version is 10.0.13 and the cuDNN version is 7.4.1. The installed MXNet build is mxnet_cu100-1.6.0b20191214-py2.py3-none-manylinux1_x86_64.whl.

The GPU IDs in the config are set to gpus = [2,3], and batch_image has already been reduced to 1: batch_image = 1 if is_train else 1

Even so, the training script fails with an out-of-memory error as soon as it starts:

raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:27:50] src/storage/./pinned_memory_storage.h:62: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: out of memory

Any pointers would be appreciated. Thanks.

xchani commented 4 years ago

Could you please try another config, like faster_r50v1_fpn_1x? Does the same error occur?

dongzhenguo2016 commented 4 years ago

The GPU list has to start from GPU 0. It runs with GPU 0 alone, with GPUs 0 and 1, or with GPUs 0, 1 and 2, but the list must begin at GPU 0 for training to start at all. Very strange.

sjtuytc commented 4 years ago

It is always like this.

RogerChern commented 4 years ago

Does setting the environment variable CUDA_VISIBLE_DEVICES help?
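For anyone trying this suggestion, here is a minimal sketch of how it could be applied. The idea is to expose only physical GPUs 2 and 3 to the process, so that CUDA renumbers them as devices 0 and 1 and the config's gpus list can start from 0 again. The helper name remap_gpus is hypothetical and not part of simpledet or MXNet, and the thread does not confirm that this resolves the out-of-memory error. The variable has to be set before any library initializes CUDA.

```python
import os


def remap_gpus(physical_ids):
    """Expose only the given physical GPUs to this process.

    Hypothetical helper for illustration; not part of simpledet or MXNet.
    Returns the logical device IDs (0..n-1) that the exposed GPUs map to.
    """
    # Must run before MXNet (or any other CUDA library) initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    # CUDA renumbers the visible devices starting from 0.
    return list(range(len(physical_ids)))


# Physical GPUs 2 and 3 become logical devices 0 and 1.
gpus = remap_gpus([2, 3])
print(os.environ["CUDA_VISIBLE_DEVICES"])
print(gpus)
```

With this in place, the config would use gpus = [0, 1] rather than gpus = [2, 3]. The same effect can be had by setting CUDA_VISIBLE_DEVICES=2,3 in the shell before launching the training script.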
