skyhehe123 / SA-SSD

SA-SSD: Structure Aware Single-stage 3D Object Detection from Point Cloud (CVPR 2020)

CUDA OOM Error #14

Status: Open. ootts opened this issue 4 years ago.

ootts commented 4 years ago

I am trying to train the SA-SSD model, but I run into a CUDA out-of-memory (OOM) error. I tried batch size 1, but the OOM error persists. I am using a TITAN V with 12.7 GB of memory, and my PyTorch version is 1.2.0.
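When comparing OOM behaviour across cards (a TITAN V, an RTX 2060 and a V100 all come up in this thread), it helps if every report includes the same environment details. The following is a minimal sketch, assuming one visible GPU at index 0. `describe_cuda_environment` is a hypothetical helper, not part of SA-SSD, and it relies only on standard `torch.cuda` queries.

```python
def describe_cuda_environment(device_index=0):
    """Collect the PyTorch version, GPU name and GPU memory figures.

    Fields that cannot be determined (PyTorch missing, no CUDA device) stay None.
    """
    info = {"torch_version": None, "gpu_name": None,
            "total_memory_gib": None, "peak_allocated_gib": None}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this environment
    info["torch_version"] = str(torch.__version__)
    if not torch.cuda.is_available():
        return info  # CPU-only build or no visible GPU
    gib = 1024 ** 3
    props = torch.cuda.get_device_properties(device_index)
    info["gpu_name"] = props.name
    info["total_memory_gib"] = round(props.total_memory / gib, 2)
    # Peak memory occupied by tensors in this process. The caching allocator
    # reserves extra memory, so nvidia-smi usually shows a larger number.
    info["peak_allocated_gib"] = round(
        torch.cuda.max_memory_allocated(device_index) / gib, 2)
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_environment().items():
        print(f"{key}: {value}")
```

Calling it once after a few training iterations (rather than at startup) makes the `peak_allocated_gib` figure meaningful for comparing GPUs.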

stalkermustang commented 4 years ago

You need to install torch 1.1, by the way. I created a Dockerfile that I use to train the model, even on an RTX 2060 with 6 GB at batch size 1. I also found that memory usage grows on a V100, to about 15 GB with default settings. @skyhehe123, have you seen this before? I don't know why memory usage differs across GPUs.
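Because the advice here hinges on the installed PyTorch release (1.2.0 in the original report, 1.1 recommended), a guard at the top of a training script can catch a mismatch early. This is a minimal sketch: `check_torch_version` and `REQUIRED_TORCH` are hypothetical names, and the 1.1 requirement comes from the comment above, not from the repository's own documentation.

```python
import re

# Encodes the "install torch 1.1" advice from this thread (an assumption, not an official pin).
REQUIRED_TORCH = (1, 1)


def torch_major_minor(version):
    """Extract (major, minor) from strings like '1.1.0', '1.2.0+cu92' or '1.1.0a0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_torch_version(version, required=REQUIRED_TORCH):
    """Return True if `version` is the required major.minor release (any patch/build)."""
    return torch_major_minor(version) == required


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = check_torch_version(str(torch.__version__))
        print(f"torch {torch.__version__}: {'OK' if ok else 'expected 1.1.x'}")
```

Matching only major.minor lets local build tags such as `+cu92` pass, which matters because CUDA-specific wheels carry them.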

ootts commented 4 years ago

I installed PyTorch 1.1.0, but I still run out of memory.

stalkermustang commented 4 years ago

Did you build the Docker image and try training inside it?

ootts commented 4 years ago

I used the Docker image as a reference, since it differs slightly from my environment. I installed spconv using the commands from the Docker image, along with PyTorch 1.1.0 and torchvision 0.3.0, but the OOM error remains.

ootts commented 4 years ago

I think I found the problem. It has nothing to do with this repo; it is caused by spconv, which has bugs on the TITAN V GPU. On a TITAN XP, training works fine.
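Since the OOM turned out to depend on the GPU model rather than on SA-SSD, a pre-flight check can flag the affected card before a long training run. This is a minimal sketch: `warn_if_suspect_gpu` and `SPCONV_SUSPECT_GPUS` are hypothetical, and the list holds only the TITAN V as reported in this thread. It is not an official spconv compatibility list. The name to pass in can come from `torch.cuda.get_device_name(0)`.

```python
# GPUs on which, per one user's report in this thread, spconv caused OOM errors.
# Assumption: this is anecdotal, not an official spconv compatibility list.
SPCONV_SUSPECT_GPUS = frozenset({"TITAN V"})


def normalize_gpu_name(name):
    """Upper-case the device name and drop a leading 'NVIDIA ' vendor prefix."""
    name = name.strip().upper()
    prefix = "NVIDIA "
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name


def warn_if_suspect_gpu(name):
    """Return a warning string for GPUs reported as problematic with spconv, else None."""
    if normalize_gpu_name(name) in SPCONV_SUSPECT_GPUS:
        return (f"{name}: spconv was reported to run out of memory on this GPU; "
                "try another card (e.g. a TITAN Xp) before debugging SA-SSD itself.")
    return None


if __name__ == "__main__":
    for gpu in ("TITAN V", "TITAN Xp", "GeForce RTX 2060"):
        print(gpu, "->", warn_if_suspect_gpu(gpu))
```

Exact matching on the normalized name avoids accidentally flagging other cards whose names merely contain "V", such as the Tesla V100.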