thangvubk / SoftGroup

[CVPR 2022 Oral] SoftGroup for Instance Segmentation on 3D Point Clouds
MIT License
346 stars 81 forks

process killed by computer #132

Open DurbinLiu opened 2 years ago

DurbinLiu commented 2 years ago

Hello, when I run the command ./tools/dist_train.sh configs/softgroup_scannet.yaml 1, I hit the following problem: my process gets killed by my computer after running for several epochs. I searched for the issue and found it was caused by OOM. I am using a single 3090 GPU with batchsize=4 and num_workers=4, and I don't think that should cause out-of-memory, given that it can run for some epochs. Do you know why this happens and how to deal with it? Hoping for your reply, many thanks!
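A process that the operating system kills after several epochs may be running out of host RAM rather than GPU memory. One way to check is to log the peak resident set size at the end of each epoch and see whether it keeps growing. The following is a minimal sketch, assuming Linux or macOS; peak_rss_mib and memory_report are hypothetical helper names, not part of SoftGroup, and the loop only stands in for a real training loop.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of the current process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def memory_report(epoch):
    """Format a one-line report of peak host memory for the given epoch."""
    return f"epoch {epoch}: peak host RSS {peak_rss_mib():.1f} MiB"


if __name__ == "__main__":
    retained = []
    for epoch in range(1, 4):  # stand-in for the real training loop
        retained.append([0] * 1_000_000)  # simulate memory kept across epochs
        print(memory_report(epoch))
```

This only measures the main process. Data-loading worker processes have their own memory, so a system tool such as top or free is still needed to see the total.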

wsk12345 commented 2 years ago

I also encountered the same problem. Running the test on a single RTX 3090 (24 GB) shows "CUDA error: an illegal memory access was encountered".

thangvubk commented 2 years ago

I am not sure. Could you check with the --skip-validation flag? You can also resume training with --resume.

wsk12345 commented 2 years ago

Thank you for your advice. However, I am not training; I only want to run testing on point cloud data.

thangvubk commented 2 years ago

@wsk12345 Which dataset are you using?

wsk12345 commented 2 years ago

@thangvubk Thank you for your prompt reply. I am using a custom dataset. A single scene has about 5e6 points.

KtK99 commented 2 years ago

I ran into the "illegal memory access" error too. It was caused by the radius being too large for the dataset I was training on. It may have the same effect during testing.

thangvubk commented 2 years ago

If you have memory errors with custom datasets, I suggest checking the input spatial_shape. Spconv2 may not support inputs with very large spatial shapes (e.g., 3000x3000x1000).
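To get a rough idea of how large the spatial shape of a scene will be before running the model, the voxel-grid extent can be estimated from the raw point coordinates. The following is a minimal sketch, not SoftGroup's actual voxelization code: voxel_spatial_shape and largest_dim_exceeds are hypothetical helpers, scale is assumed to mean voxels per unit length, and max_dim is a threshold you choose yourself, since the thread gives 3000x3000x1000 only as an example of a shape that is too large.

```python
import math


def voxel_spatial_shape(points, scale):
    """Estimate the (x, y, z) voxel-grid extent of a point cloud.

    points: iterable of (x, y, z) coordinates.
    scale: voxels per unit length (assumed meaning, e.g. 50 for 2 cm voxels
           when coordinates are in meters).
    """
    points = list(points)
    shape = []
    for axis in range(3):
        lowest = min(p[axis] for p in points)
        # Shift to the origin, scale to voxel units, and take the largest index.
        max_index = max(math.floor((p[axis] - lowest) * scale) for p in points)
        shape.append(max_index + 1)
    return tuple(shape)


def largest_dim_exceeds(shape, max_dim):
    """Return True if any grid dimension is larger than the chosen max_dim."""
    return any(dim > max_dim for dim in shape)


if __name__ == "__main__":
    # A scene spanning 60 x 60 x 20 units at 50 voxels per unit.
    scene = [(0.0, 0.0, 0.0), (60.0, 60.0, 20.0)]
    shape = voxel_spatial_shape(scene, scale=50)
    print(shape, largest_dim_exceeds(shape, max_dim=2048))
```

If the estimated shape is very large, options include using a coarser scale or splitting the scene into smaller crops before inference.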

Krupal09 commented 2 years ago

I am getting an OOM error while testing on the S3DIS dataset with a single RTX 6000.