tsinghua-rll / VoxelNet-tensorflow

A 3D object detection system for autonomous driving.
MIT License
453 stars 123 forks

GPU usage #6

Closed MilPat555 closed 6 years ago

MilPat555 commented 6 years ago

Hey,

So I am trying to train on a single 11 GB GPU. In the config.py file I changed __C.GPU_AVAILABLE = '3,1,2,0' to __C.GPU_AVAILABLE = '0'
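For context, this is roughly what that part of config.py looks like (a sketch, not verbatim from the repo; the derivation of GPU_USE_COUNT is my assumption):

# config.py (sketch)
from easydict import EasyDict as edict

__C = edict()
__C.GPU_AVAILABLE = '0'  # was '3,1,2,0'; restrict to the single 11 GB GPU
# number of GPUs in use, presumably derived from the list above
__C.GPU_USE_COUNT = len(__C.GPU_AVAILABLE.split(','))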

Due to the multiprocessing you have implemented, it breaks my job up into 16 processes of only 58 MB each. Though I have plenty more GPU memory available, it does not seem to use it.

Do you know how I can solve this?

[screenshot of GPU memory usage]

MilPat555 commented 6 years ago

I have tried increasing the batch_size and playing with the use_multi_process_num value for KittiLoader, but each process is still limited to 58 MB and I am not sure why. The processes seem to be placed on the CPU for some reason.
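To check whether ops are actually being placed on the GPU, I have been using device placement logging (a generic TF1 snippet, not code from this repo):

import tensorflow as tf

# Log which device each op is assigned to; GPU ops show up in stderr as
# "/job:localhost/replica:0/task:0/device:GPU:0".
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0], name='a')
    b = tf.constant([3.0, 4.0], name='b')
    print(sess.run(a + b))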

jeasinema commented 6 years ago

It is not so strange that each process consumes only a little memory; when I ran 1 batch on a single TITAN X, each process cost only about 109 MB. As for your suspicion that the computation is being placed on the CPU, I can't say without more information, but I suggest you check the GPU-Util of the GPU you are using to confirm whether the model is actually utilizing it.
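If you prefer to watch it from Python rather than running nvidia-smi by hand, something like this works (a small helper sketch; it assumes nvidia-smi is on your PATH):

import subprocess
import time

# Poll GPU index, utilization, and memory once per second via nvidia-smi.
# If GPU-Util stays near 0% while training, the model is likely on the CPU.
while True:
    out = subprocess.check_output(
        ['nvidia-smi',
         '--query-gpu=index,utilization.gpu,memory.used',
         '--format=csv,noheader']).decode()
    print(out.strip())
    time.sleep(1)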

bw4sz commented 6 years ago

@MilPat555, did you make any other alterations for single GPU? I also changed to

__C.GPU_AVAILABLE = '0'

and it grabs the GPU just fine

2018-04-17 18:03:26.479067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:84:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB

but I get an error later

Traceback (most recent call last):
  File "/home/b.weinstein/voxelnet/train.py", line 200, in <module>
    tf.app.run(main)
  File "/home/b.weinstein/miniconda3/envs/voxelnet/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/b.weinstein/voxelnet/train.py", line 129, in main
    batch = sample_test_data(val_dir, args.single_batch_size * cfg.GPU_USE_COUNT, multi_gpu_sum=cfg.GPU_USE_COUNT)
  File "/home/b.weinstein/voxelnet/utils/kitti_loader.py", line 140, in sample_test_data
    _, per_vox_feature, per_vox_number, per_vox_coordinate = build_input(voxel[idx * single_batch_size:(idx + 1) * single_batch_size])
  File "/home/b.weinstein/voxelnet/utils/kitti_loader.py", line 172, in build_input
    feature = np.concatenate(feature_list)
ValueError: need at least one array to concatenate

which suggests to me that it assumes there are at least two GPUs. It seems to work fine when debugging on CPU.
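For what it's worth, the ValueError is just numpy being handed an empty list, so feature_list must have come back empty here (minimal reproduction, not repo code):

import numpy as np

feature_list = []  # what build_input collects when no samples are loaded
np.concatenate(feature_list)
# ValueError: need at least one array to concatenate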

I'm using a fork of this repo, and I'm still working back through the PRs to see if there has been a change upstream.

bw4sz commented 6 years ago

Follow-up here: this was caused by misnaming the validation directory.
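For anyone hitting the same thing, a quick sanity check before training (illustrative paths; the exact layout depends on how you prepared KITTI):

import os

# Illustrative paths -- adjust to your own KITTI preparation.
val_dir = os.path.join('data', 'object', 'validation')
for sub in ('velodyne', 'label_2', 'calib'):
    d = os.path.join(val_dir, sub)
    n = len(os.listdir(d)) if os.path.isdir(d) else 0
    print('%s: %d files' % (d, n))
    # a missing or empty directory here leads straight to the
    # "need at least one array to concatenate" error above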