vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

Write usage of MultiGPU? #147

Closed: lifelongeek closed this issue 8 years ago

lifelongeek commented 9 years ago

Hello all. I am using a recent version of MatConvNet (downloaded 3 days ago). First of all, thanks for making this convenient library available. While using MatConvNet, I have run into some doubts about the multi-GPU implementation.

[What I did] In my launch script, I simply changed opts.train.gpus to [1 2], since I have two GPUs (both NVIDIA GTX 780 Ti) on my machine, and then ran the script to train a CNN on my own dataset.

[My question] Question 1)
From the training progress shown on the command line, I thought my CNN was being trained with data parallelism (64 images of each minibatch on each GPU):

Lab 1:
training: epoch 25: batch 2056/4375: 0.35 s (368.9 data/s) obj:0.0472 top1e:0.0128 top5e:0.0006 [64/128]
training: epoch 25: batch 2057/4375: 0.35 s (368.8 data/s) obj:0.0472 top1e:0.0128 top5e:0.0006 [64/128]
training: epoch 25: batch 2058/4375: 0.35 s (368.6 data/s) obj:0.0472 top1e:0.0128 top5e:0.0006 [64/128]
Lab 2:
training: epoch 25: batch 2056/4375: 0.35 s (368.9 data/s) obj:0.044 top1e:0.0123 top5e:0.000456 [64/128]
training: epoch 25: batch 2057/4375: 0.35 s (368.8 data/s) obj:0.044 top1e:0.0123 top5e:0.000456 [64/128]
training: epoch 25: batch 2058/4375: 0.35 s (368.6 data/s) obj:0.044 top1e:0.0122 top5e:0.000456 [64/128]

But nvidia-smi shows that only one GPU is being used (GPU 1 sits at 11MiB of memory in performance state P8, while GPU 0 is at 2653MiB in P0):

+------------------------------------------------------+
| NVIDIA-SMI 340.29     Driver Version: 340.29         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 780 Ti  Off  | 0000:01:00.0     N/A |                  N/A |
| 62%   82C   P0    N/A /  N/A  |   2653MiB /  3071MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 780 Ti  Off  | 0000:03:00.0     N/A |                  N/A |
| 26%   42C   P8    N/A /  N/A  |     11MiB /  3071MiB |     N/A      Default |

Is this the expected nvidia-smi output? Or did I miss something needed to run the code on multiple GPUs?

Question 2) cnn_imagenet_mgpu.m calls a function named cnn_train_mgpu, but I cannot find any such function. Has it been merged into cnn_train.m?

Thank you :)

mingtop commented 9 years ago

opts.train.gpus = [ 1,2,3] ...

MatConvNet implements multi-GPU training with spmd (the single-program, multiple-data construct from MATLAB's Parallel Computing Toolbox).
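To illustrate what data-parallel training under spmd means for the [64/128] counter in the log (which the original poster read as 64 images per GPU), here is a minimal sketch, assuming each spmd worker takes an interleaved share of every minibatch. splitMinibatch is a hypothetical helper written for this illustration, not a MatConvNet function, and it is not claimed to be how cnn_train.m actually partitions batches; inside a real spmd block a worker would typically index its share directly with labindex and numlabs.

```matlab
function subBatches = splitMinibatch(batch, numWorkers)
% SPLITMINIBATCH  Hypothetical helper (not part of MatConvNet).
%   Deals the image indices of one minibatch round-robin to NUMWORKERS
%   data-parallel workers. Worker k receives batch(k:numWorkers:end),
%   i.e. what an spmd worker would get with batch(labindex:numlabs:end).
subBatches = cell(1, numWorkers);
for k = 1:numWorkers
  % Interleaved slice: elements k, k+numWorkers, k+2*numWorkers, ...
  subBatches{k} = batch(k:numWorkers:end);
end
end
```

For example, splitMinibatch(1:128, 2) returns two 64-element index vectors, 1:2:127 and 2:2:128, so each of two workers (and, with one GPU per worker, each GPU) would process half of a 128-image minibatch.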

jessicaloohw commented 7 years ago

Hi @gmkim90, did you figure out whether MatConvNet was using one or two GPUs? I am facing the same issue now: I am training on two GPUs, but nvidia-smi shows only one GPU in use. Thanks!