Open zzk88862 opened 6 years ago
OK, thanks for your answer. What I meant is how to run multi-GPU training on a single server, not distributed training.
I've only just started exploring Luminoth, but since it's still alpha I'm going to guess you'll need to interact directly with TensorFlow to do that. That said, I don't think it's terribly difficult; it's pretty much a matter of replacing your single CPU or GPU call with something like `for gpu in [gpu_1, gpu_2, gpu_3, ..., gpu_n]:`. Check out this page for an example.
Okay, thanks for your answer. I will try it.
Thanks for your advice. I have tried multiple GPUs with `tf.device('/gpu:%d' % i)`, but the process is always "Killed" (已杀死). I have debugged many times, but it is not solved. The following are some of my run messages:
1. nvidia-smi output:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                 Driver Version: 384.111                  |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:81:00.0 Off |                    0 |
| N/A   61C    P0    24W /  75W |   4387MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P4            Off  | 00000000:82:00.0 Off |                    0 |
| N/A   62C    P0    24W /  75W |   4881MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     24215      C   /root/anaconda3/envs/new52/bin/python       4369MiB |
|    1     24215      C   /root/anaconda3/envs/new52/bin/python       4863MiB |
+-----------------------------------------------------------------------------+
```
2. Run result:
I could be wrong, but I believe you can specify distribution parameters. I don't know about local training, but for training on Google Cloud you'd just tack this onto your command line: `--worker-count` (the number of GPUs, or greater if you wish to have more than one worker per GPU).