thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0

Darkflow not using all GPU #1062

Open pradeepbhaskar opened 5 years ago

pradeepbhaskar commented 5 years ago

Hi,

I have an 8-GPU Linux box. I have tried many of the solutions I found through Google to get the darkflow process to use all the GPUs, but it uses only one GPU and picks 'GPU:0' by default. The log is shown below.

The command-line parameters are: --save 24000 --epoch 2000 --batch 8 --lr 0.0001 --gpu 0.8 --gpuName '/gpu:2' --savepb

I tried changing these four lines in build.py, but still no luck:

            config_TF.gpu_options.allow_growth = True
            config_TF.gpu_options.per_process_gpu_memory_fraction = 0.8
                   'allow_soft_placement': True,
                   'log_device_placement': True

Should the parameter look like one of these?

--gpuName '/gpu:0', '/gpu:1', '/gpu:2' OR --gpuName '/gpu:0, /gpu:1, /gpu:2'

I strongly suspect the problem is in the following code from build.py. Can someone please help?

            self.say('\nBuilding net ...')
            start = time.time()
            self.graph = tf.Graph()
            device_name = FLAGS.gpuName \
                    if FLAGS.gpu > 0.0 else None
            with tf.device(device_name):
                    with self.graph.as_default() as g:
                            self.build_forward()
                            self.setup_meta_ops()
            self.say('Finished in {}s\n'.format(
                    time.time() - start))
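
In the excerpt above, `FLAGS.gpuName` is handed to a single `tf.device(...)` call, so the whole graph is built under one device scope. As a minimal sketch, assuming one wanted to accept a comma-separated value such as '/gpu:0, /gpu:1, /gpu:2' and split it into individual device names, the string handling could look like this. The helper `parse_device_list` is hypothetical and not part of darkflow, and it only handles the string: using several GPUs would also require building part of the model under each device scope, which the quoted code does not do.

```python
import re

# Hypothetical helper, not part of darkflow: split a comma-separated
# --gpuName value into individual TensorFlow-style device names.
_DEVICE_RE = re.compile(r"^/(gpu|cpu):(\d+)$", re.IGNORECASE)


def parse_device_list(value):
    """Return a list of normalized device names such as '/gpu:0'."""
    devices = []
    for part in value.split(","):
        # Drop surrounding whitespace and any stray quote characters.
        name = part.strip().strip("'\"")
        if not name:
            continue
        match = _DEVICE_RE.match(name)
        if match is None:
            raise ValueError("unrecognized device name: %r" % name)
        devices.append("/%s:%d" % (match.group(1).lower(), int(match.group(2))))
    return devices


if __name__ == "__main__":
    print(parse_device_list("/gpu:0, /gpu:1, /gpu:2"))
```

Each name in the returned list could then be used in its own device scope, one per model replica, if the graph-building code were restructured that way.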

thanks, Pradeep

    ubuntu@ip-10-35-130-44:~/darkflow$ nvidia-smi
    Wed Jul 17 17:46:41 2019
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           On   | 00000000:00:17.0 Off |                    0 |
    | N/A   70C    P0   154W / 149W |   9360MiB / 11441MiB |     97%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla K80           On   | 00000000:00:18.0 Off |                    0 |
    | N/A   49C    P0    72W / 149W |     71MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla K80           On   | 00000000:00:19.0 Off |                    0 |
    | N/A   64C    P0    62W / 149W |     71MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla K80           On   | 00000000:00:1A.0 Off |                    0 |
    | N/A   56C    P0    71W / 149W |     71MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   4  Tesla K80           On   | 00000000:00:1B.0 Off |                    0 |
    | N/A   65C    P0    62W / 149W |     71MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   5  Tesla K80           On   | 00000000:00:1C.0 Off |                    0 |
    | N/A   51C    P0    73W / 149W |     71MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   6  Tesla K80           On   | 00000000:00:1D.0 Off |                    0 |
    | N/A   66C    P0    58W / 149W |     71MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   7  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   55C    P0    71W / 149W |     71MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      8951      C   python                                      9347MiB |
    |    1      8951      C   python                                        58MiB |
    |    2      8951      C   python                                        58MiB |
    |    3      8951      C   python                                        58MiB |
    |    4      8951      C   python                                        58MiB |
    |    5      8951      C   python                                        58MiB |
    |    6      8951      C   python                                        58MiB |
    |    7      8951      C   python                                        58MiB |
    +-----------------------------------------------------------------------------+
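
The `nvidia-smi` output above can also be checked programmatically through its CSV query mode. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and that every queried field is reported as a plain number (some GPUs report text such as "[Not Supported]", which this parser does not handle). The function names and the 5% threshold are illustrative choices; the sample string reuses the utilization and memory figures reported above for GPUs 0 to 2.

```python
import subprocess

# nvidia-smi CSV query: one line per GPU, no header row, no unit suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'index, util, mem' lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, util, mem = [field.strip() for field in line.split(",")]
        rows.append({"index": int(index),
                     "util_pct": int(util),
                     "mem_mib": int(mem)})
    return rows


def busy_gpus(rows, min_util_pct=5):
    """Return the indices of GPUs at or above the utilization threshold."""
    return [row["index"] for row in rows if row["util_pct"] >= min_util_pct]


def query_gpus():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_csv(out)


# Figures for GPUs 0-2 taken from the nvidia-smi table in this post.
SAMPLE = "0, 97, 9360\n1, 0, 71\n2, 0, 71\n"

if __name__ == "__main__":
    print(busy_gpus(parse_gpu_csv(SAMPLE)))
```

Run against the sample, only GPU 0 is listed as busy, which matches the table: the other GPUs hold a small allocation but show 0% utilization.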