chamecall closed 5 years ago
This is weird. Training should run on the GPU if you installed keras-gpu. Does Keras use the GPU properly when you train other DNNs?
> This is weird. Training should run on the GPU if you installed keras-gpu.

As I found out, there is no keras-gpu as such. Keras runs on different backends, with or without GPU support, so I installed tensorflow-gpu.

> Does Keras use the GPU properly when you train other DNNs?

I haven't trained any other DNN with Keras yet.
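One way to answer this without training a network is to ask TensorFlow which devices it can see. The sketch below assumes the TensorFlow backend used in this thread; `list_gpu_devices` is a hypothetical helper name, and it returns `None` when TensorFlow cannot be imported at all.

```python
def list_gpu_devices():
    """Return the names of the GPU devices TensorFlow can see.

    Returns None if TensorFlow is not importable in this environment.
    """
    try:
        # device_lib ships with both TensorFlow 1.x and 2.x.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = list_gpu_devices()
    if gpus is None:
        print("TensorFlow is not installed")
    elif not gpus:
        print("TensorFlow sees no GPU; training will run on the CPU")
    else:
        print("TensorFlow sees:", gpus)
```

An empty list would mean TensorFlow is running CPU-only, even if nvidia-smi shows the card.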
@kentaroy47 here is my training output:
`Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
W0718 15:39:21.993449 140716638869312 deprecation_wrapper.py:119] From train_frcnn.py:22: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
W0718 15:39:21.993684 140716638869312 deprecation_wrapper.py:119] From train_frcnn.py:24: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2019-07-18 15:39:22.005688: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-18 15:39:22.010477: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-18 15:39:22.090010: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-18 15:39:22.090624: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x518d5b0 executing computations on platform CUDA. Devices:
2019-07-18 15:39:22.090642: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-07-18 15:39:22.111158: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192845000 Hz
2019-07-18 15:39:22.111430: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5581ce0 executing computations on platform Host. Devices:
2019-07-18 15:39:22.111452: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0):
W0718 15:39:22.769683 140716638869312 deprecation_wrapper.py:119] From /home/algernon/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0718 15:39:22.778257 140716638869312 deprecation_wrapper.py:119] From /home/algernon/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
2019-07-18 15:39:22.814931: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0718 15:39:22.816322 140716638869312 deprecation_wrapper.py:119] From /home/algernon/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
W0718 15:39:24.377749 140716638869312 deprecation_wrapper.py:119] From /home/algernon/frcnn-from-scratch-with-keras/keras_frcnn/RoiPoolingConv.py:105: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
W0718 15:39:26.038717 140716638869312 deprecation_wrapper.py:119] From /home/algernon/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3980: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
loading weights from ./pretrain/mobilenet_1_0_224_tf.h5
W0718 15:39:27.599630 140716638869312 deprecation_wrapper.py:119] From /home/algernon/.local/lib/python3.6/site-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
W0718 15:39:27.612688 140716638869312 deprecation.py:323] From /home/algernon/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.`
etc.
It seems that my CUDA version doesn't match my TensorFlow version, does it?
Yes, the problem was CUDA 10.1. I downgraded to CUDA 10.0 and the training launched on the GPU successfully.
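For anyone hitting the same mismatch: each pip release of tensorflow-gpu is built against one specific CUDA version, so the installed toolkit has to match it. The sketch below encodes a few pairs as I recall them from TensorFlow's tested build configurations, so treat the table as an assumption and check the official list. `TESTED_CUDA`, `expected_cuda` and `cuda_matches` are hypothetical names, and the thread never states the TensorFlow version, so 1.14.0 is only a guess based on the date of the log.

```python
# Tested CUDA version per tensorflow-gpu release (Linux pip builds).
# Assumption: recalled from TensorFlow's "tested build configurations"
# table; verify against the official documentation before relying on it.
TESTED_CUDA = {
    "1.12": "9.0",
    "1.13": "10.0",
    "1.14": "10.0",
    "1.15": "10.0",
    "2.0": "10.0",
    "2.1": "10.1",
}


def expected_cuda(tf_version):
    """Return the CUDA version a tensorflow-gpu release expects, or None if unknown."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CUDA.get(major_minor)


def cuda_matches(tf_version, installed_cuda):
    """True only when the installed CUDA toolkit is the one the release was built for."""
    expected = expected_cuda(tf_version)
    return expected is not None and installed_cuda == expected


# Assumed scenario: TensorFlow 1.14.0 (a guess, see above) with CUDA 10.1, then 10.0.
print(cuda_matches("1.14.0", "10.1"))  # False
print(cuda_matches("1.14.0", "10.0"))  # True
```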
For example, I launched the training as follows:
python3 train_frcnn.py --network mobilenetv1 -p ./VOCdevkit
After that, I ran nvidia-smi and didn't see any GPU memory consumption:
`+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   55C    P0    N/A /  72W |    287MiB /  4032MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       689      G   /usr/lib/xorg/Xorg                           194MiB |
|    0      2797      G   /proc/self/exe                                44MiB |
|    0      6001      G   /usr/lib/firefox/firefox                       1MiB |
|    0      9034      C   python3                                       43MiB |
+-----------------------------------------------------------------------------+`
Strictly speaking, how can I launch the training on the GPU?
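The process table is the useful part of this output. By default TensorFlow reserves most of the GPU's memory as soon as it initialises the device, so a python3 process holding only 43MiB suggests it never got that far. Here is a minimal sketch for pulling the compute (`C`) processes out of nvidia-smi text; `ROW` and `compute_processes` are hypothetical names, and the regex assumes the row layout of the driver version shown in this thread.

```python
import re

# One row of the "Processes" table: GPU index, PID, type, process name, memory.
ROW = re.compile(r"\|\s+(\d+)\s+(\d+)\s+([CG](?:\+G)?)\s+(\S+)\s+(\d+)MiB\s+\|")


def compute_processes(smi_text):
    """Return (pid, name, MiB) for every compute-type process in nvidia-smi output."""
    return [(int(pid), name, int(mib))
            for _gpu, pid, kind, name, mib in ROW.findall(smi_text)
            if "C" in kind]


# Rows copied from the nvidia-smi output in this thread.
SAMPLE = """
|    0       689      G   /usr/lib/xorg/Xorg                           194MiB |
|    0      2797      G   /proc/self/exe                                44MiB |
|    0      6001      G   /usr/lib/firefox/firefox                       1MiB |
|    0      9034      C   python3                                       43MiB |
"""

print(compute_processes(SAMPLE))  # [(9034, 'python3', 43)]
```

In practice the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` while the training script is running.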