you359 / Keras-FasterRCNN

keras implementation of Faster R-CNN
MIT License
334 stars 216 forks

Unable to train using GPU? #21

Open wyjun0418 opened 5 years ago

wyjun0418 commented 5 years ago

/home/wangyongjun/iPSCs_Image_Analysis/venv_conda/bin/python /home/wangyongjun/iPSCs_Image_Analysis/iPSCs_detection/train_frcnn.py
Using TensorFlow backend.
2019-07-02 14:24:19.841527: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-02 14:24:19.877597: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100075000 Hz
2019-07-02 14:24:19.880423: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5604d4ba2550 executing computations on platform Host. Devices:
2019-07-02 14:24:19.880495: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-07-02 14:24:19.882778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-02 14:24:20.412817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:02:00.0
2019-07-02 14:24:20.413880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:03:00.0
2019-07-02 14:24:20.414813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:82:00.0
2019-07-02 14:24:20.415591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:83:00.0
2019-07-02 14:24:20.415749: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:20.415816: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:20.415877: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:20.415936: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:20.415994: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:20.416064: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:20.416121: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2019-07-02 14:24:20.416130: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-07-02 14:24:20.416272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-02 14:24:20.416282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1 2 3
2019-07-02 14:24:20.416288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y N N
2019-07-02 14:24:20.416293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N N N
2019-07-02 14:24:20.416298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2: N N N Y
2019-07-02 14:24:20.416303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3: N N Y N
2019-07-02 14:24:20.431609: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5604d68a4710 executing computations on platform CUDA. Devices:
2019-07-02 14:24:20.431633: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
2019-07-02 14:24:20.431641: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): TITAN Xp, Compute Capability 6.1
2019-07-02 14:24:20.431648: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (2): TITAN Xp, Compute Capability 6.1
2019-07-02 14:24:20.431654: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (3): TITAN Xp, Compute Capability 6.1
[name: "/device:CPU:0" device_type: "CPU" memory_limit: 268435456 locality { } incarnation: 306929952179397883
, name: "/device:XLA_CPU:0" device_type: "XLA_CPU" memory_limit: 17179869184 locality { } incarnation: 2004709365230408661 physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0" device_type: "XLA_GPU" memory_limit: 17179869184 locality { } incarnation: 4768577020553090356 physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1" device_type: "XLA_GPU" memory_limit: 17179869184 locality { } incarnation: 8158160402508824678 physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2" device_type: "XLA_GPU" memory_limit: 17179869184 locality { } incarnation: 2994857638638946823 physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:3" device_type: "XLA_GPU" memory_limit: 17179869184 locality { } incarnation: 9148773670435188 physical_device_desc: "device: XLA_GPU device"
]
Parsing annotation files
Processing 10_1.xml: 100%|██████████| 3352/3352 [00:00<00:00, 4708.23it/s]
Training images per class: {'Bad': 1827, 'Good': 3266, 'Medium': 2349, 'bg': 0}
Num classes (including bg) = 4
Config has been written to config.pickle, and can be loaded when testing to ensure correct results
Num train samples 1343
Num val samples 1001
Num test samples 1008
WARNING: Logging before flag parsing goes to stderr.
W0702 14:24:21.169176 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:47: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0702 14:24:21.169825 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:351: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0702 14:24:21.173986 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3176: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0702 14:24:21.204720 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3043: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

W0702 14:24:22.611183 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3153: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

W0702 14:24:22.656112 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/iPSCs_detection/keras_frcnn/RoiPoolingConv.py:108: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

W0702 14:24:23.809847 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3045: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

W0702 14:24:23.817028 140049300752128 deprecation.py:506] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:1064: calling reduce_prod_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version. Instructions for updating: keep_dims is deprecated, use keepdims instead
loading weights from resnet50_weights_tf_dim_ordering_tf_kernels.h5
Could not load pretrained model weights. Weights can be found in the keras application folder https://github.com/fchollet/keras/tree/master/keras/applications
W0702 14:24:23.891306 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/optimizers.py:675: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0702 14:24:23.905168 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:2642: The name tf.log is deprecated. Please use tf.math.log instead.

W0702 14:24:23.909163 140049300752128 deprecation.py:323] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where
W0702 14:24:23.914963 140049300752128 deprecation.py:506] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:1046: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version. Instructions for updating: keep_dims is deprecated, use keepdims instead
2019-07-02 14:24:24.097984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:02:00.0
2019-07-02 14:24:24.099081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:03:00.0
2019-07-02 14:24:24.100209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:82:00.0
2019-07-02 14:24:24.101056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:83:00.0
2019-07-02 14:24:24.101211: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:24.101282: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:24.101395: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:24.101474: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:24.101539: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:24.101603: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-07-02 14:24:24.101686: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2019-07-02 14:24:24.101696: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-07-02 14:24:24.101857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-02 14:24:24.101868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1 2 3
2019-07-02 14:24:24.101874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y N N
2019-07-02 14:24:24.101879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N N N
2019-07-02 14:24:24.101885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2: N N N Y
2019-07-02 14:24:24.101890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3: N N Y N
2019-07-02 14:24:24.747706: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0702 14:24:24.847111 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/callbacks.py:646: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

W0702 14:24:24.847523 140049300752128 deprecation_wrapper.py:119] From /home/wangyongjun/iPSCs_Image_Analysis/venv_conda/lib/python3.7/site-packages/keras/callbacks.py:649: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

mervekaya commented 5 years ago

I was able to train on the GPU by adding the following lines to train_frcnn.py:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

raajprakash commented 4 years ago

I was able to train on the GPU by adding the following lines to train_frcnn.py:

import os 
os.environ["CUDA_VISIBLE_DEVICES"]="0"

What if I have 3 GPUs? os.environ["CUDA_VISIBLE_DEVICES"]="0,1,2" doesn't work.
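
On the multi-GPU question, here is a minimal sketch of setting the variable for several devices. It assumes the variable is set before TensorFlow initializes CUDA, which in practice means before the first TensorFlow or Keras import. `select_gpus` is a hypothetical helper for illustration, not part of Keras-FasterRCNN.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    Call this before TensorFlow/Keras is imported: the CUDA driver reads
    CUDA_VISIBLE_DEVICES when it initializes, so later changes are ignored.
    """
    # int() rejects anything that is not a plain device index.
    value = ",".join(str(int(i)) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


visible = select_gpus([0, 1, 2])
print("CUDA_VISIBLE_DEVICES =", visible)

# Import tensorflow / keras only after this point.
```

Note that this variable only controls which devices are visible to the process. It does not by itself spread training across the three cards; the training script would also have to be written for multi-GPU use. It also cannot help when TensorFlow skips GPU registration altogether, as in the log above ("Cannot dlopen some GPU libraries. Skipping registering GPU devices...").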

wanghuogen commented 4 years ago

You need to install tensorflow-gpu, not tensorflow.
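
The log in the original post suggests one more thing to check. TensorFlow found the four TITAN Xp cards but skipped registering them because it could not dlopen the CUDA 10.0 libraries and cuDNN 7. Below is a small sketch, using only the standard library, that tries to load the library names listed in that log. `check_cuda_libs` is a hypothetical helper, and the result depends on the machine's loader search path (for example `LD_LIBRARY_PATH`).

```python
import ctypes

# Library names copied from the "Could not dlopen library" lines in the log.
CUDA_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def check_cuda_libs(names=CUDA_LIBS):
    """Return {library name: True if the dynamic loader can open it}."""
    status = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError when the library is not found
            status[name] = True
        except OSError:
            status[name] = False
    return status


for lib, ok in check_cuda_libs().items():
    print(("found   " if ok else "MISSING ") + lib)
```

If any of these are reported missing, the installed CUDA/cuDNN versions probably do not match what this TensorFlow build expects, or their directory is not on the loader path.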