vanhuyz / CycleGAN-TensorFlow

An implementation of CycleGAN using TensorFlow
MIT License
1.19k stars 436 forks

error run train in gpu #106

Open biexiangduo opened 5 years ago

biexiangduo commented 5 years ago

cuda:8.0
cudnn:6.0
python:3.5
tensorflow:1.14.0

~/CycleGAN-TensorFlow$ python3 train.py
WARNING: Logging before flag parsing goes to stderr.
W0630 08:35:53.787037 140031590201088 deprecation_wrapper.py:119] From train.py:135: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

W0630 08:35:53.796201 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/model.py:49: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

W0630 08:35:53.797244 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/model.py:58: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0630 08:35:53.798269 140031590201088 deprecation.py:323] From /home/pf/CycleGAN-TensorFlow/reader.py:19: TFRecordReader.__init__ (from tensorflow.python.ops.io_ops) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.TFRecordDataset.
W0630 08:35:53.799142 140031590201088 deprecation.py:323] From /home/pf/CycleGAN-TensorFlow/reader.py:28: string_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
W0630 08:35:53.803668 140031590201088 deprecation.py:323] From /home/pf/.local/lib/python3.5/site-packages/tensorflow/python/training/input.py:278: input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
W0630 08:35:53.804200 140031590201088 deprecation.py:323] From /home/pf/.local/lib/python3.5/site-packages/tensorflow/python/training/input.py:190: limit_epochs (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensors(tensor).repeat(num_epochs).
W0630 08:35:53.805154 140031590201088 deprecation.py:323] From /home/pf/.local/lib/python3.5/site-packages/tensorflow/python/training/input.py:199: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module.
W0630 08:35:53.805793 140031590201088 deprecation.py:323] From /home/pf/.local/lib/python3.5/site-packages/tensorflow/python/training/input.py:199: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module.
W0630 08:35:53.808638 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/reader.py:32: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

W0630 08:35:53.808758 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/reader.py:35: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W0630 08:35:53.810475 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/reader.py:52: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

W0630 08:35:53.814553 140031590201088 deprecation.py:323] From /home/pf/CycleGAN-TensorFlow/reader.py:45: shuffle_batch (from tensorflow.python.training.input) is deprecated and will be removed in a future version. Instructions for updating: Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.shuffle(min_after_dequeue).batch(batch_size).
W0630 08:35:53.820895 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/reader.py:48: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

W0630 08:35:53.839401 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/generator.py:21: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0630 08:35:53.839734 140031590201088 deprecation.py:506] From /home/pf/CycleGAN-TensorFlow/ops.py:188: calling RandomNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor
W0630 08:35:55.420766 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/model.py:87: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

W0630 08:35:55.840024 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/model.py:92: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

W0630 08:35:56.864042 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/model.py:119: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.

W0630 08:35:56.867225 140031590201088 deprecation.py:323] From /home/pf/.local/lib/python3.5/site-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Deprecated in favor of operator or tf.math.divide.
W0630 08:35:56.869565 140031590201088 deprecation.py:323] From /home/pf/CycleGAN-TensorFlow/model.py:122: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where
W0630 08:35:56.870930 140031590201088 deprecation_wrapper.py:119] From /home/pf/CycleGAN-TensorFlow/model.py:129: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0630 08:36:08.143354 140031590201088 deprecation_wrapper.py:119] From train.py:67: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

W0630 08:36:08.144193 140031590201088 deprecation_wrapper.py:119] From train.py:68: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

W0630 08:36:10.731404 140031590201088 deprecation_wrapper.py:119] From train.py:69: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

2019-06-30 08:36:10.979949: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-06-30 08:36:11.179523: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-30 08:36:11.179791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733 pciBusID: 0000:01:00.0
2019-06-30 08:36:11.179861: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
2019-06-30 08:36:11.179906: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
2019-06-30 08:36:11.179954: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
2019-06-30 08:36:11.179997: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
2019-06-30 08:36:11.180039: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
2019-06-30 08:36:11.180081: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
2019-06-30 08:36:11.180122: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
2019-06-30 08:36:11.180131: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-06-30 08:36:11.180360: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-30 08:36:11.221799: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-30 08:36:11.222076: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12626b60 executing computations on platform CUDA. Devices:
2019-06-30 08:36:11.222090: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1
2019-06-30 08:36:11.243575: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-06-30 08:36:11.244643: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1239d550 executing computations on platform Host. Devices:
2019-06-30 08:36:11.244657: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-06-30 08:36:11.244700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-30 08:36:11.244709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]
2019-06-30 08:36:12.565100: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0630 08:36:12.775092 140031590201088 deprecation.py:323] From train.py:83: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.


Could I change the cuDNN, TensorFlow, or Python version to get this running on the GPU?

rkfg commented 5 years ago

You need to update CUDA to version 10. Don't forget to change $LD_LIBRARY_PATH as well. I suggest using a symlink /usr/local/cuda that points to the latest version, so it's easier to update (the CUDA installer usually does this for you). My $LD_LIBRARY_PATH is /usr/local/cuda/lib64.
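
To check whether the new setup is picked up, something like the following rough sketch may help. It looks in each directory of $LD_LIBRARY_PATH for the library files that the log above shows TensorFlow 1.14 trying to dlopen (the CUDA 10.0 libraries plus libcudnn.so.7). The function name find_missing_libs and the REQUIRED_LIBS list are made up for this illustration and are not part of the repository or of TensorFlow. It also only inspects $LD_LIBRARY_PATH, while the dynamic loader searches other locations too (for example the ldconfig cache), so treat a "missing" result as a hint rather than proof.

```python
import os

# Library names copied from the "Could not dlopen library" lines in the log.
# This list is an assumption for TensorFlow 1.14; other versions need other files.
REQUIRED_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def find_missing_libs(lib_names, ld_library_path):
    """Return the names in lib_names that exist in none of the directories
    listed in ld_library_path (a colon-separated string)."""
    # Drop empty entries such as the one produced by a trailing colon.
    dirs = [d for d in ld_library_path.split(":") if d]
    missing = []
    for name in lib_names:
        if not any(os.path.exists(os.path.join(d, name)) for d in dirs):
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    missing = find_missing_libs(REQUIRED_LIBS, path)
    if missing:
        print("Not found in LD_LIBRARY_PATH (%s):" % path)
        for name in missing:
            print("  " + name)
    else:
        print("All listed libraries were found in LD_LIBRARY_PATH.")
```

With the setup from the original post (LD_LIBRARY_PATH pointing at /usr/local/cuda-8.0/lib64), this would be expected to list every entry as missing, which matches the "Skipping registering GPU devices" warning in the log.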