Wuxinxiaoshifu commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[ ] I am using the latest TensorFlow Model Garden release and TensorFlow 1.15.
[ ] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[ ] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/deeplab

2. Describe the bug

When I run python3 train.py, I get the error "tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0". The libcudart.so on my machine is libcudart.so.11.0.

3. Steps to reproduce

Steps to reproduce the behavior.

4. Expected behavior

I want to train on my own dataset. Can you fix this problem?

5. Additional context

I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-10-27 22:44:43.970758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-27 22:44:43.971327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1650 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.485
pciBusID: 0000:01:00.0
2021-10-27 22:44:43.971467: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/wuxin/YZH/object_detect/yolo_ws/devel/lib:/home/wuxin/JKW/local_road_ws/devel/lib:/home/wuxin/YZH/lidar_ws/devel/lib:/home/wuxin/YZH/ccm_slam_ws/devel/lib:/home/wuxin/YZH/yzh_ws/devel/lib:/home/wuxin/YZH/yzh_ws2NoCat/cartographer_ws/devel_isolated/cartographer_rviz/lib:/home/wuxin/YZH/yzh_ws2NoCat/cartographer_ws/install_isolated/lib:/home/wuxin/zed/zed_ros_ws/devel/lib:/opt/ros/melodic/lib:/usr/local/opencv/lib:/usr/local/cuda-11.0/lib64:/opt/ros/noetic/lib/x86_64-linux-gnu:/home/wuxin/px/PX4-Autopilot/build/px4_sitl_default/build_gazebo
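
The warning in this log shows the loader asking for libcudart.so.10.0, while the only CUDA directory on LD_LIBRARY_PATH is /usr/local/cuda-11.0/lib64. A minimal sketch for checking which libcudart versions are actually visible on a search path follows. The function name `find_cudart_libraries` is made up for this illustration, and the sketch only looks at the directories it is given, not at the system linker cache.

```python
import glob
import os


def find_cudart_libraries(search_path):
    """Map each directory on search_path to the libcudart files it contains.

    search_path uses the same separator as LD_LIBRARY_PATH (os.pathsep).
    Directories without any libcudart.so* file are left out of the result.
    """
    found = {}
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        matches = sorted(glob.glob(os.path.join(directory, "libcudart.so*")))
        if matches:
            found[directory] = [os.path.basename(m) for m in matches]
    return found


if __name__ == "__main__":
    # Print every directory on LD_LIBRARY_PATH that provides a CUDA runtime.
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, names in find_cudart_libraries(path).items():
        print(directory, names)
```

If no directory lists a libcudart.so.10.0 file, the warning is expected: the TensorFlow 1.15 pip packages are built against CUDA 10.0, so a CUDA 11.0 runtime alone does not satisfy them.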

6. System information

Linux Ubuntu 18.04
TensorFlow installed from: pip3 install tensorflow-gpu
TensorFlow version: 1.15
Python version: 3.6.9
version :4.2.1
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: CUDA 11.0, cuDNN 8.0.5
GPU model and memory: GTX 1650 Ti, 3903MB

Wuxinxiaoshifu commented 3 years ago

python3 /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/train.py --logtostderr --training_number_of_steps=10000 --train_split="train" --model_variant="xception_65" --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 --output_stride=16 --decoder_output_stride=6 --train_crop_size=481,641 --train_batch_size=2 --dataset="pascal_voc_seg" --num_clones=1 --tf_initial_checkpoint='/home/wuxin/YZH/object_detect/deeplab/datasets/deeplabv3_pascal_trainval' --train_logdir='/home/wuxin/YZH/object_detect/deeplab/datasets/model_get' --dataset_dir='/home/wuxin/YZH/object_detect/deeplab/datasets/dataset/tfrecords'

WARNING:tensorflow:From /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/core/conv2d_ws.py:40: The name tf.layers.Layer is deprecated. Please use tf.compat.v1.layers.Layer instead.

WARNING:tensorflow: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

INFO:tensorflow:Training on train set
I1028 13:33:41.810364 139670364448576 train.py:290] Training on train set

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W1028 13:33:41.953724 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

W1028 13:33:41.954856 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

2021-10-28 13:33:42.336265: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-10-28 13:33:42.376173: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-28 13:33:42.376664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1650 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.485
pciBusID: 0000:01:00.0
2021-10-28 13:33:42.376876: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-10-28 13:33:42.378843: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-10-28 13:33:42.379859: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-10-28 13:33:42.380096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-10-28 13:33:42.382565: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-10-28 13:33:42.383152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-10-28 13:33:42.383301: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-10-28 13:33:42.383390: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-28 13:33:42.383792: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-28 13:33:42.384121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.
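
In this second run the dso_loader lines report every CUDA 10.0 library as opened, so the original libcudart problem no longer appears and the failure further down is a different one. A rough sketch for pulling that summary out of a pasted log is shown here. The helper name `summarize_dso_loads` is invented for the example, and the regular expression only covers the two message shapes that appear in this thread.

```python
import re

# Matches the two dso_loader messages seen in this thread. The library name
# is quoted in the failure message and unquoted in the success message.
_DSO_LINE = re.compile(
    r"(Successfully opened dynamic library|Could not load dynamic library)"
    r"\s+'?([\w.+-]+)'?"
)


def summarize_dso_loads(log_text):
    """Return (loaded, failed) lists of library names found in log_text."""
    loaded, failed = [], []
    for status, name in _DSO_LINE.findall(log_text):
        if status.startswith("Successfully"):
            loaded.append(name)
        else:
            failed.append(name)
    return loaded, failed
```

Running it over the first comment's log should put libcudart.so.10.0 in the failed list, while the log in this comment should yield an empty failed list.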

W1028 13:33:42.746240 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W1028 13:33:43.337642 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.lin_space is deprecated. Please use tf.linspace instead.

W1028 13:33:43.338138 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.lin_space is deprecated. Please use tf.linspace instead.

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.

W1028 13:33:43.338405 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.

W1028 13:33:43.539838 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.reverse_v2 is deprecated. Please use tf.reverse instead.

W1028 13:33:45.765414 139670364448576 module_wrapper.py:139] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.reverse_v2 is deprecated. Please use tf.reverse instead.

WARNING:tensorflow:From /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/datasets/data_generator.py:339: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use for ... in dataset: to iterate over a dataset. If using tf.estimator, return the Dataset object directly from your input function. As a last resort, you can use tf.compat.v1.data.make_one_shot_iterator(dataset).

W1028 13:33:45.997265 139670364448576 deprecation.py:323] From /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/datasets/data_generator.py:339: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use for ... in dataset: to iterate over a dataset. If using tf.estimator, return the Dataset object directly from your input function. As a last resort, you can use tf.compat.v1.data.make_one_shot_iterator(dataset).

WARNING:tensorflow:From /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/core/feature_extractor.py:490: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead.

W1028 13:33:46.019285 139670364448576 deprecation.py:323] From /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/core/feature_extractor.py:490: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead.

WARNING:tensorflow:From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version. Instructions for updating: Please use layer.__call__ method instead.

W1028 13:33:46.024497 139670364448576 deprecation.py:323] From /home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version. Instructions for updating: Please use layer.__call__ method instead.

WARNING:tensorflow:From /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/core/xception.py:393: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W1028 13:33:46.099911 139670364448576 module_wrapper.py:139] From /home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/core/xception.py:393: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

Traceback (most recent call last):
  File "/home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/train.py", line 464, in <module>
    tf.compat.v1.app.run()
  File "/home/wuxin/.local/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/wuxin/.local/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/wuxin/.local/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/train.py", line 321, in main
    clones = model_deploy.create_clones(config, model_fn, args=model_args)
  File "/home/wuxin/YZH/object_detect/deeplab/models/research/slim/deployment/model_deploy.py", line 192, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "/home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/train.py", line 252, in _build_deeplab
    'total_training_steps': FLAGS.training_number_of_steps,
  File "/home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/model.py", line 323, in multi_scale_logits
    nas_training_hyper_parameters=nas_training_hyper_parameters)
  File "/home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/model.py", line 597, in _get_logits
    use_bounded_activation=model_options.use_bounded_activation)
  File "/home/wuxin/YZH/object_detect/deeplab/models/research/deeplab/model.py", line 709, in refine_by_decoder
    feature_extractor.DECODER_END_POINTS][output_stride]
KeyError: 6
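
The traceback ends in a dictionary lookup keyed by the decoder output stride, and the command in this comment passes --decoder_output_stride=6, so the KeyError: 6 most likely means the table for xception_65 has no entry for a stride of 6. The sketch below reproduces that failure mode with a stand-in table. The table contents, the placeholder end-point string, and the helper `decoder_end_points_for` are assumptions made for illustration; the real table lives in deeplab/core/feature_extractor.py and may differ.

```python
# Hypothetical stand-in for the decoder end-point table of one backbone.
# Assumption: only a stride of 4 is defined, as in DeepLab's sample commands.
DECODER_END_POINTS = {
    4: ["<low-level feature end point>"],  # placeholder, not a real layer name
}


def decoder_end_points_for(stride, table=DECODER_END_POINTS):
    """Look up decoder end points, failing with a readable message.

    A plain table[stride] lookup raises a bare KeyError (as in the traceback
    in this comment); checking first lets the error name the valid strides.
    """
    if stride not in table:
        raise ValueError(
            "decoder_output_stride=%r is not supported; choose one of %s"
            % (stride, sorted(table))
        )
    return table[stride]
```

Under that assumption, rerunning the training command with --decoder_output_stride=4 instead of 6 would be the first thing to try.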

TJxiaominliu commented 2 years ago

Have you solved it? I have the same error.

my cuda is 11.0 but it need libcudart.so.10.0 #10335