Can't to run inference - Githubissues

EgorAntonovich commented 5 years ago

Hi, I want to run inference. I use the xception_65 model on ade20k from http://download.tensorflow.org/models/deeplabv3_xception_ade20k_train_2018_05_29.tar.gz

GPU: 2x NVIDIA GeForce GTX 1060 3GB
CUDA: 9.0
cuDNN: 7.3
tensorflow-gpu: 1.11.0

When I run inference on my dataset, I get this traceback:

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe236379290>, '_model_dir': '/home/egor.antonovich/model_deeplab/xception_65/', '_protocol': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}
WARNING:tensorflow:ParseError: 2:1 : Message type "tensorflow.CheckpointState" has no field named "all_model_checkpoint_path".
WARNING:tensorflow:/home/egor.antonovich/model_deeplab/xception_65/checkpoint: Checkpoint ignored
INFO:tensorflow:Could not find trained model in model_dir: /home/egor.antonovich/model_deeplab/xception_65/, running initialization to predict.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2018-10-31 21:23:08.711251: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-31 21:23:09.006719: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-31 21:23:09.007162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:01:00.0
totalMemory: 2.94GiB freeMemory: 2.88GiB
2018-10-31 21:23:09.076701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-31 21:23:09.077106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 1 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:02:00.0
totalMemory: 2.94GiB freeMemory: 2.88GiB
2018-10-31 21:23:09.077143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0, 1
2018-10-31 21:23:09.431331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-31 21:23:09.431359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0 1
2018-10-31 21:23:09.431364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N N
2018-10-31 21:23:09.431367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1: N N
2018-10-31 21:23:09.431594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2580 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 3GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-31 21:23:09.440966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 2580 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1060 3GB, pci bus id: 0000:02:00.0, compute capability: 6.1)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Traceback (most recent call last):
  File "inference.py", line 100, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "inference.py", line 84, in main
    for pred_dict, image_path in zip(predictions, image_files):
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 569, in predict
    preds_evaluated = mon_sess.run(predictions)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1148, in run
    run_metadata=run_metadata)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1239, in run
    raise six.reraise(*original_exc_info)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1224, in run
    return self._sess.run(*args, **kwargs)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1296, in run
    run_metadata=run_metadata)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
    return self._sess.run(*args, **kwargs)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1110, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
    run_metadata)
  File "/home/egor.antonovich/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 2 (size = 4) and num_split 3
  [[{{node split}} = Split[T=DT_FLOAT, num_split=3, _device="/device:CPU:0"](split/split_dim, ToFloat)]]
  [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[?,?,?,3]], output_types=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
  [[{{node IteratorGetNext/_2361}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_589_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Does anyone know how to fix it?

HanChen-HUST commented 5 years ago

Hello, can you tell me how you solved this error? Many thanks, @EgorAntonovich.

EgorAntonovich commented 5 years ago

Hi @chenhust1995, I used this solution for my problem.

HanChen-HUST commented 5 years ago

tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 2(size=1) and num_split 3
  [[{{node split}} = Split[T=DT_FLOAT, num_split=3, _device="/device:CPU:0"](split/split_dim, ToFloat)]]
  [{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,?,?,3],[?,?,?,1]], output_types=[DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

I have posted my error above. Is it similar to yours? Can it be solved by your solution? It happened in evaluate.py. @EgorAntonovich

EgorAntonovich commented 5 years ago

For my task I only needed visualization, and my solution works without errors. But you can try to solve your problem by using a frozen graph.

HanChen-HUST commented 5 years ago

Thanks for your suggestion, that's very kind of you.

toobaimt commented 5 years ago

@chenhust1995 @EgorAntonovich I was running into the same error, but then realized the problem was with my images: I was using grayscale images, whereas the model is trained on 3-channel images. Hope this helps clear up the confusion.
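
The error messages in this thread fit that diagnosis: the failing op splits the image along its channel axis (split_dim 2) into 3 parts, so an image with 1 channel (size=1) or 4 channels (size = 4) cannot be divided evenly. The sketch below reproduces the same failure mode with NumPy. It is an illustration only, not the repository's preprocessing code; the helper name `split_channels` and the array sizes are made up for the example.

```python
import numpy as np


def split_channels(image, num_split=3):
    """Split an H x W x C array into num_split equal parts along the channel axis.

    Hypothetical helper that mimics a 3-way split on axis 2. np.split raises
    ValueError when C is not divisible by num_split.
    """
    return np.split(image, num_split, axis=2)


# Dummy images with different channel counts (sizes are arbitrary).
rgb = np.zeros((4, 5, 3), dtype=np.float32)
gray = np.zeros((4, 5, 1), dtype=np.float32)
rgba = np.zeros((4, 5, 4), dtype=np.float32)

# A 3-channel image splits cleanly into three H x W x 1 arrays.
parts = split_channels(rgb)
print(len(parts), parts[0].shape)

# 1-channel and 4-channel images cannot be split into 3 equal parts.
for name, img in [("gray", gray), ("rgba", rgba)]:
    try:
        split_channels(img)
    except ValueError as err:
        print(name, "fails:", err)
```

Checking the shape of a few input images (for example, the last dimension of the decoded array) is a quick way to confirm whether this is the cause before changing anything else.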

HanChen-HUST commented 5 years ago

@toobaimt Thanks. Did you get it working on one-channel images?

toobaimt commented 5 years ago

@chenhust1995 Not really; I converted my one-channel images to 3-channel with a simple cat (concatenation) and it worked!
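
A minimal sketch of that kind of conversion, assuming the images are loaded as NumPy arrays in height x width x channels order. The function name `to_three_channels` is hypothetical. Only the grayscale case is confirmed in this thread; dropping a fourth channel (for example an alpha channel, which would match the size = 4 in the original error) is an assumption added here for completeness.

```python
import numpy as np


def to_three_channels(image):
    """Return an H x W x 3 array from a grayscale, RGB, or 4-channel input."""
    image = np.asarray(image)

    # Treat a plain H x W array as a single-channel image.
    if image.ndim == 2:
        image = image[:, :, np.newaxis]
    if image.ndim != 3:
        raise ValueError("expected a 2-D or 3-D array, got %d-D" % image.ndim)

    channels = image.shape[2]
    if channels == 1:
        # Grayscale: repeat the single channel three times ("a simple cat").
        return np.concatenate([image, image, image], axis=2)
    if channels == 3:
        return image
    if channels == 4:
        # Assumption: the 4th channel is alpha and can be dropped.
        return image[:, :, :3]
    raise ValueError("unsupported channel count: %d" % channels)


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
print(to_three_channels(gray).shape)
```

Running the conversion once over the dataset and saving the results avoids repeating it at every inference call.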

HanChen-HUST commented 5 years ago

@toobaimt All right. It works on the Pascal dataset, but it doesn't work on the COCO dataset.

rishizek / tensorflow-deeplab-v3-plus

Can't to run inference #33