Open changlinzhang opened 6 years ago
Have you figured out how it works? I trained on my own dataset as well, but the accuracy is very low.
Hello @changlinzhang @kwotsin, could you tell me how to use the files in the checkpoint folder as a pretrained model to train on my own dataset?
Hello everyone, how do we prepare our own dataset for training? Thank you.
I made my own dataset, but I got the errors below:
InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [2]
[[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_5481, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_5483, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_5485)]]
Traceback (most recent call last):
File "train_enet.py", line 337, in labels
out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [2]
[[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_5481, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_5483, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_5485)]]
Caused by op u'mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert', defined at:
File "train_enet.py", line 337, in labels
out of bound')],
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 559, in assert_less
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 118, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 157, in Assert
guarded_assert = cond(condition, no_op, true_assert, name="AssertGuard")
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2057, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1895, in BuildCondBranch
original_result = fn()
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 155, in true_assert
condition, data, summarize, name="Assert")
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 51, in _assert
name=name)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [2]
[[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_5481, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_5483, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_5485)]]
Could you help me, please?
I faced the same problem. In my case, I remade the annotation images so that they no longer contain the value 255, and it works. https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/107#issuecomment-325857231
@RobinHan24 I met the same problem. I have 10 classes, so I set the pixel values of my label images to 0 through 9, and that fixed the problem. I don't know whether this is helpful for you.
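Both fixes above come down to the same constraint: the assertion in the error compares every label value x against the class count y, so each pixel of a label image has to lie in 0 to y - 1. Here is a minimal sketch of how one might check and remap label values with NumPy. The function names `out_of_range_values` and `remap_labels` and the tiny example array are hypothetical, not part of this repo, and the right target for a void value such as 255 depends on your dataset.

```python
import numpy as np


def out_of_range_values(label, num_classes):
    """Return the distinct label values that fall outside [0, num_classes)."""
    values = np.unique(label)
    return [int(v) for v in values if not 0 <= int(v) < num_classes]


def remap_labels(label, mapping, fill_value):
    """Build a new label array: values listed in `mapping` are translated,
    every other value (for example a void value) becomes `fill_value`."""
    remapped = np.full_like(label, fill_value)
    for old_value, new_value in mapping.items():
        remapped[label == old_value] = new_value
    return remapped


# Hypothetical 2-class annotation that also contains the void value 255.
label = np.array([[0, 1, 255],
                  [1, 1, 0]], dtype=np.uint8)

bad_before = out_of_range_values(label, num_classes=2)

# Assumption for illustration only: void pixels are folded into class 0.
fixed = remap_labels(label, mapping={0: 0, 1: 1}, fill_value=0)
bad_after = out_of_range_values(fixed, num_classes=2)

print(bad_before, bad_after)
```

Running the check over every annotation file before training shows which files would trip the assertion, without waiting for the metric op to fail.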
Thanks for this useful repo. Hi everyone, could anyone help me solve this issue?
1) The current code works for the CamVid dataset.
2) I am having difficulty training this ENet model on the Cityscapes dataset, which I prepared using https://github.com/mcordts/cityscapesScripts.
I would now like to import this data into this code, but it reports a dimension mismatch. Could you please help me fix the grayscale image input? After preparing the data I have several types of label files (color.png, instance.png, labeld.png, json.png, trainid.png). How do I choose one of them and import it into this model?
I tried a single type of image and got this error:
InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [12]
[[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py:3561) = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2)]]
As I am a beginner in this field, I am hoping for suggestions to resolve this error.
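In the error above, y = 12 is the class count the mean IoU metric was built with, so any label value of 12 or more trips the assertion. Cityscapes label files generally contain larger values, including a void value. One common approach, which is not necessarily what this repo does, is to drop void pixels before accumulating the confusion matrix. This is a minimal NumPy sketch of that idea. The names `confusion_matrix` and `mean_iou` are hypothetical helpers, `ignore_label=255` is an assumption, and predictions are assumed to already lie in 0 to num_classes - 1.

```python
import numpy as np


def confusion_matrix(labels, preds, num_classes, ignore_label=255):
    """Accumulate a num_classes x num_classes confusion matrix
    (rows: ground truth, columns: prediction), skipping void pixels."""
    labels = np.asarray(labels).ravel().astype(np.int64)
    preds = np.asarray(preds).ravel().astype(np.int64)

    # Drop pixels marked with the void value (assumed to be 255 here).
    keep = labels != ignore_label
    labels, preds = labels[keep], preds[keep]

    # Same condition the TensorFlow assertion enforces: labels < num_classes.
    if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
        raise ValueError("labels out of bound")

    flat_index = labels * num_classes + preds
    counts = np.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(matrix):
    """Average intersection-over-union across classes that occur at all."""
    intersection = np.diag(matrix).astype(np.float64)
    union = matrix.sum(axis=0) + matrix.sum(axis=1) - intersection
    present = union > 0
    if not present.any():
        return 0.0
    return float((intersection[present] / union[present]).mean())


# Tiny hypothetical example: the last pixel is void and is ignored.
matrix = confusion_matrix([0, 1, 1, 255], [0, 1, 0, 1], num_classes=2)
print(matrix)
print(mean_iou(matrix))
```

A label value that is neither void nor below the class count still raises an error here, which is the situation the log above reports.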
Hi, kwotsin! Thanks for your work. I want to train it on another dataset (30 classes instead of 12). I thought I had changed the relevant code, but I got this error:
2018-01-11 17:23:22.187077: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
Could it be caused by the checkpoint? How can I deal with this problem?
The complete output is as follows:
========= Median Frequency Balancing Class Weights =========
[6.397542327061094e-05, 6.7097626201794152e-05, 0.024400273767542283, 0.041269401614453756, 5.5506352412896832e-05, 0.076635711324892844, 0.069381256179271614, 3.472654196521944e-05, 0.00042760164428717635, 0.00012440287198120186, 0.090233329139976615, 0.12489918060211183, 0.0013708685331902757, 6.0827765291491662e-05, 0.073240128809290553, 0.35775514055273316, 0.64257341685305103, 0.90968868010977944, 0.37688909228806228, 0.44248634385452756, 0.00042529101230680852, 0.30566376891079095, 0.28941152643298945, 3.9464190165066867e-05, 0.26421036878629223, 0.42250536299160169, 0.5089356784417215, 0.00024742224929701886, 0.47265314480960613, 0.0]
2018-01-11 17:22:23.528595: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:23.528689: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:23.528720: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:29.254935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-11 17:22:29.503633: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x1e106f80 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-01-11 17:22:29.504523: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-11 17:22:29.505315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-11 17:22:29.505448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2018-01-11 17:22:29.505491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2018-01-11 17:22:29.505540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2018-01-11 17:22:29.505685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2018-01-11 17:22:29.505705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2018-01-11 17:22:29.505740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0)
2018-01-11 17:22:29.505779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:84:00.0)
2018-01-11 17:22:34.391659: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1368 get requests, put_count=1100 evicted_count=1000 eviction_rate=0.909091 and unsatisfied allocation rate=1
2018-01-11 17:22:34.391731: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:Saving checkpoint to path ./log/original/model.ckpt
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Epoch 1/300
INFO:tensorflow:Current Learning Rate: [0.00050000002]
INFO:tensorflow:global step 1: loss: 0.3121 (4.79 sec/step) Current Streaming Accuracy: 0.0000 Current Mean IOU: 0.0000
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0000 Validation Mean IOU: 0.0000 (2.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0209 Validation Mean IOU: 0.0030 (1.10 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0207 Validation Mean IOU: 0.0028 (1.26 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0227 Validation Mean IOU: 0.0033 (1.23 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0220 Validation Mean IOU: 0.0035 (1.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0208 Validation Mean IOU: 0.0033 (1.28 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0201 Validation Mean IOU: 0.0033 (1.22 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0198 Validation Mean IOU: 0.0032 (1.25 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0032 (1.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0031 (1.18 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.39 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0032 (1.23 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0032 (1.18 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0193 Validation Mean IOU: 0.0032 (1.16 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0191 Validation Mean IOU: 0.0031 (1.41 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0193 Validation Mean IOU: 0.0031 (1.26 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0195 Validation Mean IOU: 0.0032 (1.43 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0032 (1.32 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0202 Validation Mean IOU: 0.0033 (1.34 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0204 Validation Mean IOU: 0.0034 (1.33 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0203 Validation Mean IOU: 0.0034 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0206 Validation Mean IOU: 0.0034 (1.36 sec/step)
2018-01-11 17:23:21.808311: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: assertion failed: [all dims of \'image.shape\' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [all dims of \'image.shape\' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0207 Validation Mean IOU: 0.0035 (1.19 sec/step)
2018-01-11 17:23:22.187077: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
[[Node: Reshape_5 = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](batch_1/_5971, Reshape_5/shape)]]
2018-01-11 17:23:22.197319: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
[[Node: Reshape_5 = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](batch_1/_5971, Reshape_5/shape)]]
Traceback (most recent call last):
File "train_enet.py", line 340, in
run()
File "train_enet.py", line 337, in run
plt.savefig(photodir+"/image" + str(i))
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1063, in _single_operation_run
target_list_as_strings, status, None)
File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [all dims of \'image.shape\' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
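On the checkpoint question: the failing node in the log, Reshape_5, takes a DT_UINT8 tensor from batch_1, which points at image or label data from the input pipeline and not at checkpoint variables. The two numbers in the reshape message can be compared directly. The sketch below is plain arithmetic on those logged values; the helper name `implied_batch_ratio` is hypothetical, and the 360 x 480 image size is an assumption not stated in this thread.

```python
def implied_batch_ratio(actual_values, requested_values):
    """Return how many copies of the actual tensor the requested shape
    would hold, or None if the sizes are not a whole multiple."""
    if actual_values <= 0 or requested_values % actual_values != 0:
        return None
    return requested_values // actual_values


# Both numbers are copied from the reshape error in the log.
actual = 172800
requested = 4320000

ratio = implied_batch_ratio(actual, requested)
print(ratio)  # 25

# Assumption: single-channel 360 x 480 label images.
height, width = 360, 480
print(height * width == actual)  # True
```

Under that assumption, the reshape expected 25 images but received one, which would fit an input queue that stopped early after the `image.shape` assertion failed just before. One thing to check is whether every file in the validation list opens as a non-empty image.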