Closed Phillylyu closed 6 years ago
hi @Phillylyu, can you share the config file you are using?
@vierja Thanks for your response. It was probably a config mistake: I had limited the number of classes to 2, and some pictures do not have annotations. I increased the number of classes and now it works. Thank you.
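One possible reading of the fix above: when training is restricted to a subset of classes, any image whose boxes all belong to the dropped classes is left without ground truth. The sketch below is not Luminoth code. The helper `filter_classes` and the record layout (file name mapped to a list of `(label, xmin, ymin, xmax, ymax)` tuples) are assumptions made for illustration. It shows one way to spot such images before building the dataset.

```python
def filter_classes(records, keep):
    """Keep only boxes whose label is in `keep`.

    Hypothetical helper, not part of Luminoth.
    Returns (filtered, empty): the filtered records and the sorted
    file names that are left with no boxes at all.
    """
    keep = set(keep)
    filtered = {}
    empty = []
    for filename, boxes in records.items():
        kept = [box for box in boxes if box[0] in keep]
        filtered[filename] = kept
        if not kept:
            empty.append(filename)
    return filtered, sorted(empty)


# Made-up records, for illustration only.
records = {
    "205.jpg": [("car", 10, 20, 110, 90), ("dog", 5, 5, 40, 60)],
    "309.jpg": [("dog", 0, 0, 50, 50)],
    "230.jpg": [],
}
filtered, empty = filter_classes(records, keep=["car", "person"])
print(empty)  # ['230.jpg', '309.jpg']
```

Images listed in `empty` could then be dropped or re-annotated before converting the dataset.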
Hi, I am facing the same issue. I checked the number of classes inside sample_config.yml, here is the sample_config.yml below:
train:
  # Directory in which model checkpoints & summaries (for Tensorboard) will be saved
  job_dir: jobs/
  debug: True
dataset:
  type: object_detection
  # From which directory to read the dataset
  dir: dataset/TFRecords
model:
  type: fasterrcnn
  network:
    # Total number of classes to predict
    num_classes: 9
  base_network:
    # Which type of pretrained network to use
    architecture: resnet_v1_101
    # Should we train the pretrained network
    trainable: True
    # Should we download weights if not available
    download: True
But even then I am always facing the same issue as below:
INFO:tensorflow:step: 36, file: b'205.jpg', train_loss: 99.34402465820312, in 0.34s
INFO:tensorflow:step: 37, file: b'3150.jpg', train_loss: 195.69007873535156, in 0.35s
INFO:tensorflow:step: 38, file: b'309.jpg', train_loss: 122.05986022949219, in 0.36s
INFO:tensorflow:step: 39, file: b'230.jpg', train_loss: 215.76116943359375, in 0.35s
Traceback (most recent call last):
File "/home/arun/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/home/arun/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/home/arun/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15,4] vs. [16,4]
Please help: where else do I need to make a change?
Kind regards, Arun
I think you should check the pictures and annotations to make sure they match.
Thank you, Phillylyu. The problem was solved some time back. Yes, you are right: some of the annotation files had problems.
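For anyone hitting the same error, a quick pre-check of the annotations may help narrow things down. This is only a sketch and not part of Luminoth: `find_bad_annotations` and the record layout (file name mapped to a list of `(label, xmin, ymin, xmax, ymax)` tuples) are hypothetical, and the specific checks are assumptions about what "problems in the annotation files" might look like.

```python
def find_bad_annotations(records, classes):
    """Return {filename: [issues]} for records that look inconsistent.

    Hypothetical helper, not part of Luminoth.
    """
    known = set(classes)
    problems = {}
    for filename, boxes in records.items():
        issues = []
        if not boxes:
            issues.append("no boxes")
        for label, xmin, ymin, xmax, ymax in boxes:
            if label not in known:
                issues.append("unknown label: %s" % label)
            if xmax <= xmin or ymax <= ymin:
                issues.append("empty or inverted box")
        if issues:
            problems[filename] = issues
    return problems


# Made-up records, for illustration only.
records = {
    "205.jpg": [("car", 10, 20, 110, 90)],
    "3150.jpg": [("truck", 30, 40, 30, 80)],
    "230.jpg": [],
}
problems = find_bad_annotations(records, classes=["car", "person"])
for filename, issues in sorted(problems.items()):
    print(filename, issues)
```

Files reported here are candidates to fix or exclude before regenerating the TFRecords.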
I am using Luminoth with ImageNet to train the model. The config is: fasterrcnn, architecture: resnet_v1_101. When I try to train on the data, I always get the following output, and lumi train quits.
W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [0,4] vs. [5,4]
[[Node: losses/RCNNLoss/sub_1 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](losses/RCNNLoss/bbox_offset_cleaned/Gather, losses/RCNNLoss/bbox_offsets_target_labeled/Gather)]]
Traceback (most recent call last):
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [0,4] vs. [5,4]
[[Node: losses/RCNNLoss/sub_1 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](losses/RCNNLoss/bbox_offset_cleaned/Gather, losses/RCNNLoss/bbox_offsets_target_labeled/Gather)]]
[[Node: Momentum/update/_13924 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14102_Momentum/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/anaconda3/envs/tflm/bin/lumi", line 11, in <module>
load_entry_point('luminoth', 'console_scripts', 'lumi')()
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, ctx.params)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/root/luminoth/luminoth/train.py", line 249, in train
config, environment=environment
File "/root/luminoth/luminoth/train.py", line 181, in run
], options=run_options)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 521, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 892, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 967, in run
raise six.reraise(*original_exc_info)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1024, in run
run_metadata=run_metadata)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 827, in run
return self._sess.run(*args, **kwargs)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [0,4] vs. [5,4]
[[Node: losses/RCNNLoss/sub_1 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](losses/RCNNLoss/bbox_offset_cleaned/Gather, losses/RCNNLoss/bbox_offsets_target_labeled/Gather)]]
[[Node: Momentum/update/_13924 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14102_Momentum/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'losses/RCNNLoss/sub_1', defined at:
File "/root/anaconda3/envs/tflm/bin/lumi", line 11, in <module>
load_entry_point('luminoth', 'console_scripts', 'lumi')()
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, ctx.params)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/root/luminoth/luminoth/train.py", line 249, in train
config, environment=environment
File "/root/luminoth/luminoth/train.py", line 66, in run
total_loss = model.loss(prediction_dict)
File "/root/luminoth/luminoth/models/fasterrcnn/fasterrcnn.py", line 188, in loss
prediction_dict['classification_prediction']
File "/root/luminoth/luminoth/models/fasterrcnn/rcnn.py", line 370, in loss
sigma=self._l1_sigma
File "/root/luminoth/luminoth/utils/losses.py", line 22, in smooth_l1_loss
diff = bbox_prediction - bbox_target
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
return func(x, y, name=name)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4636, in _sub
"Sub", x=x, y=y, name=name)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/root/anaconda3/envs/tflm/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Incompatible shapes: [0,4] vs. [5,4]
[[Node: losses/RCNNLoss/sub_1 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](losses/RCNNLoss/bbox_offset_cleaned/Gather, losses/RCNNLoss/bbox_offsets_target_labeled/Gather)]]
[[Node: Momentum/update/_13924 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14102_Momentum/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]