sydsim / personlab-tf

Implementation of PersonLab (https://arxiv.org/abs/1803.08225) using TF-slim

Training error at step 8k #9

Closed: junedgar closed this issue 5 years ago

junedgar commented 5 years ago

@sydsim Thank you for your great work. I tried to train the model, but got the following error at around step 8k (batch_size = 8):

Traceback (most recent call last):
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8,38,51,51] vs. [3,38,51,51]
     [[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](add_9-0-TransposeNHWCToNCHW-LayoutOptimizer, Conv_4/BiasAdd)]]
     [[Node: train_op/control_dependency/_2693 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7977_train_op/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 26, in <module>
    train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
  File "/home/startdt/zhengjunjun/project/poseEstimation/personlab-tf/personlab/model.py", line 79, in train
    session_config=sess_config,
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 770, in train
    sess, train_op, global_step, train_step_kwargs)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 487, in train_step
    run_metadata=run_metadata)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8,38,51,51] vs. [3,38,51,51]
     [[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](add_9-0-TransposeNHWCToNCHW-LayoutOptimizer, Conv_4/BiasAdd)]]
     [[Node: train_op/control_dependency/_2693 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7977_train_op/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'add_1', defined at:
  File "train.py", line 26, in <module>
    train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
  File "/home/startdt/zhengjunjun/project/poseEstimation/personlab-tf/personlab/model.py", line 25, in train
    output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
  File "/home/startdt/zhengjunjun/project/poseEstimation/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
    res = model_base(model_output, inner_h, inner_w)
  File "/home/startdt/zhengjunjun/project/poseEstimation/personlab-tf/personlab/models/model_base.py", line 28, in model_base
    mo_p = [b, y + mo_y, x + mo_x, i]
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 878, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 297, in add
    "Add", x=x, y=y, name=name)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/startdt/zhengjunjun/zhengpy3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [8,38,51,51] vs. [3,38,51,51]
     [[Node: add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](add_9-0-TransposeNHWCToNCHW-LayoutOptimizer, Conv_4/BiasAdd)]]
     [[Node: train_op/control_dependency/_2693 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7977_train_op/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Can you give me some advice?
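The message says the two inputs of the add_1 op have shapes [8,38,51,51] and [3,38,51,51]. Elementwise ops such as TensorFlow's Add accept operands of different shapes only when they broadcast: comparing dimensions from the right, each pair must be equal or one of them must be 1. The sketch below uses a hypothetical pure-Python helper (broadcast_shape is not a TensorFlow function) to apply that rule to the two shapes. On the batch axis 8 is paired with 3, so no broadcast is possible.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    Hypothetical helper following the NumPy-style rule used by
    elementwise ops: align shapes from the right; each dimension pair
    must be equal, or one of the two must be 1.
    """
    # Pad the shorter shape with leading 1s so both have the same rank.
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + list(a)
    b = [1] * (rank - len(b)) + list(b)
    result = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError("Incompatible shapes: {} vs. {}".format(a, b))
    return result


# A full batch: both operands have batch dimension 8, so the add works.
print(broadcast_shape([8, 38, 51, 51], [8, 38, 51, 51]))

# The failing case: batch dimension 8 vs. 3, and neither is 1.
try:
    broadcast_shape([8, 38, 51, 51], [3, 38, 51, 51])
except ValueError as err:
    print(err)
```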

junedgar commented 5 years ago

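Fix: change model.py#L14 to

d = d.batch(config.BATCH_SIZE, drop_remainder=True)

With the default drop_remainder=False, tf.data emits a smaller final batch when the dataset size is not a multiple of the batch size (here 3 samples instead of 8). The tensor added to the network output at model_base.py line 28 (y + mo_y) apparently has its batch dimension fixed at 8, so the add fails on that last batch. Setting drop_remainder=True discards the partial batch, so every batch has exactly config.BATCH_SIZE samples.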
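To see what drop_remainder changes, here is a pure-Python sketch that mimics the batching behaviour of tf.data's Dataset.batch, so it runs without TensorFlow. The batch function and the dataset size of 19 are hypothetical, chosen so that, as in the traceback, the last batch holds 3 samples when the batch size is 8.

```python
def batch(samples, batch_size, drop_remainder=False):
    """Hypothetical stand-in for Dataset.batch: group samples into lists."""
    batches = [samples[i:i + batch_size]
               for i in range(0, len(samples), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        # Discard the final, smaller batch so every batch is full-sized.
        batches = batches[:-1]
    return batches


BATCH_SIZE = 8
samples = list(range(19))  # hypothetical dataset: 19 = 2 * 8 + 3

# Default behaviour: the last batch is short, like the [3, ...] tensor above.
print([len(b) for b in batch(samples, BATCH_SIZE)])
# With drop_remainder=True only full batches of 8 remain.
print([len(b) for b in batch(samples, BATCH_SIZE, drop_remainder=True)])
```

Dropping the remainder discards fewer than BATCH_SIZE samples per pass over the dataset, which is usually negligible for training.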