tryolabs / luminoth

Deep Learning toolkit for Computer Vision.
https://tryolabs.com
BSD 3-Clause "New" or "Revised" License

InvalidArgumentError when saving checkpoints #221

Open panypopo opened 5 years ago

panypopo commented 5 years ago

train.yml:

train:
  run_name: fashion-train-2
  job_dir: C:\workspaces\geco\faster-rcnn-object-detect\coco\jobs-gpu\

dataset:
  type: object_detection
  dir: C:\workspaces\geco\faster-rcnn-object-detect\coco\tf

model:
  type: ssd
  network:
    num_classes: 3

After 1000 steps, training stops with this error:

INFO:tensorflow:finished training after 10000 epoch limit
INFO:tensorflow:Saving checkpoints for 3528 into C:\workspaces\geco\faster-rcnn-object-detect\coco\jobs-gpu\fashion-train-2\model.ckpt.
Traceback (most recent call last):
  File "C:\Users\panyp\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\panyp\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\workspaces\geco\faster-rcnn-object-detect\venv\Scripts\lumi.exe\__main__.py", line 9, in <module>
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\luminoth\train.py", line 307, in train
    config, environment=environment
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\luminoth\train.py", line 268, in run
    return step
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\training\monitored_session.py", line 783, in __exit__
    self._close_internal(exception_type)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\training\monitored_session.py", line 821, in _close_internal
    self._sess.close()
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1069, in close
    self._sess.close()
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1213, in close
    ignore_live_threads=True)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 257, in _run
    enqueue_callable()
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\client\session.py", line 1215, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "c:\workspaces\geco\faster-rcnn-object-detect\venv\lib\site-packages\tensorflow\python\client\session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 350142000 values, but the requested shape has 2613000
         [[{{node object_detection_dataset/Reshape}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _class=["loc:@object_detection_dataset/cond/Switch_2"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](object_detection_dataset/Cast, object_detection_dataset/stack)]]
         [[{{node object_detection_dataset/crop_to_bounding_box/Greater_1/_583}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_355_object_detection_dataset/crop_to_bounding_box/Greater_1", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
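A quick sanity check on the two numbers in the error message may help narrow this down. The sketch below only does arithmetic on the values reported by the `Reshape` node. The variable names are made up for illustration, and the 3-channel assumption is a guess, not something confirmed from the Luminoth source.

```python
# Values copied from the InvalidArgumentError message above.
decoded_values = 350142000    # elements actually present in the input tensor
requested_values = 2613000    # elements implied by the requested shape

# If the input were simply a different size, the ratio would rarely be a whole number.
ratio, remainder = divmod(decoded_values, requested_values)
print(ratio, remainder)  # 134 0

# Assumption: the requested shape is [height, width, 3] (RGB).
pixels_per_image = requested_values // 3
print(pixels_per_image)  # 871000
```

The input holds exactly 134 times as many values as the requested shape. One possible, unconfirmed reading is that at least one record in the dataset decodes to far more data than its stored width and height describe, for example a multi-frame image or a record whose size metadata does not match the image bytes. Checking the source images for such a file before regenerating the dataset in `dataset.dir` might be a reasonable next step.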