stesha2016 / lanenet-enet-hnet

Apache License 2.0

hnet_train InvalidArgumentError: Input is not invertible error #13

Closed by enginbaglayici 2 years ago

enginbaglayici commented 4 years ago

Hello,

I am trying to train H-net but I am encountering the following errors.

`Traceback (most recent call last):
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input is not invertible.
         [[{{node hnet/hnet_loss/while/MatrixInverse}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\hnet_train.py", line 87, in <module>
    _, loss, coefficient = sess.run([optimizer, c_loss, coef], feed_dict={tensor_in: image, gt_label_pts: label_pts})
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input is not invertible.
         [[node hnet/hnet_loss/while/MatrixInverse (defined at C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]

Original stack trace for 'hnet/hnet_loss/while/MatrixInverse':
  File ".\hnet_train.py", line 24, in <module>
    c_loss, coef, pre_loss = net.compute_loss(tensor_in, gt_label_pts=gt_label_pts, name='hnet')
  File "C:\Users\u22n57\Desktop\Multi-lane detection\lanenet-enet-hnet\lanenet_model\hnet_model.py", line 85, in compute_loss
    name='hnet_loss')
  File "C:\Users\u22n57\Desktop\Multi-lane detection\lanenet-enet-hnet\lanenet_model\hnet_loss.py", line 63, in hnet_loss
    _, _, _, losses = tf.while_loop(cond, body, [0, gt_pts, transformation_coeffcient, output_ta_loss])
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\ops\control_flow_ops.py", line 2753, in while_loop
    return_same_structure)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\ops\control_flow_ops.py", line 2245, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\ops\control_flow_ops.py", line 2170, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "C:\Users\u22n57\Desktop\Multi-lane detection\lanenet-enet-hnet\lanenet_model\hnet_loss.py", line 51, in body
    w = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(tf.transpose(Y_stack), Y_stack)),
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\ops\gen_linalg_ops.py", line 1492, in matrix_inverse
    "MatrixInverse", input=input, adjoint=adjoint, name=name)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "C:\Users\u22n57\AppData\Local\Continuum\anaconda3\envs\lane_detect\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()`

I managed to pretrain the network for 20k epochs, but I could not complete the main training phase because of this error. It can occur at any epoch; over several attempts, the longest run reached 12k epochs before failing. What could be causing it?

Also, after the failure I resumed training from the checkpoint at epoch 12000 and trained the network for 8k more epochs. Would that differ in any way from training the network for 20k epochs in a single run? Thanks.
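
For reference, the traceback points at `tf.matrix_inverse` applied to `tf.matmul(tf.transpose(Y_stack), Y_stack)` in `hnet_loss.py`, which is the normal-equations step of a least-squares fit. One possible cause, offered as an assumption rather than a confirmed diagnosis, is that some lane in a batch ends up with too few distinct y values for that product to have full rank. The framework-independent sketch below illustrates this, assuming `Y_stack` has rows `[y^2, y, 1]` (the repository's actual layout may differ); `gram_matrix`, `det3`, and `add_ridge` are hypothetical helper names, not functions from the repository.

```python
def gram_matrix(ys):
    """Return Y^T Y where Y has one row [y*y, y, 1] per point (assumed layout)."""
    rows = [[y * y, y, 1.0] for y in ys]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def add_ridge(m, eps):
    """Return m + eps * I. This is an illustrative regularizer, not the repository's code."""
    return [[m[i][j] + (eps if i == j else 0.0) for j in range(3)] for i in range(3)]


# Three distinct y values: Y^T Y has full rank and can be inverted.
well_posed = gram_matrix([1.0, 2.0, 3.0])

# Only two distinct y values (for example repeated or padded points):
# Y^T Y has rank 2, so its determinant is zero and inversion fails.
degenerate = gram_matrix([1.0, 1.0, 2.0, 2.0])

print(det3(well_posed))                   # non-zero: invertible
print(det3(degenerate))                   # zero: singular
print(det3(add_ridge(degenerate, 1e-3)))  # positive once a small ridge term is added
```

If this is indeed what happens, two commonly used workarounds in TensorFlow 1.x would be adding a small multiple of `tf.eye(3)` to the product before inverting it, or replacing the explicit inverse with `tf.matrix_solve_ls` and a non-zero `l2_regularizer`. Both change the fit slightly, and neither has been verified against this repository.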

fishtail-wang commented 2 years ago

Hello, have you fixed this problem yet? I ran into the same error during my training.

enginbaglayici commented 2 years ago

Hi! I couldn't solve this issue. As I said, I instead resumed training from where it left off.

fishtail-wang commented 2 years ago

Oh that helps a lot. Thanks buddy.