jpsml / 6-DOF-Inertial-Odometry

IMU-Based 6-DOF Odometry
BSD 3-Clause "New" or "Revised" License

Error while training #8

Open mahmoodul opened 2 years ago

mahmoodul commented 2 years ago

Hi, when I run the command "python3 train.py euroc euroc_model_trained_by_me.hdf5", I get the following error:

Traceback (most recent call last):
  File "train.py", line 137, in <module>
    main()
  File "train.py", line 114, in main
    train_model = create_train_model_6d_quat(pred_model, window_size)
  File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 115, in create_train_model_6d_quat
    out = CustomMultiLossLayer(nb_outputs=2)([y1_true, y2_true, y1_pred, y2_pred])
  File "/home/aisl/.local/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/aisl/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 699, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: Exception encountered when calling layer "custom_multi_loss_layer" (type CustomMultiLossLayer).

in user code:

File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 74, in call  *
    loss = self.multi_loss(ys_true, ys_pred)
File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 66, in multi_loss  *
    loss += precision * quaternion_mean_multiplicative_error(ys_true[1], ys_pred[1]) + self.log_vars[1][0]
File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 36, in quaternion_mean_multiplicative_error  *
    return tf.reduce_mean(quat_mult_error(y_true, y_pred))
File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 29, in quat_mult_error  *
    q = tfq.Quaternion(y_pred).normalized()
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 30, in scoped_func  *
    return func(*args, **kwargs)
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 341, in normalized  *
    return Quaternion(tf.divide(self._q, self.abs()))
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 30, in scoped_func  *
    return func(*args, **kwargs)
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 394, in abs  *
    return tf.sqrt(self.norm(keepdims))
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 30, in scoped_func  *
    return func(*args, **kwargs)
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 389, in norm  *
    return tf.reduce_sum(tf.square(self._q), axis=-1, keep_dims=keepdims)

TypeError: Got an unexpected keyword argument 'keep_dims'

Call arguments received: • inputs=['tf.Tensor(shape=(None, 3), dtype=float32)', 'tf.Tensor(shape=(None, 4), dtype=float32)', 'tf.Tensor(shape=(None, 3), dtype=float32)', 'tf.Tensor(shape=(None, 4), dtype=float32)']
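
The last frame shows tfquaternion passing keep_dims to tf.reduce_sum, while TensorFlow 2.x spells that argument keepdims. One way to bridge such a rename is to translate the legacy keyword before the call. The sketch below is only an illustration under assumptions: rename_kwarg and the stand-in reduce_sum are hypothetical names written in plain Python so the idea runs without TensorFlow, and nothing here has been tested against this repository.

```python
import functools


def rename_kwarg(old, new):
    """Return a decorator that maps a legacy keyword name onto the current one."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"got both {old!r} and {new!r}")
                # Move the value over to the keyword the wrapped function expects.
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical stand-in for a reduction that only accepts the new spelling.
# It sums a flat list and mimics keepdims by wrapping the result in a list.
def reduce_sum(values, keepdims=False):
    total = sum(values)
    return [total] if keepdims else total


legacy_reduce_sum = rename_kwarg("keep_dims", "keepdims")(reduce_sum)

print(legacy_reduce_sum([1.0, 2.0, 3.0], keep_dims=True))   # [6.0]
print(legacy_reduce_sum([1.0, 2.0, 3.0], keepdims=False))   # 6.0
```

The more direct alternative would be to edit the call on line 389 of tfquaternion.py so that it passes keepdims=keepdims; which of the two is preferable depends on whether you are willing to patch the installed package.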

mahmoodul commented 2 years ago

Hi, when I run the command "python3 train.py euroc euroc_model_trained_by_me.hdf5", I get the following error:

Model: "model"


Layer (type)                     Output Shape        Param #     Connected to
==================================================================================================
x1 (InputLayer)                  [(None, 200, 3)]    0           []
x2 (InputLayer)                  [(None, 200, 3)]    0           []
conv1d (Conv1D)                  (None, 190, 128)    4352        ['x1[0][0]']
conv1d_2 (Conv1D)                (None, 190, 128)    4352        ['x2[0][0]']
conv1d_1 (Conv1D)                (None, 180, 128)    180352      ['conv1d[0][0]']
conv1d_3 (Conv1D)                (None, 180, 128)    180352      ['conv1d_2[0][0]']
max_pooling1d (MaxPooling1D)     (None, 60, 128)     0           ['conv1d_1[0][0]']
max_pooling1d_1 (MaxPooling1D)   (None, 60, 128)     0           ['conv1d_3[0][0]']
concatenate (Concatenate)        (None, 60, 256)     0           ['max_pooling1d[0][0]',
                                                                  'max_pooling1d_1[0][0]']
bidirectional (Bidirectional)    (None, 60, 256)     395264      ['concatenate[0][0]']
dropout (Dropout)                (None, 60, 256)     0           ['bidirectional[0][0]']
bidirectional_1 (Bidirectional)  (None, 256)         395264      ['dropout[0][0]']
dropout_1 (Dropout)              (None, 256)         0           ['bidirectional_1[0][0]']
dense (Dense)                    (None, 3)           771         ['dropout_1[0][0]']
dense_1 (Dense)                  (None, 4)           1028        ['dropout_1[0][0]']
==================================================================================================
Total params: 1,161,735
Trainable params: 1,161,735
Non-trainable params: 0


Model: "model_1"


Layer (type)                                    Output Shape        Param #     Connected to
==================================================================================================
x1 (InputLayer)                                 [(None, 200, 3)]    0           []
x2 (InputLayer)                                 [(None, 200, 3)]    0           []
y1_true (InputLayer)                            [(None, 3)]         0           []
y2_true (InputLayer)                            [(None, 4)]         0           []
model (Functional)                              [(None, 3),         1161735     ['x1[0][0]',
                                                 (None, 4)]                      'x2[0][0]']
custom_multi_loss_layer (CustomMultiLossLayer)  (None, 14)          2           ['y1_true[0][0]',
                                                                                 'y2_true[0][0]',
                                                                                 'model[0][0]',
                                                                                 'model[0][1]']
==================================================================================================
Total params: 1,161,737
Trainable params: 1,161,737
Non-trainable params: 0


WARNING:tensorflow:Model failed to serialize as JSON. Ignoring... Layer CustomMultiLossLayer has arguments ['self', 'nb_outputs'] in __init__ and therefore must override get_config().

Example:

class CustomLayer(keras.layers.Layer):
    def __init__(self, arg1, arg2):
        super().__init__()
        self.arg1 = arg1
        self.arg2 = arg2

    def get_config(self):
        config = super().get_config()
        config.update({
            "arg1": self.arg1,
            "arg2": self.arg2,
        })
        return config

Epoch 1/500
2021-12-19 23:46:32.084681: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8301
411/411 [==============================] - ETA: 0s - loss: 0.0422
Epoch 00001: val_loss improved from inf to -0.05318, saving model to model_checkpoint.hdf5
Traceback (most recent call last):
  File "train.py", line 137, in <module>
    main()
  File "train.py", line 120, in main
    history = train_model.fit([x_gyro, x_acc, y_delta_p, y_delta_q], epochs=500, batch_size=32, verbose=1, callbacks=[model_checkpoint, tensorboard], validation_split=0.1)
  File "/home/aisl/.local/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/aisl/.local/lib/python3.8/site-packages/keras/engine/base_layer.py", line 746, in get_config
    raise NotImplementedError(textwrap.dedent(f"""
NotImplementedError: Layer CustomMultiLossLayer has arguments ['self', 'nb_outputs'] in __init__ and therefore must override get_config().

Example:

class CustomLayer(keras.layers.Layer):
    def __init__(self, arg1, arg2):
        super().__init__()
        self.arg1 = arg1
        self.arg2 = arg2

    def get_config(self):
        config = super().get_config()
        config.update({
            "arg1": self.arg1,
            "arg2": self.arg2,
        })
        return config
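
Applied to the layer named in the error, the override could look roughly like the sketch below. Only the constructor argument nb_outputs is taken from the error message; the real layer's weights and call() are not shown in this thread and are left out. The fallback Layer class is a made-up stand-in so the snippet also runs where TensorFlow is not installed, and importing from tensorflow.keras is an assumption about the setup.

```python
try:
    # Assumption: tensorflow.keras is the Keras in use.
    from tensorflow.keras.layers import Layer
except ImportError:
    class Layer:
        """Tiny stand-in so the sketch runs without TensorFlow (not the real API)."""

        def __init__(self, **kwargs):
            self._kwargs = kwargs

        def get_config(self):
            return dict(self._kwargs)


class CustomMultiLossLayer(Layer):
    # Sketch only: the loss weights and call() of the real layer are omitted.
    def __init__(self, nb_outputs=2, **kwargs):
        super().__init__(**kwargs)
        self.nb_outputs = nb_outputs

    def get_config(self):
        # Start from the base config and add every constructor argument.
        config = super().get_config()
        config.update({"nb_outputs": self.nb_outputs})
        return config


config = CustomMultiLossLayer(nb_outputs=2).get_config()
print(config["nb_outputs"])  # 2
```

Since the exception is raised while the checkpoint callback saves the model, another route that is sometimes taken is to create ModelCheckpoint with save_weights_only=True, which avoids serializing the layer configuration; whether that suits this training script is not established here.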

mahmoodul commented 2 years ago

The error looks like this:

python3 train.py euroc euroc_model_trained_by_me.hdf5
2021-12-20 19:00:54.607921: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.633887: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.634068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.634376: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-20 19:00:54.634934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.635106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.635255: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.982324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.982545: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.982724: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-20 19:00:54.982876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9832 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
Model: "model"


Layer (type)                     Output Shape        Param #     Connected to
==================================================================================================
x1 (InputLayer)                  [(None, 200, 3)]    0           []
x2 (InputLayer)                  [(None, 200, 3)]    0           []
conv1d (Conv1D)                  (None, 190, 128)    4352        ['x1[0][0]']
conv1d_2 (Conv1D)                (None, 190, 128)    4352        ['x2[0][0]']
conv1d_1 (Conv1D)                (None, 180, 128)    180352      ['conv1d[0][0]']
conv1d_3 (Conv1D)                (None, 180, 128)    180352      ['conv1d_2[0][0]']
max_pooling1d (MaxPooling1D)     (None, 60, 128)     0           ['conv1d_1[0][0]']
max_pooling1d_1 (MaxPooling1D)   (None, 60, 128)     0           ['conv1d_3[0][0]']
concatenate (Concatenate)        (None, 60, 256)     0           ['max_pooling1d[0][0]',
                                                                  'max_pooling1d_1[0][0]']
bidirectional (Bidirectional)    (None, 60, 256)     394240      ['concatenate[0][0]']
dropout (Dropout)                (None, 60, 256)     0           ['bidirectional[0][0]']
bidirectional_1 (Bidirectional)  (None, 256)         394240      ['dropout[0][0]']
dropout_1 (Dropout)              (None, 256)         0           ['bidirectional_1[0][0]']
dense (Dense)                    (None, 3)           771         ['dropout_1[0][0]']
dense_1 (Dense)                  (None, 4)           1028        ['dropout_1[0][0]']
==================================================================================================
Total params: 1,159,687
Trainable params: 1,159,687
Non-trainable params: 0


Traceback (most recent call last):
  File "train.py", line 149, in <module>
    main()
  File "train.py", line 125, in main
    train_model = create_train_model_6d_quat(pred_model, window_size)
  File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 123, in create_train_model_6d_quat
    out = CustomMultiLossLayer(nb_outputs=2)([y1_true, y2_true, y1_pred, y2_pred])
  File "/home/aisl/.local/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/aisl/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 699, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: Exception encountered when calling layer "custom_multi_loss_layer" (type CustomMultiLossLayer).

in user code:

File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 82, in call  *
    loss = self.multi_loss(ys_true, ys_pred)
File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 74, in multi_loss  *
    loss += precision * quaternion_mean_multiplicative_error(ys_true[1], ys_pred[1]) + self.log_vars[1][0]
File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 33, in quaternion_mean_multiplicative_error  *
    return tf.reduce_mean(quat_mult_error(y_true, y_pred))
File "/mah/AI/6-DOF-Inertial-Odometry-master/model.py", line 26, in quat_mult_error  *
    q = tfq.Quaternion(y_pred).normalized()
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 30, in scoped_func  *
    return func(*args, **kwargs)
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 341, in normalized  *
    return Quaternion(tf.divide(self._q, self.abs()))
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 30, in scoped_func  *
    return func(*args, **kwargs)
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 394, in abs  *
    return tf.sqrt(self.norm(keepdims))
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 30, in scoped_func  *
    return func(*args, **kwargs)
File "/home/aisl/.local/lib/python3.8/site-packages/tfquaternion/tfquaternion.py", line 389, in norm  *
    return tf.reduce_sum(tf.square(self._q), axis=-1, keep_dims=keepdims)

TypeError: Got an unexpected keyword argument 'keep_dims'

Call arguments received: • inputs=['tf.Tensor(shape=(None, 3), dtype=float32)', 'tf.Tensor(shape=(None, 4), dtype=float32)', 'tf.Tensor(shape=(None, 3), dtype=float32)', 'tf.Tensor(shape=(None, 4), dtype=float32)']

jpsml commented 2 years ago

Please try to use the following versions of the prerequisites: Keras (https://keras.io/) 2.2.4 together with TensorFlow (https://www.tensorflow.org/) 1.13.1
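
To see how far an environment is from the versions suggested above, a quick check along these lines may help. It is a sketch using only the standard library (Python 3.8+); check_versions is a made-up helper, and the distribution names "keras" and "tensorflow" are assumptions (a GPU build may be installed under a different name).

```python
from importlib import metadata

# Versions recommended in the reply above; the distribution names are assumed.
RECOMMENDED = {"keras": "2.2.4", "tensorflow": "1.13.1"}


def check_versions(required):
    """Map each package name to (required, installed or None, matches)."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(RECOMMENDED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, found {installed} -> {status}")
```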

mahmoodul commented 2 years ago

Dear João Paulo Silva do Monte Lima, thank you very much. The problem is that TensorFlow 1.13 is no longer supported. I have a few questions about your paper and source code, and I would be grateful for your guidance.

1) Because of changes in the Keras and TensorFlow libraries, your code raises errors, so I replaced every "from keras.------ import -----" statement with its tensorflow.keras equivalent, for example "from tensorflow.keras.models import Sequential". After that, it works.

2) In the function def multi_loss(self, ys_true, ys_pred) in model.py, two alternative loss terms are given after precision = K.exp(-self.log_vars[1][0]):

loss += precision * quaternion_mean_multiplicative_error(ys_true[1], ys_pred[1]) + self.log_vars[1][0]

loss += precision * quaternion_phi_4_error(ys_true[1], ys_pred[1]) + self.log_vars[1][0]

When I use quaternion_mean_multiplicative_error I get the error above, but training works when I use quaternion_phi_4_error. However, the loss graph is different from the one given in the research paper. Would you please explain why this happens?

3) The model loss plots for two different numbers of epochs, 500 and 1000, are attached, and they look weird. Would you please explain why this happens?

4) I could not understand the "Multi-Task Learning for Metric Balancing" section, especially Figure 3. What is the output of the multi-loss layer in Figure 3? Since the change in pose is already estimated by the neural network architecture in Figure 1, why is another layer needed, and what is the final output of that layer?

[Attached images: result, result2]
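
The weighting quoted in question 2 can be worked through numerically. The sketch below is plain Python, not the repository's Keras code: it assumes every task term has the form exp(-s) * error + s, as in the quoted lines, and the function name and sample numbers are made up for illustration. It shows that the summed value can drop below zero once a log-variance s is negative, which is at least consistent with the negative val_loss (-0.05318) in the training log earlier in this thread.

```python
import math


def multi_loss(task_errors, log_vars):
    """Sum exp(-s) * error + s over tasks, mirroring the quoted loss lines."""
    total = 0.0
    for error, s in zip(task_errors, log_vars):
        precision = math.exp(-s)      # corresponds to K.exp(-self.log_vars[i][0])
        total += precision * error + s
    return total


# With s = 0 for both tasks the result is just the sum of the raw errors.
print(multi_loss([1.0, 2.0], [0.0, 0.0]))          # 3.0

# Small errors combined with negative log-variances give a negative total.
print(multi_loss([0.01, 0.01], [-3.0, -3.0]) < 0)  # True
```

Under this reading the summed value is a training objective rather than an error in physical units, so its curve need not look like a plain error curve; how it relates to the plots in the paper is for the author to confirm.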