myfavouritekk / TPN

Tubelet Proposal Network

Train LSTM and ED-LSTM error ValueError: List argument 'values' to 'Pack' Op with length 0 shorter than minimum length 1 #5

Open Solomon1588 opened 7 years ago

Solomon1588 commented 7 years ago

When I trained the vanilla LSTM and ED-LSTM with TensorFlow, I got a runtime error: ValueError: List argument 'values' to 'Pack' Op with length 0 shorter than minimum length 1. I found that it is caused by the empty TensorFlow variable list 's_tvars' in src/tpn/model.py: if 'cls_init' and 'bbox_init' are None, the list 'self._small_lr_vars' is empty, and therefore 's_tvars' is empty as well. Calculating and clipping the gradients of the variables in 's_tvars' then fails.
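For reference, here is a minimal sketch of the failing pattern as I understand it (TF 0.x/1.x style API as used by TPN; the toy cost below is my own, not the repo's):

import tensorflow as tf

cost = tf.reduce_sum(tf.square(tf.Variable([1.0])))  # toy loss, for illustration only
s_tvars = []  # empty because cls_init and bbox_init are None
# clip_by_global_norm packs/stacks the per-tensor norms; with zero tensors this
# appears to raise: ValueError: List argument 'values' to 'Pack' Op with length 0 ...
s_grads, s_norm = tf.clip_by_global_norm(tf.gradients(cost, s_tvars), 35.0)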

In the source code, all the TensorFlow variables are divided into two sets according to the 'cls_init' and 'bbox_init' files: variables in the 'n_tvars' set use the normal learning rate, while variables in the 's_tvars' set use a smaller learning rate. However, because there are no 'cls_init' or 'bbox_init' files, I don't know which TF variables should be in the 's_tvars' list.
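My understanding of that split, written out as a sketch (my own paraphrase, not the exact code in model.py):

tvars = tf.trainable_variables()
# names collected from the cls_init / bbox_init files go into the small-LR set
s_tvars = [v for v in tvars if v.name in self._small_lr_vars]
n_tvars = [v for v in tvars if v.name not in self._small_lr_vars]
# without cls_init / bbox_init, self._small_lr_vars is empty, so s_tvars is too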

To work around this error, I tried using all trainable TF variables when calculating and clipping the gradients, as follows:

# compute gradients for ALL trainable variables and clip them by global norm
grads, global_norm = tf.clip_by_global_norm(tf.gradients(cost, tvars), config.max_grad_norm)
optimizer = tf.train.MomentumOptimizer(self.lr, self.momentum)
self._train_op = optimizer.apply_gradients(zip(grads, tvars))
self.global_norm = global_norm
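
An alternative I considered (only a sketch from my side; the 0.1 factor for the smaller learning rate and the tf.group call are my own assumptions) is to keep the two groups but build the small-LR branch only when 's_tvars' is non-empty:

n_grads, global_norm = tf.clip_by_global_norm(tf.gradients(cost, n_tvars), config.max_grad_norm)
optimizer = tf.train.MomentumOptimizer(self.lr, self.momentum)
train_ops = [optimizer.apply_gradients(zip(n_grads, n_tvars))]
if s_tvars:  # skip the branch that fails when the list is empty
    s_grads, _ = tf.clip_by_global_norm(tf.gradients(cost, s_tvars), config.max_grad_norm)
    s_optimizer = tf.train.MomentumOptimizer(self.lr * 0.1, self.momentum)
    train_ops.append(s_optimizer.apply_gradients(zip(s_grads, s_tvars)))
self._train_op = tf.group(*train_ops)
self.global_norm = global_norm

Note that this clips each group separately instead of clipping all gradients jointly, which is slightly different behaviour.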

After that, the program runs successfully. However, the total loss is abnormally large, especially the bbox cost: total cost = 330.16 = cls_cost 10.961 + end_cost 1.5661 + bbox_cost 317.668, global norm = 347.896. The model does not converge until I use a larger initial learning rate (e.g., 0.01).

@myfavouritekk I wonder whether my solution is right or not?