When I trained the vanilla LSTM and ED-LSTM with TensorFlow, I hit a runtime error:
ValueError: List argument 'values' to 'Pack' Op with length 0 shorter than minimum length 1
I found that it was caused by the empty TensorFlow variable list 's_tvars' in src/tpn/model.py: if 'cls_init' and 'bbox_init' are both None, the list 'self._small_lr_vars' is empty, so the list 's_tvars' is also empty. Calculating and clipping the gradients of the variables in 's_tvars' then fails.
In the source code, all of the TensorFlow variables are divided into two sets according to the 'cls_init' and 'bbox_init' files: variables in the n_tvars set use the normal learning rate, while variables in the s_tvars set use a smaller learning rate. However, because there are no 'cls_init' or 'bbox_init' files, I don't know which TF variables belong in the s_tvars list.
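As a minimal sketch of how this goes wrong (plain Python with hypothetical simplified names, not the actual model.py code): when neither init file is given, the small-LR set ends up empty, and asking TF to pack/clip the gradients of an empty list raises the error above.

```python
# Hypothetical simplification of the variable partition in src/tpn/model.py.
# The real code works with TF variables; plain strings stand in for them here.

def split_vars(all_vars, cls_init=None, bbox_init=None):
    """Split variables into a normal-LR set and a small-LR set."""
    small_lr_vars = []
    if cls_init is not None:
        small_lr_vars.append("cls_weights")   # pretrained cls layer -> small LR
    if bbox_init is not None:
        small_lr_vars.append("bbox_weights")  # pretrained bbox layer -> small LR
    s_tvars = [v for v in all_vars if v in small_lr_vars]
    n_tvars = [v for v in all_vars if v not in small_lr_vars]
    return n_tvars, s_tvars

all_vars = ["lstm_kernel", "cls_weights", "bbox_weights"]
# No 'cls_init' or 'bbox_init' files provided:
n_tvars, s_tvars = split_vars(all_vars)
# s_tvars == [], so clipping its (nonexistent) gradients fails in TF.
```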
To work around this error, I tried to calculate and clip the gradients of all TF variables, as follows:
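(The original snippet is not reproduced here.) For reference, the clipping that the workaround relies on, i.e., what tf.clip_by_global_norm computes, can be sketched in plain Python: if the global norm of all gradients exceeds the threshold, every gradient is scaled down by the same factor so the combined norm equals the threshold; otherwise the gradients pass through unchanged.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Plain-Python analogue of tf.clip_by_global_norm for nested lists.

    Scales all gradients by clip_norm / global_norm when the global norm
    exceeds clip_norm; leaves them unchanged otherwise.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # max(...) makes the scale 1.0 when global_norm <= clip_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm

grads = [[3.0, 4.0], [0.0, 12.0]]          # global norm = sqrt(9+16+144) = 13
clipped, norm = clip_by_global_norm(grads, 5.0)
# norm -> 13.0; clipped gradients now have global norm 5.0
```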
After that, the program ran successfully. However, the total loss was abnormally large, especially the bbox cost:
cost loss = 330.16 = cls_cost 10.961 + end_cost 1.566 + bbox_cost 317.668, Global norm: 347.896
Only after I used a larger initial learning rate (e.g., 0.01) did the model converge.
@myfavouritekk Could you tell me whether my solution is right or not?