Closed · tomeiss closed this issue 3 years ago
Thanks for your email.
We just use an open-source implementation of convolutional LSTM. All parts are trained jointly over the training iterations; TensorFlow handles this.
In fact, you can delete the LSTM modules and the results will hopefully still be good enough.
Actually, you can calculate the approximate number from the network structure. I think it should be similar to the values in the paper.
Best, Xin
tobi-tobt notifications@github.com wrote on Monday, September 7, 2020 at 11:36 PM:
Hello @jiangsutx https://github.com/jiangsutx and @rimchang https://github.com/rimchang, first of all, thanks for providing the code to your publication! Over the last couple of weeks I have worked through your proposed architecture and have stumbled over three issues, on which I would be glad to hear your opinion:
1) Inside the LSTM cell there is a bias term for the forget gate, _forget_bias, set to 1.0: Is this a trainable variable included in the optimization?
2) Regarding the LSTM cell as well: There is no feedback of the cell state C into the forget, input, and output gates via Hadamard products, as suggested by Shi et al. Is there a specific reason for that?
3) In your publication, Table 1 http://www.xtao.website/projects/srndeblur/srndeblur_cvpr18.pdf gives the number of total trainable parameters as 3.76 million. But when I count them inside your model with varlist_parameters = [v.shape.num_elements() for v in self.all_vars]; np.sum(varlist_parameters) I get 6,876,449 parameters. What am I missing here?
Thank you in advance Tobi
Hello and thank you for your fast response.
1) I cannot see where the LSTM's _forget_bias is added to the TF graph, which seems strange to me. The usual kernels and biases are added to the graph and show up in tf.trainable_variables() during def generator(self, inputs, reuse=False, scope='g_net'), but the forget bias does not:
<tf.Variable 'g_net/convLSTM/LSTM_conv/weights:0' shape=(3, 3, 256, 512) dtype=float32_ref>,
<tf.Variable 'g_net/convLSTM/LSTM_conv/biases:0' shape=(512,) dtype=float32_ref>]
3) I took all trainable parameters as a list from TensorFlow via tf.trainable_variables() and summed up the entries' parameter counts, weights and biases together. My parameter count is far higher than yours. Could you explain specifically how you determined yours?
Kind regards, Tobi
Please refer to the source code of the LSTM: https://github.com/jiangsutx/SRN-Deblur/blob/master/util/BasicConvLSTMCell.py#L56
It uses a single convolution and then splits the output into 4 parts, one of which is the forget gate. _forget_bias is only a number, which is not trainable. I did not dig into the details. You may also refer to:
https://tensorlayer.readthedocs.io/en/1.7.0/_modules/tensorlayer/layers.html#BasicConvLSTMCell
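To make the split-into-four-gates and the constant forget bias concrete, here is a minimal NumPy sketch of that cell's state update. The shapes are toy values and the function name is my own; only the four-way channel split and the untrainable forget-bias constant mirror the linked code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_lstm_step(conv_out, c, forget_bias=1.0):
    """One ConvLSTM state update, assuming `conv_out` is the output of
    the single convolution over [inputs, h] with 4*F channels
    (channels-last), split into input/new-input/forget/output gates."""
    i, j, f, o = np.split(conv_out, 4, axis=-1)
    # forget_bias is a plain constant added to the forget gate's
    # pre-activation -- it never appears as a trainable variable.
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_c, new_h

# Toy example: a 1x4x4 feature map with F=2 features per gate.
rng = np.random.default_rng(0)
conv_out = rng.normal(size=(1, 4, 4, 8))  # 4 gates * 2 features
c = np.zeros((1, 4, 4, 2))
new_c, new_h = conv_lstm_step(conv_out, c)
print(new_c.shape, new_h.shape)  # (1, 4, 4, 2) (1, 4, 4, 2)
```

Because the bias enters only as a constant inside sigmoid(f + forget_bias), it shifts the gate's initial operating point (favoring "remember" early in training) without adding a parameter.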
I remember that in Table 1 of our paper we used 3x3 kernels throughout for fast experiments, while the final version and the released model use 5x5 kernels. These details are clarified in the corresponding paragraphs of the paper.
Sorry for the confusion.
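As a rough illustration of how the kernel size drives the count, here is a small sketch. The 256-in/512-out convolution is taken from the ConvLSTM variable shapes quoted above; treating it in isolation is only an example, not a full accounting of the model:

```python
def conv_params(k, c_in, c_out):
    """Trainable parameters of one conv layer: k*k kernel weights per
    input/output channel pair, plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

# Same layer width, two kernel sizes:
p3 = conv_params(3, 256, 512)  # 3x3 kernels, as in the paper's Table 1
p5 = conv_params(5, 256, 512)  # 5x5 kernels, as in the released model
print(p3, p5)  # 1180160 3277312
```

So a single such layer grows by roughly the 25/9 ratio of kernel areas when going from 3x3 to 5x5, which is why the released checkpoint counts far more parameters than Table 1.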
Thank you very much for clarifying. I am currently rewriting the architecture in TF 2.2, and after adjusting the kernel sizes I obtained the same number of parameters. I must have overlooked that sentence.
Hello, can you help me and tell me how I can calculate the number of parameters?
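One way to count them, sketched in plain Python. The shapes below are hypothetical placeholders; in TF 1.x you would take them from tf.trainable_variables() (in TF 2/Keras, from model.trainable_weights), then sum the element counts:

```python
import math

# Hypothetical trainable-variable shapes, stand-ins for what
# [v.shape.as_list() for v in tf.trainable_variables()] would return.
var_shapes = [
    (5, 5, 3, 32),     # conv kernel: 5x5, 3 in-channels, 32 out-channels
    (32,),             # its bias, one per output channel
    (3, 3, 256, 512),  # ConvLSTM conv kernel (weights)
    (512,),            # ConvLSTM conv bias
]

# Total parameters = sum over variables of the product of each shape.
total = sum(math.prod(s) for s in var_shapes)
print(total)  # 1182592
```

Note that _forget_bias is a constant, not a variable, so it never appears in this list.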