strongio / keras-bert

A simple technique to integrate BERT from tf hub to keras
258 stars 108 forks

Wrong order of values when calling bert.variables and fine tune after that #5

Closed igeti closed 5 years ago

igeti commented 5 years ago

Thank you very much for the article. After reading it I wanted to understand BERT more deeply and noticed the following in your code. For fine-tuning, you use these two lines: `trainable_vars = self.bert.variables` and `trainable_vars = trainable_vars[-self.n_fine_tune_layers:]`. However, `self.bert.variables` returns the list sorted by variable name. The names compare as strings, so transformer blocks 10 and 11 come right after block 1 and before block 2, instead of after block 9. Slicing from the end of the list therefore reaches block 9 before it reaches blocks 10 and 11, so fine-tuning trains intermediate layers while the last blocks of the encoder stay completely frozen for any reasonably small slice.

`bert.variables` returns:

 <tf.Variable 'BERT_module_1/bert/embeddings/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/position_embeddings:0' shape=(512, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/word_embeddings:0' shape=(119547, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/output_bias:0' shape=(119547,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/kernel:0' shape=(768, 768) dtype=float32>]
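
The ordering in the listing above can be reproduced without TensorFlow. The snippet below is only a sketch: it uses plain strings as stand-ins for `var.name`, and the helper `layer_index` is a hypothetical name, not something from the repository. It shows that sorting by name puts `layer_10` and `layer_11` between `layer_1` and `layer_2`, and that sorting on the parsed block number restores the depth order.

```python
import re

# Stand-ins for `var.name`; a real run would take the names from self.bert.variables.
names = [
    f"BERT_module_1/bert/encoder/layer_{i}/output/dense/kernel:0"
    for i in range(12)
]

def layer_index(name):
    """Return the encoder block number found in a variable name, or -1 if none."""
    match = re.search(r"/layer_(\d+)/", name)
    return int(match.group(1)) if match else -1

# Lexicographic order, as in the listing: names are compared character by character.
by_name = sorted(names)

# Depth order: sort on the integer block number instead of the raw string.
by_depth = sorted(names, key=layer_index)

print([layer_index(n) for n in by_name])   # [0, 1, 10, 11, 2, 3, 4, 5, 6, 7, 8, 9]
print([layer_index(n) for n in by_depth])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

With the variables in depth order, a slice taken from the end would pick up the last blocks first, which seems to be what the fine-tuning code intends. Variables without a `layer_N` component (embeddings, pooler, `cls/predictions`) get -1 here and sort to the front; whether they should be trainable is a separate decision.
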
zabithameed commented 5 years ago

Dear kkkyan, please refer to the line `layerno = int((var.name.split("/")[3]).split("_")[-1])`.

The error I get is `ValueError: invalid literal for int() with base 10: 'encoder'`.
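
One possible explanation, offered as a guess: the error says that element `[3]` of the split name was `encoder`, which would happen if the variable name carries one more leading scope than the names listed earlier in this issue. The sketch below assumes such a prefix (`outer_scope` is made up for illustration) and uses a hypothetical helper, `layer_number`, that searches for the `layer_N` component instead of relying on a fixed position.

```python
import re

def layer_number(var_name):
    """Return the encoder block number in a variable name, or None if absent."""
    match = re.search(r"/layer_(\d+)/", var_name)
    return int(match.group(1)) if match else None

# Name shaped like the ones printed in this issue.
name_from_issue = "BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0"
# Hypothetical name with one extra leading scope; "outer_scope" is an assumption.
name_with_prefix = "outer_scope/" + name_from_issue

# With the extra scope, positional index 3 lands on "encoder", not "layer_11".
print(name_with_prefix.split("/")[3])    # encoder

# Searching for the pattern does not depend on how many scopes precede it.
print(layer_number(name_from_issue))     # 11
print(layer_number(name_with_prefix))    # 11

# Variables outside the encoder blocks have no layer number.
print(layer_number("BERT_module_1/bert/pooler/dense/bias:0"))  # None
```

Printing a few `var.name` values from your own model should confirm or rule out this guess.
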