strongio / keras-bert

A simple technique to integrate BERT from TF Hub into Keras

Trainable Parameters Different #11

Closed: devinharia closed this issue 5 years ago

devinharia commented 5 years ago

I ran the code from 'keras-bert.ipynb' as-is and observed that the number of trainable parameters in my run is 22,051,329 instead of the 3,147,009 shown in your run of the notebook. My accuracy is also only about 0.53. Can you please help me out? Thanks!
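For anyone comparing runs, here is a minimal sketch to count trainable parameters directly rather than reading them off model.summary(); it assumes the notebook's compiled Keras model is in a variable named model:

import numpy as np
from tensorflow.keras import backend as K

# Sum the sizes of all weights Keras will actually update during training.
trainable_count = int(np.sum([K.count_params(w) for w in model.trainable_weights]))
print('Trainable parameters:', trainable_count)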

tienduccao commented 5 years ago

I got the same issue

LFavano commented 5 years ago

Same for me. Also, model.predict reports almost exactly the same confidence score for every entry, which is quite strange, as if the model isn't learning anything at all.

jacobzweig commented 5 years ago

Hi @devinharia @tienduccao and @LFavano, to see convergence here you'll need to reduce the learning rate. With 1e-5 I see nice convergence on this task.

tienduccao commented 5 years ago

@jacobzweig I don't get it, could you be more specific? How do I set the learning rate?

hoangcuong2011 commented 4 years ago

I think there is still some trick in the code without which it does not work. I tried and got only about 0.53 accuracy. It felt like the model was not learning anything during training. Also, the number of parameters does not match, as @devinharia pointed out.

@jacobzweig: your answer is unfortunately not satisfying, and all of us would truly appreciate more elaboration, if you have time. Thank you very much!

GavinAbercrombie commented 4 years ago

The learning rate of the optimizer needs to be set before the model is compiled:

opt = tf.keras.optimizers.Adam(learning_rate=1e-5)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

Learning will now converge (although this does not answer the original question about the difference in the number of trainable parameters).
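As for the parameter-count question: the total depends on which BERT variables are unfrozen, so one way to investigate is to print the weights the model actually treats as trainable. A minimal sketch, again assuming the notebook's model object is in scope:

from tensorflow.keras import backend as K

# Print each trainable weight with its size; comparing this listing across
# runs shows which BERT variables were unfrozen and where the counts diverge.
for w in model.trainable_weights:
    print(w.name, K.count_params(w))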