keras-team / keras-core

A multi-backend implementation of the Keras API, with support for TensorFlow, JAX, and PyTorch.

Convert pretraining_BERT.py example to keras core #801

Closed pranavvp16 closed 1 year ago

pranavvp16 commented 1 year ago

Training is slow even in a GPU environment. Is that usual, or am I missing something? I tried different backends and the training time stays the same.

fchollet commented 1 year ago

Thanks for the PR. Do you have a link to the git diff?

pranavvp16 commented 1 year ago
diff --git a/keras-io/examples/nlp/pretraining_BERT.py b/keras-core/examples/keras_io/nlp/pretraining_BERT.py
index cd2b7bc..8a19225 100644
--- a/keras-io/examples/nlp/pretraining_BERT.py
+++ b/keras-core/examples/keras_io/nlp/pretraining_BERT.py
@@ -84,14 +84,11 @@ import nltk
 import random
 import logging

-import tensorflow as tf
-from tensorflow import keras
+import keras_core as keras

 nltk.download("punkt")
-# Only log error messages
-tf.get_logger().setLevel(logging.ERROR)
 # Set random seed
-tf.keras.utils.set_random_seed(42)
+keras.utils.set_random_seed(42)

 """
 ### Define certain variables
@@ -463,9 +460,9 @@ Now we define our optimizer and compile the model. The loss calculation is handl
 internally and so we need not worry about that!
 """

-optimizer = keras.optimizers.Adam(learning_rate=LEARNING_RATE)
+from keras.optimizers import Adam

-model.compile(optimizer=optimizer)
+model.compile(optimizer=Adam(learning_rate=LEARNING_RATE))

 """
 Finally all steps are done and now we can start training our model!
@@ -507,4 +504,4 @@ model = TFBertForSequenceClassification.from_pretrained("your-username/my-awesom

 In this case, the pretraining head will be dropped and the model will just be initialized with the transformer layers. A new task-specific head will be added with random weights.
-"""
+"""
\ No newline at end of file
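For reference, here is a minimal, self-contained sketch of the keras-core pieces this diff touches (seeding, the Adam optimizer, and `compile`). The tiny functional model and the random data are only stand-ins for the HuggingFace `TFBertForMaskedLM` trained in the real example, and the `LEARNING_RATE` value here is illustrative, not the one from the example.

```python
# Sketch of the keras-core setup used in the converted example.
# The toy model below stands in for the HuggingFace masked-LM model.
import numpy as np
import keras_core as keras

keras.utils.set_random_seed(42)  # replaces tf.keras.utils.set_random_seed(42)

LEARNING_RATE = 1e-4  # illustrative value

inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(1)(inputs)
model = keras.Model(inputs, outputs)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss="mse",  # the real example passes no loss because the HF model computes it internally
)
model.fit(np.random.rand(32, 8), np.random.rand(32, 1), epochs=1, verbose=0)
```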

fchollet commented 1 year ago

> Training is slow even in a GPU environment. Is that usual, or am I missing something? I tried different backends and the training time stays the same.

Well, it's a HuggingFace model that's being trained. There's virtually no overlap with Keras Core (just the use of the Adam optimizer). So training time is going to be constant across backends (no difference), and it's going to be slow.
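As a quick sanity check, this minimal sketch (assuming keras-core's `KERAS_BACKEND` selection mechanism) shows how to confirm which backend keras-core actually picked up. The HuggingFace TF model runs its training loop in TensorFlow either way, so switching this setting will not change the timings of this particular example.

```python
# Minimal backend check: KERAS_BACKEND must be set before keras_core is imported.
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"

import keras_core as keras
print(keras.backend.backend())  # prints the backend keras-core is actually using
```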