Opened by mbarbetti 3 months ago (status: Open)
Repeating the exercise by taking scripts/train_ANN_isMuon.py as a reference for 10 epochs and with a dataset of 300,000 instances, we observe similar performance. However, this script does not rely on a custom training procedure but uses a PIDGAN model that is a simple wrapper around the Keras Model class. This may exclude PIDGAN as the source of this issue, pointing instead to Keras 3 itself.
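Since the models in question are described as thin wrappers around the Keras Model class, a minimal illustration of that pattern may help frame the comparison. This is a hypothetical sketch in the spirit of such a wrapper, not PIDGAN's actual code; the class name, layer sizes, and data are made up:

```python
import numpy as np
import keras

# Hypothetical minimal wrapper in the spirit of the PIDGAN ANN models:
# a subclass of keras.Model that adds no custom training logic, so
# training still goes through the stock Model.fit() loop.
class SimpleClassifier(keras.Model):
    def __init__(self, hidden_units=16, **kwargs):
        super().__init__(**kwargs)
        self._hidden = keras.layers.Dense(hidden_units, activation="relu")
        self._out = keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs):
        return self._out(self._hidden(inputs))

model = SimpleClassifier()
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy stand-in for the isMuon dataset.
x = np.random.rand(256, 4).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=64, verbose=0)
```

Because no `train_step` is overridden, any timing difference between Keras 2 and Keras 3 with such a model comes from the framework's own fit loop.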
Test machine details: Intel(R) Xeon(R) Gold 6140M CPU @ 2.30GHz (no GPU card equipped)
Launched command:
python train_ANN_isMuon.py -p pion -E 10 -C 300_000 -D 2016MU --test
Running on Keras 2.14.0:
[...]
Epoch 10/10
102/102 [==============================] - 1s 15ms/step - loss: 0.2095 - auc: 0.7671 - lr: 9.5631e-04 - val_loss: 0.2103 - val_auc: 0.7614
[INFO] Model training completed in 0h 00min 16s
while running on Keras 3.3.3:
[...]
Epoch 10/10
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 16ms/step - auc: 0.7592 - loss: 0.2191 - lr: 9.5631e-04 - val_auc: 0.7655 - val_loss: 0.2188
[INFO] Model training completed in 0h 00min 19s
going from 16 seconds of training on Keras 2 to 19 seconds on Keras 3 (+19% training time).
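The percentage figures quoted here and in the other comparisons below are plain relative differences between the reported wall-clock times; as a sanity check:

```python
def slowdown_pct(t_keras2: float, t_keras3: float) -> int:
    """Relative training-time increase from Keras 2 to Keras 3, in percent (rounded)."""
    return round(100 * (t_keras3 - t_keras2) / t_keras2)

# Timings reported in this thread (seconds): Keras 2 -> Keras 3
print(slowdown_pct(16, 19))  # 19  (ANN, with LR scheduling)
print(slowdown_pct(15, 17))  # 13  (ANN, callbacks=None)
print(slowdown_pct(44, 54))  # 23  (GAN, callbacks=None)
print(slowdown_pct(46, 55))  # 20  (GAN, original setup)
print(slowdown_pct(34, 40))  # 18  (GAN, metrics=None)
```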
Even repeating both previous exercises (with scripts/train_ANN_isMuon.py and scripts/train_GAN_Rich.py) for 10 epochs and with a dataset of 300,000 instances, after also removing the learning-rate scheduling (callbacks=None in the fit() method), one observes the same drop in timing performance.
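Removing the scheduler corresponds to something like the following generic sketch (the architecture and data are illustrative, not taken from the actual training script):

```python
import numpy as np
import keras

# Generic stand-in for the ANN in train_ANN_isMuon.py (layer sizes are illustrative).
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(512, 8).astype("float32")
y = np.random.randint(0, 2, size=(512, 1)).astype("float32")

# callbacks=None disables the learning-rate scheduling (and every other callback),
# isolating the bare fit() loop for the timing comparison.
history = model.fit(x, y, epochs=2, batch_size=128, callbacks=None, verbose=0)
```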
Test machine details: Intel(R) Xeon(R) Gold 6140M CPU @ 2.30GHz (no GPU card equipped)
Launched command:
python train_ANN_isMuon.py -p pion -E 10 -C 300_000 -D 2016MU --test
Running on Keras 2.14.0:
[...]
Epoch 10/10
102/102 [==============================] - 1s 13ms/step - loss: 0.2196 - auc: 0.7632 - val_loss: 0.2164 - val_auc: 0.7646
[INFO] Model training completed in 0h 00min 15s
while running on Keras 3.3.3:
[...]
Epoch 10/10
102/102 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - auc: 0.7596 - loss: 0.2201 - val_auc: 0.7678 - val_loss: 0.2163
[INFO] Model training completed in 0h 00min 17s
going from 15 seconds of training on Keras 2 to 17 seconds on Keras 3 (+13% training time).
Launched command:
python train_GAN_Rich.py -p pion -E 10 -C 300_000 -D 2016MU --test
Running on Keras 2.14.0:
[...]
Epoch 10/10
102/102 [==============================] - 4s 40ms/step - g_loss: 1.6483 - d_loss: 0.5900 - accuracy: 0.2859 - bce: 2.0635 - val_g_loss: 0.9593 - val_d_loss: 0.6733 - val_accuracy: 0.2897 - val_bce: 2.0299
[INFO] Model training completed in 0h 00min 44s
while running on Keras 3.3.3:
[...]
Epoch 10/10
102/102 ━━━━━━━━━━━━━━━━━━━━ 5s 49ms/step - accuracy: 0.2851 - bce: 1.9503 - d_loss: 0.5958 - g_loss: 1.5373 - val_accuracy: 0.2896 - val_bce: 1.8913 - val_d_loss: 0.6007 - val_g_loss: 1.0399
[INFO] Model training completed in 0h 00min 54s
going from 44 seconds of training on Keras 2 to 54 seconds on Keras 3 (+23% training time).
Following the suggestions of @fchollet, it seems that by enabling jit_compile=True, Keras 3 outperforms Keras 2 in timing performance.
source: https://github.com/keras-team/keras/issues/19953#issuecomment-2210586395
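For reference, jit_compile is a regular argument of Model.compile() (it requests XLA compilation of the train and test functions); a minimal sketch of enabling it, with an illustrative model and toy data:

```python
import numpy as np
import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# jit_compile=True asks Keras to XLA-compile the train step; the first batch
# pays a one-off compilation cost, while subsequent steps are typically faster.
model.compile(optimizer="adam", loss="binary_crossentropy", jit_compile=True)

x = np.random.rand(256, 4).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=64, verbose=0)
```

Note that the compilation overhead is amortized over many steps, so the benefit shows up on longer trainings rather than on the first epoch.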
Using the latest version of PIDGAN (v0.2.0), we have noticed an unexpected behavior when running GAN trainings with Keras 2 vs Keras 3. In particular, taking scripts/train_GAN_Rich.py as a reference for 10 epochs and with a dataset of 300,000 instances, we observe a drop in timing performance of about 20% going from Keras 2 to Keras 3.
Test machine details: Intel(R) Xeon(R) Gold 6140M CPU @ 2.30GHz (no GPU card equipped)
Launched command:
Running on Keras 2.14.0:
while running on Keras 3.3.3:
going from 46 seconds of training on Keras 2 to 55 seconds on Keras 3 (+20% training time).
Repeating the exercise without passing any metrics (metrics=None in compile()), on Keras 2.14.0 we have:

while running without any metrics on Keras 3.3.3:
going from 34 seconds of training on Keras 2 to 40 seconds on Keras 3 (+18% training time).
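The metrics-free configuration amounts to something like the following (an illustrative sketch, not the actual script):

```python
import numpy as np
import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# metrics=None (the default) means only the loss is tracked during fit(),
# removing per-batch metric updates (e.g. AUC) from the timing comparison.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=None)

x = np.random.rand(256, 4).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=64, verbose=0)
print(sorted(history.history.keys()))  # only the loss is recorded
```

Since the slowdown persists even here, the metric bookkeeping alone does not explain the gap between the two Keras versions.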