keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Training performance degradation after switching from Keras 2 mode to Keras 3 using Tensorflow #20283

Open DavidHidde opened 1 month ago

DavidHidde commented 1 month ago

I've been upgrading my Keras 2 code to work with Keras 3 without going fully backend-agnostic. Everything works after resolving the compatibility issues, but my training speed has degraded severely, possibly by as much as a factor of 10. I changed the following to get Keras 3 working:

  1. Changed tensorflow.keras calls to keras calls.
  2. Updated model/weights saving and loading to use the new export function and weights.h5 format.
  3. Updated a callback at the end of the epoch to be a keras.Callback instead of the old BaseLogger.
  4. Added @keras.saving.register_keras_serializable() to custom metric and loss functions.
  5. Updated my online dataset generator to use keras.Sequential data augmentation instead of the removed ImageDataGenerator.
  6. Removed the max_queue_size kwarg from the model.fit and model.predict calls since it has been removed.

In terms of hardware and packages, I'm using Python 3.11.10, Keras 3.5.0 and TensorFlow 2.16.2 on a MacBook Pro M2. I've also noticed that GPU and CPU usage is much higher when running the newer version. Using git stash, I've confirmed that the changes listed above are what cause the performance degradation. My suspicion is that the Apple hardware is somehow resulting in worse performance, but I have yet to confirm this on a regular x86 machine.
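A slowdown like this is easier to discuss with numbers attached. The helper below is a minimal sketch that uses only the standard library to time any callable; `time_calls` is a hypothetical name, and wrapping a one-epoch `model.fit` call in it is an assumption about how the training code is organised.

```python
import statistics
import time


def time_calls(fn, repeats=5, warmup=1):
    """Return the median wall-clock time in seconds of fn() over `repeats` runs."""
    # Warm-up runs are discarded so one-off costs (graph tracing, caches)
    # do not distort the measurement.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Running something like `time_calls(lambda: model.fit(dataset, epochs=1, verbose=0))` once on the stashed code and once on the migrated code would give two medians that can be compared directly.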

fchollet commented 1 month ago

> Updated my online dataset generator to use keras.Sequential data augmentation instead of the removed ImageDataGenerator.

Are you using tf.data? That's what you want to use to see good performance with TF.
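As an illustration of that suggestion, a tf.data input pipeline with Keras augmentation layers could be sketched as follows. The specific augmentation layers, the batch size and the function name `make_dataset` are assumptions, since the original generator code is not shown; the sketch also assumes Keras is running on the TensorFlow backend.

```python
import keras
import tensorflow as tf

# Hypothetical augmentation stack; the layers in the original code are unknown.
augment = keras.Sequential(
    [
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(0.1),
    ]
)


def make_dataset(images, labels, batch_size=32):
    """Build a shuffled, batched, augmented and prefetched tf.data pipeline."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(buffer_size=len(images)).batch(batch_size)
    # Augment whole batches inside the pipeline, in parallel on the CPU,
    # so the work overlaps with the training step.
    ds = ds.map(
        lambda x, y: (augment(x, training=True), y),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return ds.prefetch(tf.data.AUTOTUNE)
```

The resulting dataset can be passed straight to `model.fit(make_dataset(images, labels), epochs=...)` in place of a Python generator.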

You also want to make sure that you're using the GPU on your machine.
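One quick way to check this is to ask TensorFlow which GPU devices it can see. The sketch below uses `tf.config.list_physical_devices`; the wrapper name `visible_gpus` is hypothetical. On Apple silicon, GPU devices are generally expected to appear only when a Metal plugin such as tensorflow-metal is installed, which is an assumption about this particular setup rather than something stated in the thread.

```python
import tensorflow as tf


def visible_gpus():
    """Return the names of the GPU devices TensorFlow can see (empty if none)."""
    return [device.name for device in tf.config.list_physical_devices("GPU")]


# An empty list means training is running on the CPU only.
print(visible_gpus())
```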

DavidHidde commented 1 month ago

Hi,

Currently the code uses custom generators, but when I rewrite it to use tf.data.Dataset the performance stays the same. In terms of CPU/GPU usage, my system does not show any major differences:

Keras 3 legacy mode:

[Image: Screenshot 2024-09-30 at 14 37 40]

Keras 3:

[Image: Screenshot 2024-09-30 at 14 35 00]

DavidHidde commented 4 weeks ago

Any updates on this?