keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0
61.9k stars 19.45k forks source link

Keras 2.1.4: Possible performance regression in ImageDataGenerator #9577

Closed DmitriyTkachenko closed 3 years ago

DmitriyTkachenko commented 6 years ago

Hi,

I've noticed that image augmentation became slower in 2.1.4 (compared to 2.1.3). It's somewhat noticeable for small images (like 32x32 in CIFAR10) where the slowdown is ~7%, but much more significant for larger images (like 299x299 in ImageNet). I've noticed that for my use-case with 299x299 images the step in epoch started taking ~575 ms instead of ~400 ms with 2.1.3, which is ~30% slower.

Hardware: Intel i5-2500K, GTX 1080. TensorFlow version: 1.6 (same with 1.5 and 1.4, with or without Docker). Keras version: 2.1.4 installed via pip.

Reproduction steps:

  1. Make sure you're on 2.1.4.
  2. Run https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py.
  3. Downgrade to 2.1.3: pip uninstall -y keras && pip install keras==2.1.3.
  4. Run cifar10_cnn.py again and compare timings.

My results: 2.1.4: 9ms/step, 14s per epoch. Example: 1563/1563 [==============================] - 14s 9ms/step - loss: 1.3354 - acc: 0.5213 - val_loss: 1.1781 - val_acc: 0.5857 2.1.3: 9ms/step, 13s per epoch. Example: 1563/1563 [==============================] - 13s 9ms/step - loss: 1.2962 - acc: 0.5376 - val_loss: 1.1254 - val_acc: 0.6036

As I said, this is barely noticeable for small images and the slowdown is around 7%. My custom code uses ImageDataGenerator with flow_from_directory (with target_size=(299, 299)) and became ~30% slower (after update from 2.1.3 to 2.1.4), GPU utilization also decreased (although CPU usage is far from 100%). Please let me know if the current reproduction steps don't help and I'll try to find another example with larger images or adapt my code for easier reproduction.

Thanks.

fchollet commented 6 years ago

Could you profile what is taking time in your use case?

DmitriyTkachenko commented 6 years ago

@fchollet Sure.

I ran 10 epochs of training, and profiled with cProfile.

Timings were: 2.1.3: ~562 s total time, ~382ms/step, ~38s per epoch. 2.1.4: ~776 s total time, ~571ms/step, ~57s per epoch.

My code is nothing fancy: I read frames extracted from UCF101 videos (flow_from_directory), resize them to 299x299, use ImageDataGenerator and fine-tune top layers of InceptionResNetV2 (top 3 Inception blocks (layers after 730th) + avg pooling layer + dense layer for 101 classes) with fit_generator, very similar to this one (except that it uses InceptionV3): https://github.com/harvitronix/five-video-classification-methods/blob/master/train_cnn.py.

Profiling results with 2.1.3 (top 50, ordered by cumtime):

Wed Mar  7 14:41:18 2018    profile_output_2.1.3

         38406328 function calls (37949471 primitive calls) in 561.650 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   2657/1    0.105    0.000  561.687  561.687 {built-in method builtins.exec}
        1    0.000    0.000  561.687  561.687 train_cnn.py:1()
        1    0.000    0.000  559.803  559.803 train_cnn.py:110(main)
1408/1397    0.009    0.000  448.522    0.321 /usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py:27(wrapper)
        1    0.000    0.000  445.389  445.389 train_cnn.py:98(train_model)
        1    1.486    1.486  445.386  445.386 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1917(fit_generator)
     2457    0.012    0.000  407.570    0.166 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:797(run)
     2457    0.099    0.000  407.558    0.166 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1051(_run)
     2457    0.024    0.000  406.598    0.165 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1297(_do_run)
     2457    0.004    0.000  406.468    0.165 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1359(_do_call)
     2457    0.088    0.000  406.465    0.165 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1329(_run_fn)
     2457  403.009    0.164  403.009    0.164 {built-in method _pywrap_tensorflow_internal.TF_Run}
     1100    0.039    0.000  353.895    0.322 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2461(__call__)
     1000    0.011    0.000  321.078    0.321 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1802(train_on_batch)
     3841  113.689    0.030  113.689    0.030 {method 'acquire' of '_thread.lock' objects}
      611    0.005    0.000  113.524    0.186 /usr/lib/python3.6/threading.py:263(wait)
     1321    0.004    0.000  113.497    0.086 /usr/lib/python3.6/threading.py:533(wait)
     1304    0.004    0.000  113.486    0.087 /usr/lib/python3.6/multiprocessing/pool.py:637(get)
     1304    0.003    0.000  113.481    0.087 /usr/lib/python3.6/multiprocessing/pool.py:634(wait)
        1    0.000    0.000   85.385   85.385 train_cnn.py:33(get_generators)
        2    0.033    0.016   85.385   42.693 /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:528(flow_from_directory)
        2    0.190    0.095   85.352   42.676 /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:1057(__init__)
       10    0.000    0.000   54.014    5.401 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:67(on_epoch_end)
       10    0.000    0.000   54.002    5.400 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:396(on_epoch_end)
        3    0.000    0.000   54.001   18.000 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2575(save_weights)
        3    0.513    0.171   53.889   17.963 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2866(save_weights_to_hdf5_group)
     2346    0.021    0.000   48.271    0.021 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2310(batch_get_value)
       10    0.113    0.011   34.132    3.413 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:2229(evaluate_generator)
      100    0.001    0.000   34.001    0.340 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1854(test_on_batch)
427875/248283    0.161    0.000   30.328    0.000 {built-in method builtins.next}
     1102    0.022    0.000   28.589    0.026 /usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py:543(get)
        1    0.000    0.000   26.736   26.736 train_cnn.py:61(get_model)
        1    0.000    0.000   26.551   26.551 /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:177(InceptionResNetV2)
      781    0.014    0.000   18.236    0.023 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:541(__call__)
      244    0.003    0.000   18.190    0.075 /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:58(conv2d_bn)
       40    0.001    0.000   15.608    0.390 /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:98(inception_resnet_block)
      204    0.003    0.000   14.889    0.073 /usr/local/lib/python3.6/dist-packages/keras/layers/normalization.py:130(call)
      408    0.001    0.000   11.732    0.029 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:989(moving_average_update)
      408    0.005    0.000   11.731    0.029 /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:33(assign_moving_average)
      408    0.016    0.000   11.355    0.028 /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:156(_zero_debias)
     2451    0.484    0.000   11.174    0.005 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:149(get_session)
20731/19915    1.086    0.000   10.976    0.001 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:339(_apply_op_helper)
        2    0.021    0.010    9.996    4.998 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2608(load_weights)
        2    0.030    0.015    9.968    4.984 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:3101(load_weights_from_hdf5_group)
        2    0.014    0.007    9.570    4.785 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2346(batch_set_value)
26748/25932    0.272    0.000    8.847    0.000 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3197(create_op)
        2    0.000    0.000    7.702    3.851 /usr/lib/python3.6/multiprocessing/pool.py:261(map)
     8980    0.011    0.000    5.092    0.001 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:5502(get_collection)
     8984    0.682    0.000    5.061    0.001 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3770(get_collection)
    26748    0.378    0.000    4.945    0.000 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3303(_create_op_helper)
      816    0.004    0.000    4.851    0.006 /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:195(_maybe_get_unique)

Profiling results with 2.1.4:

Wed Mar  7 14:31:56 2018    profile_output

         40394572 function calls (39898341 primitive calls) in 775.684 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   2657/1    0.105    0.000  775.722  775.722 {built-in method builtins.exec}
        1    0.000    0.000  775.722  775.722 train_cnn.py:1()
        1    0.000    0.000  772.628  772.628 train_cnn.py:110(main)
1408/1397    0.009    0.000  645.699    0.462 /usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py:27(wrapper)
        1    0.000    0.000  642.624  642.624 train_cnn.py:98(train_model)
        1    1.653    1.653  642.621  642.621 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1957(fit_generator)
     2008    0.012    0.000  419.814    0.209 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:797(run)
     2008    0.103    0.000  419.802    0.209 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1051(_run)
     2008    0.020    0.000  418.851    0.209 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1297(_do_run)
     2008    0.003    0.000  418.727    0.209 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1359(_do_call)
     2008    0.098    0.000  418.723    0.209 /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1329(_run_fn)
     2008  414.555    0.206  414.555    0.206 {built-in method _pywrap_tensorflow_internal.TF_Run}
     1100    0.043    0.000  355.672    0.323 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2461(__call__)
     1000    0.013    0.000  322.904    0.323 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1844(train_on_batch)
     5405  310.948    0.058  310.948    0.058 {method 'acquire' of '_thread.lock' objects}
     1002    0.008    0.000  310.868    0.310 /usr/lib/python3.6/threading.py:263(wait)
     1304    0.004    0.000  310.820    0.238 /usr/lib/python3.6/multiprocessing/pool.py:637(get)
     1321    0.006    0.000  310.818    0.235 /usr/lib/python3.6/threading.py:533(wait)
     1304    0.003    0.000  310.814    0.238 /usr/lib/python3.6/multiprocessing/pool.py:634(wait)
480791/284227    0.181    0.000  216.886    0.001 {built-in method builtins.next}
     1102    0.021    0.000  215.263    0.195 /usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py:552(get)
        1    0.000    0.000   96.235   96.235 train_cnn.py:33(get_generators)
        2    0.029    0.015   96.235   48.117 /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:528(flow_from_directory)
        2    0.182    0.091   96.205   48.102 /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:1059(__init__)
       10    0.000    0.000   61.728    6.173 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:68(on_epoch_end)
       10    0.000    0.000   61.717    6.172 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:425(on_epoch_end)
        2    0.000    0.000   61.717   30.858 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2578(save_weights)
        2    0.445    0.223   61.642   30.821 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2870(save_weights_to_hdf5_group)
     1564    0.015    0.000   57.573    0.037 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2310(batch_get_value)
       10    0.093    0.009   34.180    3.418 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:2296(evaluate_generator)
      100    0.001    0.000   34.071    0.341 /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1895(test_on_batch)
        1    0.000    0.000   30.462   30.462 train_cnn.py:61(get_model)
        1    0.000    0.000   30.277   30.277 /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:177(InceptionResNetV2)
      781    0.014    0.000   21.591    0.028 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:541(__call__)
      244    0.003    0.000   21.453    0.088 /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:58(conv2d_bn)
       40    0.001    0.000   18.348    0.459 /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:98(inception_resnet_block)
      204    0.005    0.000   18.209    0.089 /usr/local/lib/python3.6/dist-packages/keras/layers/normalization.py:130(call)
        2    0.000    0.000   15.114    7.557 /usr/lib/python3.6/multiprocessing/pool.py:261(map)
23587/22567    1.268    0.000   12.960    0.001 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:339(_apply_op_helper)
     2002    0.433    0.000   12.111    0.006 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:149(get_session)
      408    0.001    0.000   11.685    0.029 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:989(moving_average_update)
      408    0.004    0.000   11.684    0.029 /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:33(assign_moving_average)
      408    0.015    0.000   11.308    0.028 /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:156(_zero_debias)
        2    0.021    0.010   11.281    5.640 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2610(load_weights)
        2    0.034    0.017   11.244    5.622 /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:3121(load_weights_from_hdf5_group)
32256/31440    0.266    0.000   10.712    0.000 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3197(create_op)
        2    0.016    0.008   10.676    5.338 /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2346(batch_set_value)
    32256    0.407    0.000    5.778    0.000 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3303(_create_op_helper)
        1    0.000    0.000    5.367    5.367 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:50(set_model)
        1    0.000    0.000    5.367    5.367 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:717(set_model)
     8531    0.009    0.000    5.031    0.001 /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:5502(get_collection)

I have emphasized the differences I noticed in bold. It looks like in 2.1.4 ~200 more seconds was spent acquiring locks, possibly done somewhere in data_utils.py:552 (which likely uses some iterator because builtins.next method is hot, and this iterator could be acquiring locks during multi-threaded image augmentation; but I'm not familiar with this code).

Also here's the visualization from qcachegrind: 2.1.4: image

2.1.3: image

38.12% vs 18.83% of time spent in wait.

Dref360 commented 6 years ago

I'll look more into it tonight, but does this still happen if you disable DA completely? The only change I see between those 2 versions is that we now use bilinear for DA.

DmitriyTkachenko commented 6 years ago

@Dref360 Thank you. Yes, this only happens if DA is enabled. I use these parameters:

ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    horizontal_flip=True,
    rotation_range=10.,
    width_shift_range=0.2,
    height_shift_range=0.2
)

Also there's no slowdown if I only use rescale, so with ImageDataGenerator(rescale=1. / 255).

Dref360 commented 6 years ago

Probably because we now do spline interpolation. https://github.com/keras-team/keras/commit/9b08f3335aa9e03c450b4c286a9c2bd304db188b#diff-93850fa46a789f2e5905894ad0e7bee4

eyaler commented 5 years ago

also seeing this; for 250x250 i get >30% slower. i fixed by setting interpolation_order=0 (using ImageDataGenerator from keras_preprocessing)

eyaler commented 5 years ago

by the way this parameter is missing from the doc strings, and the change is undocumented