Closed: DmitriyTkachenko closed this issue 3 years ago.
Could you profile what is taking time in your use case?
@fchollet Sure.
I ran 10 epochs of training and profiled with cProfile.
Timings were:

- 2.1.3: ~562 s total time, ~382 ms/step, ~38 s per epoch.
- 2.1.4: ~776 s total time, ~571 ms/step, ~57 s per epoch.
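For reference, a minimal sketch of how a dump like the ones below can be collected with `cProfile`/`pstats` (the `work` function here is just a stand-in for the actual training script, which was `train_cnn.py`):

```python
import cProfile
import io
import pstats

def work():
    # stand-in for a training run; profile the real script the same way
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time ("cumtime"), the same ordering as the dumps below.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Running the script under `python -m cProfile -o profile_output train_cnn.py` and loading `profile_output` with `pstats.Stats` gives the same kind of table.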
My code is nothing fancy: I read frames extracted from UCF101 videos (`flow_from_directory`), resize them to 299x299, use `ImageDataGenerator`, and fine-tune the top layers of `InceptionResNetV2` (top 3 Inception blocks (layers after the 730th) + avg pooling layer + dense layer for 101 classes) with `fit_generator`; very similar to this one (except that it uses `InceptionV3`): https://github.com/harvitronix/five-video-classification-methods/blob/master/train_cnn.py.
Profiling results with 2.1.3 (top 50, ordered by `cumtime`):
```
Wed Mar  7 14:41:18 2018    profile_output_2.1.3

38406328 function calls (37949471 primitive calls) in 561.650 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2657/1  0.105  0.000  561.687  561.687  {built-in method builtins.exec}
1  0.000  0.000  561.687  561.687  train_cnn.py:1()
1  0.000  0.000  559.803  559.803  train_cnn.py:110(main)
1408/1397  0.009  0.000  448.522  0.321  /usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py:27(wrapper)
1  0.000  0.000  445.389  445.389  train_cnn.py:98(train_model)
1  1.486  1.486  445.386  445.386  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1917(fit_generator)
2457  0.012  0.000  407.570  0.166  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:797(run)
2457  0.099  0.000  407.558  0.166  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1051(_run)
2457  0.024  0.000  406.598  0.165  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1297(_do_run)
2457  0.004  0.000  406.468  0.165  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1359(_do_call)
2457  0.088  0.000  406.465  0.165  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1329(_run_fn)
2457  403.009  0.164  403.009  0.164  {built-in method _pywrap_tensorflow_internal.TF_Run}
1100  0.039  0.000  353.895  0.322  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2461(__call__)
1000  0.011  0.000  321.078  0.321  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1802(train_on_batch)
3841  113.689  0.030  113.689  0.030  {method 'acquire' of '_thread.lock' objects}
611  0.005  0.000  113.524  0.186  /usr/lib/python3.6/threading.py:263(wait)
1321  0.004  0.000  113.497  0.086  /usr/lib/python3.6/threading.py:533(wait)
1304  0.004  0.000  113.486  0.087  /usr/lib/python3.6/multiprocessing/pool.py:637(get)
1304  0.003  0.000  113.481  0.087  /usr/lib/python3.6/multiprocessing/pool.py:634(wait)
1  0.000  0.000  85.385  85.385  train_cnn.py:33(get_generators)
2  0.033  0.016  85.385  42.693  /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:528(flow_from_directory)
2  0.190  0.095  85.352  42.676  /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:1057(__init__)
10  0.000  0.000  54.014  5.401  /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:67(on_epoch_end)
10  0.000  0.000  54.002  5.400  /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:396(on_epoch_end)
3  0.000  0.000  54.001  18.000  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2575(save_weights)
3  0.513  0.171  53.889  17.963  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2866(save_weights_to_hdf5_group)
2346  0.021  0.000  48.271  0.021  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2310(batch_get_value)
10  0.113  0.011  34.132  3.413  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:2229(evaluate_generator)
100  0.001  0.000  34.001  0.340  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1854(test_on_batch)
427875/248283  0.161  0.000  30.328  0.000  {built-in method builtins.next}
1102  0.022  0.000  28.589  0.026  /usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py:543(get)
1  0.000  0.000  26.736  26.736  train_cnn.py:61(get_model)
1  0.000  0.000  26.551  26.551  /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:177(InceptionResNetV2)
781  0.014  0.000  18.236  0.023  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:541(__call__)
244  0.003  0.000  18.190  0.075  /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:58(conv2d_bn)
40  0.001  0.000  15.608  0.390  /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:98(inception_resnet_block)
204  0.003  0.000  14.889  0.073  /usr/local/lib/python3.6/dist-packages/keras/layers/normalization.py:130(call)
408  0.001  0.000  11.732  0.029  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:989(moving_average_update)
408  0.005  0.000  11.731  0.029  /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:33(assign_moving_average)
408  0.016  0.000  11.355  0.028  /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:156(_zero_debias)
2451  0.484  0.000  11.174  0.005  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:149(get_session)
20731/19915  1.086  0.000  10.976  0.001  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:339(_apply_op_helper)
2  0.021  0.010  9.996  4.998  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2608(load_weights)
2  0.030  0.015  9.968  4.984  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:3101(load_weights_from_hdf5_group)
2  0.014  0.007  9.570  4.785  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2346(batch_set_value)
26748/25932  0.272  0.000  8.847  0.000  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3197(create_op)
2  0.000  0.000  7.702  3.851  /usr/lib/python3.6/multiprocessing/pool.py:261(map)
8980  0.011  0.000  5.092  0.001  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:5502(get_collection)
8984  0.682  0.000  5.061  0.001  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3770(get_collection)
26748  0.378  0.000  4.945  0.000  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3303(_create_op_helper)
816  0.004  0.000  4.851  0.006  /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:195(_maybe_get_unique)
```
Profiling results with 2.1.4:
```
Wed Mar  7 14:31:56 2018    profile_output

40394572 function calls (39898341 primitive calls) in 775.684 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2657/1  0.105  0.000  775.722  775.722  {built-in method builtins.exec}
1  0.000  0.000  775.722  775.722  train_cnn.py:1()
1  0.000  0.000  772.628  772.628  train_cnn.py:110(main)
1408/1397  0.009  0.000  645.699  0.462  /usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py:27(wrapper)
1  0.000  0.000  642.624  642.624  train_cnn.py:98(train_model)
1  1.653  1.653  642.621  642.621  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1957(fit_generator)
2008  0.012  0.000  419.814  0.209  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:797(run)
2008  0.103  0.000  419.802  0.209  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1051(_run)
2008  0.020  0.000  418.851  0.209  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1297(_do_run)
2008  0.003  0.000  418.727  0.209  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1359(_do_call)
2008  0.098  0.000  418.723  0.209  /usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1329(_run_fn)
2008  414.555  0.206  414.555  0.206  {built-in method _pywrap_tensorflow_internal.TF_Run}
1100  0.043  0.000  355.672  0.323  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2461(__call__)
1000  0.013  0.000  322.904  0.323  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1844(train_on_batch)
5405  310.948  0.058  310.948  0.058  {method 'acquire' of '_thread.lock' objects}
1002  0.008  0.000  310.868  0.310  /usr/lib/python3.6/threading.py:263(wait)
1304  0.004  0.000  310.820  0.238  /usr/lib/python3.6/multiprocessing/pool.py:637(get)
1321  0.006  0.000  310.818  0.235  /usr/lib/python3.6/threading.py:533(wait)
1304  0.003  0.000  310.814  0.238  /usr/lib/python3.6/multiprocessing/pool.py:634(wait)
480791/284227  0.181  0.000  216.886  0.001  {built-in method builtins.next}
1102  0.021  0.000  215.263  0.195  /usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py:552(get)
1  0.000  0.000  96.235  96.235  train_cnn.py:33(get_generators)
2  0.029  0.015  96.235  48.117  /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:528(flow_from_directory)
2  0.182  0.091  96.205  48.102  /usr/local/lib/python3.6/dist-packages/keras/preprocessing/image.py:1059(__init__)
10  0.000  0.000  61.728  6.173  /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:68(on_epoch_end)
10  0.000  0.000  61.717  6.172  /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:425(on_epoch_end)
2  0.000  0.000  61.717  30.858  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2578(save_weights)
2  0.445  0.223  61.642  30.821  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2870(save_weights_to_hdf5_group)
1564  0.015  0.000  57.573  0.037  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2310(batch_get_value)
10  0.093  0.009  34.180  3.418  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:2296(evaluate_generator)
100  0.001  0.000  34.071  0.341  /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:1895(test_on_batch)
1  0.000  0.000  30.462  30.462  train_cnn.py:61(get_model)
1  0.000  0.000  30.277  30.277  /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:177(InceptionResNetV2)
781  0.014  0.000  21.591  0.028  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:541(__call__)
244  0.003  0.000  21.453  0.088  /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:58(conv2d_bn)
40  0.001  0.000  18.348  0.459  /usr/local/lib/python3.6/dist-packages/keras/applications/inception_resnet_v2.py:98(inception_resnet_block)
204  0.005  0.000  18.209  0.089  /usr/local/lib/python3.6/dist-packages/keras/layers/normalization.py:130(call)
2  0.000  0.000  15.114  7.557  /usr/lib/python3.6/multiprocessing/pool.py:261(map)
23587/22567  1.268  0.000  12.960  0.001  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:339(_apply_op_helper)
2002  0.433  0.000  12.111  0.006  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:149(get_session)
408  0.001  0.000  11.685  0.029  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:989(moving_average_update)
408  0.004  0.000  11.684  0.029  /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:33(assign_moving_average)
408  0.015  0.000  11.308  0.028  /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/moving_averages.py:156(_zero_debias)
2  0.021  0.010  11.281  5.640  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:2610(load_weights)
2  0.034  0.017  11.244  5.622  /usr/local/lib/python3.6/dist-packages/keras/engine/topology.py:3121(load_weights_from_hdf5_group)
32256/31440  0.266  0.000  10.712  0.000  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3197(create_op)
2  0.016  0.008  10.676  5.338  /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2346(batch_set_value)
32256  0.407  0.000  5.778  0.000  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3303(_create_op_helper)
1  0.000  0.000  5.367  5.367  /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:50(set_model)
1  0.000  0.000  5.367  5.367  /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:717(set_model)
8531  0.009  0.000  5.031  0.001  /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:5502(get_collection)
```
The main difference I noticed is in the lock-related lines: in 2.1.4, roughly 200 more seconds were spent acquiring locks, apparently somewhere under `data_utils.py:552` (`get`). That method likely drives an iterator, since the `builtins.next` built-in is hot, and this iterator could be acquiring locks during multi-threaded image augmentation; I'm not familiar with this code, though.
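To illustrate where that lock time can come from, here's a minimal sketch (not Keras's actual enqueuer, just the same `multiprocessing.pool` pattern that appears in the profiles) in which the consumer blocks in `pool.py`'s `get`/`wait` chain while worker threads produce a batch:

```python
import time
from multiprocessing.pool import ThreadPool

def augment(x):
    # stand-in for per-image augmentation work
    time.sleep(0.02)
    return x * 2

# two worker threads, roughly as a data-loading pipeline would use
with ThreadPool(2) as pool:
    async_result = pool.map_async(augment, range(4))
    # .get() blocks inside multiprocessing/pool.py (ApplyResult.get -> wait),
    # which is exactly the threading.Condition.wait / lock.acquire chain
    # visible in the cProfile dumps above
    batch = async_result.get()
```

If augmentation gets slower per image, the consumer simply waits longer at `get()`, and the extra time shows up under `{method 'acquire' of '_thread.lock' objects}` rather than in the augmentation function itself.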
Also, here's the visualization from `qcachegrind`:

2.1.4: (screenshot)

2.1.3: (screenshot)

38.12% vs. 18.83% of time spent in `wait`.
I'll look more into it tonight, but does this still happen if you disable DA completely? The only change I see between those 2 versions is that we now use bilinear for DA.
@Dref360 Thank you. Yes, this only happens if DA is enabled. I use these parameters:
```python
ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    horizontal_flip=True,
    rotation_range=10.,
    width_shift_range=0.2,
    height_shift_range=0.2
)
```
Also, there's no slowdown if I only use `rescale`, i.e. with `ImageDataGenerator(rescale=1. / 255)`.
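That fits the picture: `rescale` alone is a pointwise multiply with no geometric resampling, so it never touches the interpolation path. A rough numpy sketch of the distinction (illustrative only, not Keras's implementation):

```python
import numpy as np

img = np.arange(12.0).reshape(3, 4)

# rescale: one multiply per pixel, no resampling
rescaled = img * (1. / 255)

# horizontal_flip: a reversed view of the array, also no resampling
flipped = img[:, ::-1]

# shear/rotation/shift, by contrast, must compute new pixel coordinates
# and interpolate between source pixels -- the path that got slower
```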
Probably because we now do spline interpolation. https://github.com/keras-team/keras/commit/9b08f3335aa9e03c450b4c286a9c2bd304db188b#diff-93850fa46a789f2e5905894ad0e7bee4
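To make the cost difference concrete, here's a small numpy sketch (my own, not the keras_preprocessing code) of a fractional 1-D shift with nearest-neighbour (order 0) vs. linear (order 1) interpolation: order 0 is a single rounded gather per sample, while order 1 needs two gathers plus a weighted blend, and higher spline orders cost more still:

```python
import numpy as np

def shift_1d(signal, offset, order):
    """Shift `signal` by a fractional `offset` with interpolation order 0 or 1."""
    idx = np.arange(len(signal)) - offset
    if order == 0:
        # nearest neighbour: one rounded index lookup per sample
        src = np.clip(np.rint(idx).astype(int), 0, len(signal) - 1)
        return signal[src]
    # linear: two lookups plus a weighted blend per sample
    # (edges are approximated by clipping indices into range)
    lo = np.clip(np.floor(idx).astype(int), 0, len(signal) - 1)
    hi = np.clip(lo + 1, 0, len(signal) - 1)
    frac = idx - np.floor(idx)
    return (1 - frac) * signal[lo] + frac * signal[hi]

signal = np.array([0.0, 1.0, 2.0, 3.0])
nearest = shift_1d(signal, 0.5, order=0)
linear = shift_1d(signal, 0.5, order=1)
```

With nearest-neighbour interpolation the per-pixel work drops accordingly, which is consistent with the `interpolation_order=0` workaround mentioned in the next comment.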
Also seeing this; for 250x250 images I get >30% slower. I fixed it by setting `interpolation_order=0` (using `ImageDataGenerator` from keras_preprocessing).
By the way, this parameter is missing from the docstrings, and the change is undocumented.
Hi,
I've noticed that image augmentation became slower in 2.1.4 (compared to 2.1.3). It's somewhat noticeable for small images (like 32x32 in CIFAR10), where the slowdown is ~7%, but much more significant for larger images (like 299x299 in ImageNet). For my use case with 299x299 images, each step started taking ~575 ms instead of ~400 ms with 2.1.3, which is ~30% slower.
Hardware: Intel i5-2500K, GTX 1080. TensorFlow version: 1.6 (same with 1.5 and 1.4, with or without Docker). Keras version: 2.1.4 installed via pip.
Reproduction steps:

1. Run `cifar10_cnn.py` with Keras 2.1.4 and note the timings.
2. `pip uninstall -y keras && pip install keras==2.1.3`
3. Run `cifar10_cnn.py` again and compare timings.

My results:

2.1.4: 9 ms/step, 14 s per epoch. Example:
```
1563/1563 [==============================] - 14s 9ms/step - loss: 1.3354 - acc: 0.5213 - val_loss: 1.1781 - val_acc: 0.5857
```
2.1.3: 9 ms/step, 13 s per epoch. Example:

```
1563/1563 [==============================] - 13s 9ms/step - loss: 1.2962 - acc: 0.5376 - val_loss: 1.1254 - val_acc: 0.6036
```
As I said, this is barely noticeable for small images, where the slowdown is around 7%. My custom code uses `ImageDataGenerator` with `flow_from_directory` (with `target_size=(299, 299)`) and became ~30% slower after the update from 2.1.3 to 2.1.4; GPU utilization also decreased (although CPU usage is far from 100%). Please let me know if the current reproduction steps don't help, and I'll try to find another example with larger images or adapt my code for easier reproduction. Thanks.