microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Bad performance with variable-length input with Convolution #3398

Open artbataev opened 6 years ago

artbataev commented 6 years ago

When using variable-length input with a CNN, performance decreases significantly compared to fixed-length input. Why?

To reproduce the result I've created a simple network. I use a single CPU core (CPU usage is limited with taskset), but practically the same result is observed on a GPU.

Using

current_size = np.random.randint(low=50, high=60+1) # from 50 to 60

is 62% slower (1.863 sec / batch) than

current_size = 60 # fixed size

(1.153 sec / batch)

import time
import cntk
import numpy as np

if __name__ == "__main__":
    cntk.debugging.force_deterministic(0)
    device = cntk.device.cpu() 
    cntk.device.try_set_default_device(device)

    model = cntk.layers.Sequential([
        cntk.layers.Convolution2D(filter_shape=(3, 3), num_filters=128, pad=[True, True], activation=cntk.relu),
        cntk.layers.Convolution2D(filter_shape=(3, 3), num_filters=256, pad=[True, True], activation=cntk.relu),
        cntk.layers.Convolution2D(filter_shape=(3, 3), num_filters=512, pad=[True, True], activation=cntk.relu),
        cntk.layers.Convolution2D(filter_shape=(3, 3), num_filters=512, pad=[True, True], activation=cntk.relu),
        cntk.layers.Convolution2D(filter_shape=(3, 3), num_filters=512, pad=[True, True], activation=cntk.relu),
    ])

    input_var = cntk.input_variable((3, cntk.FreeDimension, 120))  # FreeDimension: the height can vary between calls
    model = model(input_var)
    print(model)

    total_time = 0
    num_steps = 50
    for i in range(num_steps):
        current_size = 60  # fixed size; use np.random.randint(low=50, high=60+1) for the variable-size case
        dummy_input = np.random.randn(1, 3, current_size, 120).astype(np.float32)

        start = time.time()
        _ = model.eval({model.arguments[0]: dummy_input}, device=device)
        total_time += time.time() - start

    print("average time for batch: {:.3f} sec".format(total_time / num_steps))
Run as:

taskset -c 0 python test_speed.py

System information:

jaliyae commented 6 years ago

We investigated this, and Spandan and Bowen found that the calculation of the new convolve geometry was the culprit for the slowdown. If the input size is fixed, CNTK only calculates this once, but with variable sizes it needs to recalculate the geometry on every forward call.
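
For intuition, here is a rough Python sketch (not CNTK's actual implementation) of the kind of per-dimension arithmetic the convolve geometry involves. With a fixed input shape the result can be computed once and reused; a varying shape forces it to be redone on every forward call, for every convolution layer.

def conv_output_shape(input_shape, kernel=(3, 3), stride=(1, 1), pad=(1, 1)):
    # One integer division (and mod, for padding/alignment checks) per spatial
    # dimension; this has to be repeated whenever the input shape changes.
    return tuple((size + 2 * p - k) // s + 1
                 for size, k, s, p in zip(input_shape, kernel, stride, pad))

# Fixed height: computed once, then reusable.
print(conv_output_shape((60, 120)))   # (60, 120) for a 3x3 kernel with pad 1
# Variable height 50..60: each new height means a new geometry.
print(conv_output_shape((53, 120)))   # (53, 120)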

artbataev commented 6 years ago

Thanks for the reply. Is it possible to speed up these calculations? I think this is a significant and unexpected drawback: the network is rather big, and the input-size calculation seems like it should be inexpensive compared to the convolution computation itself.

jaliyae commented 6 years ago

The slowdown comes from the division and mod operations performed when calculating the geometry. So avoiding variable sizes is the best option for now.
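
If variable sizes have to stay in the data pipeline, one way to follow this advice is to pad every sample up to a fixed bucket size before eval, so the shape the network sees (and hence the geometry) never changes. Below is a minimal sketch under the assumption that the extra padded rows can be cropped or masked out of the output afterwards; pad_to_bucket and BUCKET_SIZE are hypothetical helpers, not CNTK API.

import numpy as np

BUCKET_SIZE = 60  # pad every sample up to the largest expected height

def pad_to_bucket(sample, bucket_size=BUCKET_SIZE):
    # sample shape: (batch, channels, height, width) with height <= bucket_size
    pad_rows = bucket_size - sample.shape[2]
    return np.pad(sample, ((0, 0), (0, 0), (0, pad_rows), (0, 0)), mode="constant")

current_size = np.random.randint(low=50, high=60 + 1)
dummy_input = np.random.randn(1, 3, current_size, 120).astype(np.float32)
fixed_input = pad_to_bucket(dummy_input)  # always (1, 3, 60, 120)
# _ = model.eval({model.arguments[0]: fixed_input}, device=device)

The geometry is then computed only on the first call, at the cost of some wasted computation on the padded rows.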