microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Is it possible to take every k elements along FreeDimension axis? #3527

Open artbataev opened 5 years ago

artbataev commented 5 years ago

Is it possible to do a slice with a step along a dynamic axis? I'm working with a sequence model and want to take every 3rd frame.

I found that cntk.sequence.slice doesn't support a step parameter.

Also, this example

import cntk
import numpy as np

n_channels = 12
input_var = cntk.sequence.input_variable([cntk.FreeDimension, n_channels])
model = cntk.slice(input_var, axis=0, begin_index=0, end_index=0, strides=3)
x = np.random.rand(1, 6, n_channels).astype(np.float32)
print(model.eval({model.arguments[0]: x}))

raises an error:

RuntimeError: Function 'Slice: Input('Input17727', [#, *], [* x 12]) -> Unknown': Slice operation index range [0,0), interpreted as [0,-3), is invalid for input 'Input('Input17727', [#, *], [* x 12])' shape '[* x 12]'.
delzac commented 5 years ago

You probably will have to do a C.sequence.unpack, slice from there, and C.to_sequence.

artbataev commented 5 years ago

@delzac, it also doesn't work:

import cntk
import numpy as np

n_channels = 3
input_var = cntk.sequence.input_variable([cntk.FreeDimension, n_channels])
unpacked_input = cntk.sequence.unpack(input_var, padding_value=0, no_mask_output=True)
sliced = cntk.slice(unpacked_input, axis=1, begin_index=0, end_index=0, strides=3)
model = cntk.to_sequence(sliced)

x = np.random.rand(1, 6, n_channels).astype(np.float32)
print(x)
print(model.eval({model.arguments[0]: x}))

raises an exception:

RuntimeError: Function 'Slice: Output('UnpackSequenceOp17879_Output_0', [#], [* x * x 3]) -> Unknown': Slice operation index range [0,0), interpreted as [0,-3), is invalid for input 'Output('UnpackSequenceOp17879_Output_0', [#], [* x * x 3])' shape '[* x * x 3]'.

But it works correctly with axis=2.

delzac commented 5 years ago

First, your sample code runs without error on my computer. What version of cntk are you using?

Also, there's no need to define a C.FreeDimension; using C.sequence.input_variable already defines a sequence axis:

input_var = cntk.sequence.input_variable(n_channels)  # this would do
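
For illustration, a minimal check (a sketch only, assuming any recent CNTK 2.x; the values in the comments are what the docs suggest):

import cntk as C

n_channels = 12
seq_var = C.sequence.input_variable(n_channels)
print(seq_var.shape)         # (12,) -- only the static (channel) shape
print(seq_var.dynamic_axes)  # batch axis plus the default sequence axis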

delzac commented 5 years ago

Anyway, I tested it out. It seems like unpacking a sequence and then slicing it causes a RuntimeError; it might be a bug.

But if you stick to using C.input_variable((C.FreeDimension, n)), it works fine.

import cntk as C
import numpy as np

n_channels = 3
input_var = C.sequence.input_variable(n_channels)  # RuntimeError: NarrowTo: stride 3 is invalid for interval [0, 1).
# input_var = C.input_variable([C.FreeDimension, n_channels])  # works fine
print(input_var.shape)
unpacked_input = C.sequence.unpack(input_var, padding_value=0, no_mask_output=True)
print(unpacked_input.shape)
sliced = C.slice(unpacked_input, axis=0, begin_index=0, end_index=0, strides=3)
print(sliced.shape)
model = C.to_sequence(sliced)
print(model.shape)

x = np.random.rand(6, n_channels).astype(np.float32)
print(x)
print(sliced.eval({model.arguments[0]: [x]}))
artbataev commented 5 years ago

Thank you very much, @delzac, this code works with CNTK 2.6. I also tried to use it with CNTK 2.4, but it fails.

Unfortunately, I have a more complex model (with recurrence), so I have to pack and unpack a sequence, and that fails in CNTK 2.6.

This fails:

import cntk as C
import numpy as np

n_channels = 3
input_var = C.input_variable([C.FreeDimension, n_channels])
print(input_var.shape)
packed_input_var = C.to_sequence(input_var)
unpacked_input_var = C.sequence.unpack(packed_input_var, padding_value=0, no_mask_output=True)
sliced = C.slice(unpacked_input_var, axis=0, begin_index=0, end_index=0, strides=3)
print(sliced.shape)
model = C.to_sequence(sliced)
print(model.shape)

x = np.random.rand(6, n_channels).astype(np.float32)
print(x)
print(sliced.eval({model.arguments[0]: [x]}))

with RuntimeError: NarrowTo: stride 3 is invalid for interval [0, 1).

delzac commented 5 years ago

The error clearly comes from slicing an unpacked sequence, but I'm not sure how to help you from here. :(

Can you do the slicing while it's still a free dimension? Or do you slice after the recurrence, so it will always be on the sequence axis?

artbataev commented 5 years ago

I do slice after the recurrence :(

I found a way to do it, but it is very ugly (for now I use a 1-D convolution with an identity matrix and stride=3).

delzac commented 5 years ago

That is ingenious! I learned something today, thanks!

@KeDengMS Do you have a better solution?

delzac commented 5 years ago

@artbataev Hi, I found myself needing to stride on the sequence axis too. Can I check how you initialised the kernel? I found that the current cntk python API blocks me from initialising through init=my_kernel.

artbataev commented 5 years ago

@delzac, for now I have found a better solution: use max pooling, not convolution (since convolution is very slow):

import cntk

def subsample(input_, subsampling_factor, n_channels):
    output = cntk.sequence.unpack(input_, padding_value=0, no_mask_output=True)
    # output = cntk.expand_dims(output, axis=0)  # this doesn't work, possibly a bug in CNTK
    output = cntk.reshape(output, (1, -1, n_channels))  # add an extra dimension
    sliced = cntk.layers.MaxPooling((1, 1, 1), strides=(1, subsampling_factor, 1))(output)
    sliced = cntk.reshape(sliced, (-1, n_channels))  # remove the extra dimension
    output = cntk.to_sequence(sliced)
    return output

There are also some strange things about this solution. It seems that reshaping the tensor is unnecessary, and the code can be simplified:

def subsample(input_, subsampling_factor):
    """Be careful: this doesn't work on GPU!"""
    output = cntk.sequence.unpack(input_, padding_value=0, no_mask_output=True)
    sliced = cntk.layers.MaxPooling((1, 1), strides=(subsampling_factor, 1))(output)
    output = cntk.to_sequence(sliced)
    return output

But it works well only on CPU; on GPU there is an error:

RuntimeError: cuDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=... ; expr=cudnnPoolingForward(*m_cudnn, *(m_pool), &C::One, m_inT, ptr(in), &C::Zero, m_outT, ptr(out))

As for convolution, I used this code, which also works but is significantly slower:

import cntk
import numpy as np

def subsample(input_, subsampling_factor, n_channels):
    output = cntk.transpose(cntk.sequence.unpack(input_, padding_value=0, no_mask_output=True), perm=[1, 0])
    output = cntk.convolution(
        # identity kernel, reshaped to (out_channels, in_channels, kernel_size)
        cntk.Constant(np.eye(n_channels, n_channels, dtype=np.float32).reshape(n_channels, n_channels, 1)),
        output,
        strides=[1, subsampling_factor],
        dilation=(1, 1),
        auto_padding=[False, False],
    )
    output = cntk.to_sequence(cntk.transpose(output, perm=[1, 0]))
    return output
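
For reference, a minimal usage sketch of the three-argument subsample variants above (a sketch only, assuming CNTK 2.6; the input values are arbitrary):

import numpy as np
import cntk

n_channels = 3
input_var = cntk.sequence.input_variable(n_channels)
model = subsample(input_var, subsampling_factor=3, n_channels=n_channels)

# one sequence of 6 frames; every 3rd frame (frames 0 and 3) should remain
x = np.arange(6 * n_channels, dtype=np.float32).reshape(6, n_channels)
print(model.eval({input_var: [x]}))
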
delzac commented 5 years ago

@artbataev Thanks for sharing, I managed to work it out too. I used SequentialConvolution to do it.

Your maxpooling approach is a wonderful idea too. But how do you ensure that the pad_values are not included in the stride when you use a sequence.unpack and C.to_sequence earlier?

Anyhow, you can do this to avoid the reshape:

C.expand_dims(x, axis=C.Axis.new_leading_axis())
...
C.squeeze()
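
For what it's worth, a minimal sketch of those two ops on a plain (non-sequence) input, just to show the shape bookkeeping (the shapes in the comments are what I would expect):

import cntk as C

x = C.input_variable((6, 3))
y = C.expand_dims(x, axis=C.Axis.new_leading_axis())  # adds a leading singleton axis -> (1, 6, 3)
z = C.squeeze(y)                                       # drops the singleton axis again -> (6, 3)
print(y.shape, z.shape)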
artbataev commented 5 years ago

@delzac

Your maxpooling approach is a wonderful idea too. But how do you ensure that the pad_values are not included in the stride when you use a sequence.unpack and C.to_sequence earlier?

There is no need to worry about it, because the max-pooling (or average-pooling) operation in this case doesn't actually perform any pooling (the kernel is [1, 1, 1], so it takes each element by itself, while the stride picks every k-th element):

sliced = cntk.layers.MaxPooling((1, 1, 1), strides=(1, subsampling_factor, 1))(output)

C.expand_dims(x, axis=C.Axis.new_leading_axis())

this works, thank you!

C.squeeze()

Unfortunately, squeeze doesn't work correctly with a tensor that used to be a sequence, so I can't use it =(

artbataev commented 5 years ago

If you are asking about the change in shape after sequence.unpack, I think there is no better solution than to manually track the correct shape of the tensor and use it with C.to_sequence.

delzac commented 5 years ago

@artbataev got it! Thanks for your inputs :)

delzac commented 5 years ago

I thought of a cleaner implementation for sequence.stride. It will work regardless of the number of static axes you have in the sequence.

Just leaving the code here in case anyone else needs it. The master copy can be found in cntkx, my own cntk extension library. You can just do pip install cntkx to get it.

from math import pi

import cntk as C
import cntkx as Cx  # pip install cntkx

def stride(x, s: int, tol: float = 0.1):
    p = position(x)
    integers = p / s  # every s-th sequence item will be an integer
    valid = C.less_equal(C.abs(C.sin(integers * pi)), tol)  # sin of an integer multiple of pi is close to zero
    result = C.sequence.gather(x, valid)
    return result


def position(x, name=''):
    @C.BlockFunction('position', name)
    def inner(a):
        # reconcile_dynamic_axes is necessary to avoid subtle bugs e.g. sequence.where and one_hot
        return C.reconcile_dynamic_axes(C.sequence.where(C.ones_like(Cx.scalar(a))), a)

    return inner(x)  # [#, *], [1,]
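
For anyone picking this up later, a small usage sketch (assuming cntkx is installed for Cx.scalar; the input values are arbitrary):

import numpy as np
import cntk as C

n_channels = 3
input_var = C.sequence.input_variable(n_channels)
strided = stride(input_var, 3)

# one sequence of 6 frames; frames 0 and 3 should survive the stride
x = np.arange(6 * n_channels, dtype=np.float32).reshape(6, n_channels)
print(strided.eval({input_var: [x]}))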
artbataev commented 5 years ago

@delzac, thank you for the solution, I'll try it! Have you measured the speed of this implementation against Convolution / MaxPooling?

delzac commented 5 years ago

@artbataev I tested against sequential convolution and there wasn't any substantial difference!