microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Assertion error when using attention with LSTM #2487

Open abhimohta opened 6 years ago

abhimohta commented 6 years ago

I am trying to implement an attention mechanism on LSTMs. I have two inputs, X and Y. My goal is to encode X and then attend over X while encoding Y. Finally, I want to pass the encoded Y through a feed-forward network.

To that end, I have set up the inputs and the training graph like this:

    xAxis = Axis.new_unique_dynamic_axis('1')
    yAxis = Axis.new_unique_dynamic_axis('2')
    x = C.sequence.input_variable(1, sequence_axis=xAxis, name="x")
    y = C.sequence.input_variable(1, sequence_axis=yAxis, name="y")
    prediction = model(x, y)
    z = C.input_variable(out_size, dynamic_axes=prediction.dynamic_axes, name="z")

and in the model I define the LSTM step function like this:

            @C.Function
            def lstm_with_attention(dh, dc, x):
                h_att = attention_result2output(encoded_input.outputs[0], dh)
                x = C.splice(x, h_att)
                return rec_block(dh, dc, x)
            result = C.layers.Recurrence(lstm_with_attention)(encoded_output)

When I try to run this code, I get this error:

File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 109, in new return Function._to_Function(*args, **kwargs) File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 229, in _to_Function assert out_arg_names == arg_names, (out_arg_names, arg_names) AssertionError: (['dh', 'dc', 'x', 'inp'], ['dh', 'dc', 'x'])

For context, inp is one of the inputs to the model function. I tried stepping through the functions.py code where this assertion is raised, but couldn't figure out what is going on. Can someone guide me, please?

alexpolozov commented 6 years ago

For reference, here's a full model definition (I work on the same project):

def model(inp, out):
    embedding_dim = 150
    char_embedding = Embedding(embedding_dim)

    with C.layers.default_options(enable_self_stabilization=True):
        # define all the learnable parameters before specifying the @Function
        input_encoder = Sequential([
            char_embedding,
            Stabilizer(),
            For(range(args.num_layers-1), lambda: Recurrence(LSTM(args.hidden_size))),
            Recurrence(LSTM(args.hidden_size), return_full_state=True),
            (Label('encoded_input_h'), Label('encoded_input_c'))
        ])
        stab_out = Stabilizer(name='stab_out')
        stab_result = Stabilizer(name='stab_result')
        out_blocks = [LSTM(args.hidden_size) for j in range(args.num_layers)]
        result_projection = Dense(out_size, name='result_projection')
        attention_result2output = AttentionModel(args.attention_size,
                                                 name='attention_result2out')

        encoded_input = input_encoder(inp)
        encoded_output = (char_embedding >> stab_out)(out)
        for j in range(args.num_layers):
            rec_block = out_blocks[j]
            @C.Function
            def lstm_with_attention(dh, dc, x):
                h_att = attention_result2output(encoded_input.outputs[0], dh)
                x = C.splice(x, h_att)
                return rec_block(dh, dc, x)
            result = C.layers.Recurrence(lstm_with_attention)(encoded_output)
        result = (stab_result >> result_projection)(result)
        return result

Also, inp presumably comes from the global inputs to the network:

inputAxis = Axis.new_unique_dynamic_axis('1')
outputAxis = Axis.new_unique_dynamic_axis('2')
x = C.sequence.input_variable(1, sequence_axis=inputAxis, name="inp")
out = C.sequence.input_variable(1, sequence_axis=outputAxis, name="out")
prediction = model(x, out)
golden = C.input_variable(out_size, dynamic_axes=prediction.dynamic_axes, name="y")

However, we're having trouble understanding why inp is still considered a "parameter" of lstm_with_attention.
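
To make the symptom concrete, here is a stripped-down, hypothetical sketch of the pattern as we understand it (the names context and f are invented and this is untested); we would expect the decorator itself to trip the same kind of assertion, because the free input variable gets picked up as an extra argument:

    import cntk as C

    # 'context' plays the role of inp: an input variable from the enclosing
    # scope that the function body happens to reference.
    context = C.sequence.input_variable(10, name="context")

    @C.Function
    def f(x):
        # 'context' is not a declared parameter, but the argument search finds
        # it in the graph, so the discovered and declared argument lists
        # disagree, e.g. (['x', 'context'], ['x']) -- the same AssertionError.
        return C.plus(x, C.sequence.last(context))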

ke1337 commented 6 years ago

This is a limitation of the function-argument search. Reading functions.py, in the nested case (a function defined inside another @C.Function) the outer function's arguments are hidden from the inner one; otherwise, function.arguments is used to find all inputs, and so inp becomes part of them.

Given that, your code can be fixed by wrapping another C.Function around the lstm_with_attention definition. In the code below, enc_input0 is declared as an argument of out_func, so it is not counted among the arguments of lstm_with_attention.

    @C.Function
    def out_func(enc_input0, enc_output):
        for j in range(args.num_layers):
            rec_block = out_blocks[j]
            @C.Function
            def lstm_with_attention(dh, dc, xx):
                h_att = attention_result2output(enc_input0, dh)
                xx = C.splice(xx, h_att)
                return rec_block(dh, dc, xx)
            result = C.layers.Recurrence(lstm_with_attention)(enc_output)
        return result

    result = out_func(encoded_input.outputs[0], encoded_output)
    result = (stab_result >> result_projection)(result)
    return result

alexpolozov commented 6 years ago

This works, thanks! For anyone reading this issue in the future, there were a couple more bugs in the code that triggered follow-up errors. The final (fixed) version of this model looks like this:

def model(inp, out):
    embedding_dim = 150
    char_embedding = Embedding(embedding_dim)

    with C.layers.default_options(enable_self_stabilization=True):
        # define all the learnable parameters before specifying the @Function
        input_encoder = Sequential([
            char_embedding,
            Stabilizer(),
            For(range(args.num_layers - 1), lambda: Recurrence(LSTM(args.hidden_size))),
            Recurrence(LSTM(args.hidden_size), return_full_state=True),
            (Label('encoded_input_h'), Label('encoded_input_c'))
        ])
        stab_out = Stabilizer(name='stab_out')
        stab_result = Stabilizer(name='stab_result')
        out_blocks = [LSTM(args.hidden_size) for j in range(args.num_layers)]
        result_projection = Dense(out_size, name='result_projection')
        attention_output2input = AttentionModel(args.attention_size, name='attention_out2in')

        @C.Function
        def encode_output(enc_inp, out):
            result = (char_embedding >> stab_out)(out)
            for j in range(args.num_layers):
                rec_block = out_blocks[j]

                @C.Function
                def lstm_with_attention(dh, dc, x):
                    h_att = attention_output2input(enc_inp, dh)
                    x = C.splice(x, h_att)
                    return rec_block(dh, dc, x)

                # the last layer uses Fold, which returns only the final state,
                # so the stack yields a single vector per sequence for the projection
                recurrence = C.layers.Recurrence if j < args.num_layers - 1 else C.layers.Fold
                result = recurrence(lstm_with_attention)(result)
            return result

        encoded_input = input_encoder(inp)
        encoded_output = encode_output(encoded_input.outputs[0], out)
        result = (stab_result >> result_projection)(encoded_output)
        return result

tomasek commented 6 years ago

Hello,

I am trying to implement this model, but when I try to build the cross_entropy_with_softmax() criterion with it, I get this error:

    RuntimeError: Operation 'TransposeTimes': Operand 'Placeholder('result_projection', [#], [77057])' has dynamic axes, that do not match the dynamic axes '[#, labelAxis]' of the other operands.

Can you help me?
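
For what it's worth, the snippet earlier in this thread creates the label variable with the prediction's dynamic axes before building the criterion. A minimal sketch of that pattern (assuming prediction and out_size are defined as above, and assuming the mismatch comes from a label declared with its own default axes):

    # Sketch only, mirroring the earlier snippet: the label shares the
    # prediction's dynamic axes, so cross_entropy_with_softmax sees matching
    # axes on both operands.
    golden = C.input_variable(out_size, dynamic_axes=prediction.dynamic_axes, name="golden")
    loss = C.cross_entropy_with_softmax(prediction, golden)
    errs = C.classification_error(prediction, golden)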