abhimohta opened this issue 7 years ago
For reference, here's a full model definition (I work on the same project):
def model(inp, out):
    embedding_dim = 150
    char_embedding = Embedding(embedding_dim)
    with C.layers.default_options(enable_self_stabilization=True):
        # define all the learnable parameters before specifying the @Function
        input_encoder = Sequential([
            char_embedding,
            Stabilizer(),
            For(range(args.num_layers - 1), lambda: Recurrence(LSTM(args.hidden_size))),
            Recurrence(LSTM(args.hidden_size), return_full_state=True),
            (Label('encoded_input_h'), Label('encoded_input_c'))
        ])
        stab_out = Stabilizer(name='stab_out')
        stab_result = Stabilizer(name='stab_result')
        out_blocks = [LSTM(args.hidden_size) for j in range(args.num_layers)]
        result_projection = Dense(out_size, name='result_projection')
        attention_result2output = AttentionModel(args.attention_size,
                                                 name='attention_result2out')

        encoded_input = input_encoder(inp)
        encoded_output = (char_embedding >> stab_out)(out)

        for j in range(args.num_layers):
            rec_block = out_blocks[j]

            @C.Function
            def lstm_with_attention(dh, dc, x):
                h_att = attention_result2output(encoded_input.outputs[0], dh)
                x = C.splice(x, h_att)
                return rec_block(dh, dc, x)

            result = C.layers.Recurrence(lstm_with_attention)(encoded_output)

        result = (stab_result >> result_projection)(result)
        return result
Also, inp presumably comes from the global inputs to the network:
inputAxis = Axis.new_unique_dynamic_axis('1')
outputAxis = Axis.new_unique_dynamic_axis('2')
x = C.sequence.input_variable(1, sequence_axis=inputAxis, name="inp")
out = C.sequence.input_variable(1, sequence_axis=outputAxis, name="out")
prediction = model(x, out)
golden = C.input_variable(out_size, dynamic_axes=prediction.dynamic_axes, name="y")
However, we're having trouble understanding why inp is still considered a "parameter" of lstm_with_attention.
This is a limitation of the function-argument search. Looking at functions.py: the arguments of an inner function are hidden only when that function is defined inside another @C.Function (the nested case). Otherwise, function.arguments is used to find all inputs, and inp ends up being counted as one of them.
Given that, your code can be fixed by wrapping another C.Function around the lstm_with_attention definition. In the code below, enc_input0 is declared as an argument of out_func, so it is not counted among the arguments of lstm_with_attention.
@C.Function
def out_func(enc_input0, enc_output):
    for j in range(args.num_layers):
        rec_block = out_blocks[j]

        @C.Function
        def lstm_with_attention(dh, dc, xx):
            h_att = attention_result2output(enc_input0, dh)
            xx = C.splice(xx, h_att)
            return rec_block(dh, dc, xx)

        result = C.layers.Recurrence(lstm_with_attention)(enc_output)
    return result

result = out_func(encoded_input.outputs[0], encoded_output)
result = (stab_result >> result_projection)(result)
return result
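Distilled down, the pattern is roughly as follows. This is only a sketch with hypothetical names and sizes (context, target, the dimensions 10 and 20), not code from the project above:

import cntk as C

# Minimal sketch of the "wrap the closure in an outer @C.Function" pattern.
# 'context', 'target' and all sizes here are hypothetical placeholders.
context = C.sequence.input_variable(10, name='context')   # sequence attended over
target = C.sequence.input_variable(10, name='target')     # sequence being decoded
rec_block = C.layers.LSTM(20)
attention = C.layers.AttentionModel(20)

@C.Function
def decode(ctx, tgt):
    # ctx is an argument of this enclosing @C.Function, so the argument search
    # of the inner function below hides it and the AssertionError goes away.
    @C.Function
    def lstm_with_attention(dh, dc, x):
        h_att = attention(ctx, dh)
        return rec_block(dh, dc, C.splice(x, h_att))
    return C.layers.Recurrence(lstm_with_attention)(tgt)

result = decode(context, target)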
This works, thanks! For anyone reading this issue in the future, there were a couple more bugs in the code that triggered follow-up errors. The final (fixed) version of this model looks like this:
def model(inp, out):
    embedding_dim = 150
    char_embedding = Embedding(embedding_dim)
    with C.layers.default_options(enable_self_stabilization=True):
        # define all the learnable parameters before specifying the @Function
        input_encoder = Sequential([
            char_embedding,
            Stabilizer(),
            For(range(args.num_layers - 1), lambda: Recurrence(LSTM(args.hidden_size))),
            Recurrence(LSTM(args.hidden_size), return_full_state=True),
            (Label('encoded_input_h'), Label('encoded_input_c'))
        ])
        stab_out = Stabilizer(name='stab_out')
        stab_result = Stabilizer(name='stab_result')
        out_blocks = [LSTM(args.hidden_size) for j in range(args.num_layers)]
        result_projection = Dense(out_size, name='result_projection')
        attention_output2input = AttentionModel(args.attention_size, name='attention_out2in')

        @C.Function
        def encode_output(enc_inp, out):
            result = (char_embedding >> stab_out)(out)
            for j in range(args.num_layers):
                rec_block = out_blocks[j]

                @C.Function
                def lstm_with_attention(dh, dc, x):
                    h_att = attention_output2input(enc_inp, dh)
                    x = C.splice(x, h_att)
                    return rec_block(dh, dc, x)

                recurrence = C.layers.Recurrence if j < args.num_layers - 1 else C.layers.Fold
                result = recurrence(lstm_with_attention)(result)
            return result

        encoded_input = input_encoder(inp)
        encoded_output = encode_output(encoded_input.outputs[0], out)
        result = (stab_result >> result_projection)(encoded_output)
        return result
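For completeness, the fixed model can be wired up the same way as in the earlier comment. The snippet below is only a sketch: out_size is assumed to be defined elsewhere (as in the code above), and the softmax cross-entropy criterion at the end is an assumption rather than something confirmed in the thread.

inputAxis = C.Axis.new_unique_dynamic_axis('inputAxis')
outputAxis = C.Axis.new_unique_dynamic_axis('outputAxis')
x = C.sequence.input_variable(1, sequence_axis=inputAxis, name='inp')
out = C.sequence.input_variable(1, sequence_axis=outputAxis, name='out')
prediction = model(x, out)

# Declaring the label with the prediction's dynamic axes keeps the axes consistent,
# which matters once a criterion such as cross_entropy_with_softmax is attached.
golden = C.input_variable(out_size, dynamic_axes=prediction.dynamic_axes, name='y')
loss = C.cross_entropy_with_softmax(prediction, golden)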
Hello,
I am trying to implement this model, but when I try to create the cross_entropy_with_softmax() with it, I get the error:
RuntimeError: Operation 'TransposeTimes': Operand 'Placeholder('result_projection', [#], [77057])' has dynamic axes, that do not match the dynamic axes '[#, labelAxis]' of the other operands.
Can you help me?
I am trying to implement an attention mechanism on LSTMs. I have two inputs, X and Y. My goal is to encode X and then apply attention over X while encoding Y. Finally, I want to pass the encoded Y through a feed-forward network.
For this, I have implemented the training like this -
and in the model I have the LSTM function like this -
When I try to run this code, I get this error:
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 109, in new return Function._to_Function(*args, **kwargs) File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 229, in _to_Function assert out_arg_names == arg_names, (out_arg_names, arg_names) AssertionError: (['dh', 'dc', 'x', 'inp'], ['dh', 'dc', 'x'])
For context, inp is one of the inputs to the model function. I tried debugging through the functions.py code where this error appears but couldn't really get through. Can someone guide me, please?
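For anyone hitting the same assertion: the trigger is typically an inner @C.Function that references a free input variable (here, something derived from inp) without being wrapped in an enclosing @C.Function, exactly as explained in the comments above. A minimal, hypothetical illustration (names and sizes are made up):

import cntk as C

# 'enc' is referenced inside the @C.Function below but is not one of its declared
# arguments, and there is no enclosing @C.Function to hide it. It therefore shows
# up in function.arguments, and decoration fails with an AssertionError along the
# lines of (['dh', 'dc', 'x', 'enc'], ['dh', 'dc', 'x']).
enc = C.sequence.input_variable(10, name='enc')
rec_block = C.layers.LSTM(20)
attention = C.layers.AttentionModel(20)

@C.Function
def lstm_with_attention(dh, dc, x):
    h_att = attention(enc, dh)   # 'enc' is a free variable here
    return rec_block(dh, dc, C.splice(x, h_att))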