sayondutta / text_summarizer

Attention applied to a sequence-to-sequence model for the task of text summarization, using TensorFlow's raw_rnn. I haven't used TensorFlow's built-in seq2seq functions here; the reason is to apply the attention mechanism manually.

ask for help #1

Closed JuniorPan closed 7 years ago

JuniorPan commented 7 years ago

I want to know how to use multiple layers with your code.

0b01 commented 7 years ago

Hi @LagrangePan, I was wondering the same thing. I have filed a bug against the main TensorFlow repository:

https://github.com/tensorflow/tensorflow/issues/10862

0b01 commented 7 years ago

Hi, I did some further research and found https://github.com/suriyadeepan/augmented_seq2seq/blob/01706d3869a42f3cf0bfe5c83f069646315a945e/bi_encoder.py

He uses tf.scan to manually generate a state tuple. However, there is no way (yet) to get encoder_final_outputs: https://github.com/suriyadeepan/augmented_seq2seq/issues/1
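
For context, here is a minimal sketch (illustrative shapes and names only, not code from either repository) of what tf.scan gives you when you thread an RNN cell over time: the accumulator is the (output, state) pair, and tf.scan stacks every step's accumulator along a new leading time axis, so the per-step outputs and the final state can both be recovered from the result.

    import tensorflow as tf  # TF 1.x

    batch_size, hidden_dim, seq_len, input_dim = 32, 128, 20, 50
    # time-major inputs, since tf.scan iterates over the leading axis
    inputs = tf.placeholder(tf.float32, [seq_len, batch_size, input_dim])

    cell = tf.contrib.rnn.LSTMCell(hidden_dim)
    init_output = tf.zeros([batch_size, hidden_dim])
    init_state = cell.zero_state(batch_size, tf.float32)

    # acc is (previous_output, previous_state); x is one time step of input.
    # cell(x, state) returns (output, new_state), so the accumulator keeps its structure.
    outputs, states = tf.scan(lambda acc, x: cell(x, acc[1]),
                              inputs,
                              initializer=(init_output, init_state))

    # outputs:            [seq_len, batch_size, hidden_dim]   every step's output
    # states.c, states.h: [seq_len, batch_size, hidden_dim]   every step's state
    # states.h[-1] / states.c[-1] recover the final encoder state.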

0b01 commented 7 years ago

Just figured it out:

See the code below. However, stacked LSTMs may not work just yet.


    # assumes encoder_depth, hidden_dim, dropout, batch_size and the batch-major
    # encoder inputs enc_inp are already defined elsewhere in the graph
    enc_cells_fw = []
    for i in range(0, encoder_depth):
        with tf.variable_scope('enc_RNN_{}'.format(i)):
            cell = tf.contrib.rnn.LSTMCell(hidden_dim)  # one LSTM cell per encoder layer
            cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=1.0-dropout)
            enc_cells_fw.append(cell)
    enc_cell_fw = tf.contrib.rnn.MultiRNNCell(enc_cells_fw, state_is_tuple=True)
    enc_cells_bw = []
    for i in range(0, encoder_depth):
        with tf.variable_scope('enc_RNN_{}'.format(i)):
            cell = tf.contrib.rnn.LSTMCell(hidden_dim)  # one LSTM cell per encoder layer
            cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=1.0-dropout)
            enc_cells_bw.append(cell)
    enc_cell_bw = tf.contrib.rnn.MultiRNNCell(enc_cells_bw, state_is_tuple=True)

    init_state = enc_cell_fw.zero_state(batch_size=batch_size, dtype=tf.float32)

    # transpose encoder inputs to time-major
    enc_inp_t = tf.transpose(enc_inp, [1,0,2])
    # the bidirectional encoder
    with tf.variable_scope('encoder-fw') as scope: # forward sequence
        # acc is the (previous_output, previous_state) pair accumulated by tf.scan
        enc_output_fw, enc_states_fw = tf.scan(lambda acc, x: enc_cell_fw(x, acc[1]),
                enc_inp_t, initializer=(tf.zeros(shape=[batch_size, hidden_dim]), init_state))

    with tf.variable_scope('encoder-bw') as scope: # backward sequence
        enc_output_bw, enc_states_bw = tf.scan(lambda acc, x: enc_cell_bw(x, acc[1]),
                            tf.reverse(enc_inp_t, axis=[0]),  # <- reverse inputs
                            initializer=(tf.zeros(shape=[batch_size, hidden_dim]), init_state))

    enc_output_fw = tf.transpose(enc_output_fw, [1,0,2])
    enc_output_bw = tf.transpose(enc_output_bw, [1,0,2])
    encoder_outputs = tf.concat([enc_output_fw, enc_output_bw], 2)

    # project context
    Wc = tf.get_variable('Wc', shape=[2, encoder_depth, hidden_dim*2, hidden_dim*2],
                        initializer=tf.contrib.layers.xavier_initializer())

    # extract context [get final state; project c,h to [hidden_dim]; list->tuple]
    encoder_final_state = []
    for layer in range(encoder_depth):
        enc_c = tf.concat( (enc_states_fw[layer].c[-1], enc_states_bw[layer].c[-1]), 1)
        enc_c = tf.matmul(enc_c, Wc[0][layer])
        enc_h = tf.concat( (enc_states_fw[layer].h[-1], enc_states_bw[layer].h[-1]), 1)
        enc_h = tf.matmul(enc_h, Wc[1][layer])
        encoder_final_state.append(tf.contrib.rnn.LSTMStateTuple(c = enc_c, h = enc_h))
    # convert list to tuple - eww!
    encoder_final_state = tuple(encoder_final_state)
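
As a hedged follow-up (not from the repository: the decoder cell sizes, emb_dim, and the batch-major dec_inp placeholder below are assumptions), the structure built above is exactly what a decoder MultiRNNCell of the same depth expects as its initial state, since each projected LSTMStateTuple has size hidden_dim*2:

    # Illustrative only: a decoder whose per-layer size matches the projected
    # state size (hidden_dim*2), seeded with encoder_final_state.
    dec_cells = [tf.contrib.rnn.LSTMCell(hidden_dim * 2) for _ in range(encoder_depth)]
    dec_cell = tf.contrib.rnn.MultiRNNCell(dec_cells, state_is_tuple=True)

    emb_dim = 100  # hypothetical decoder input size
    dec_inp = tf.placeholder(tf.float32, [batch_size, None, emb_dim])  # assumed decoder inputs, batch-major
    dec_outputs, dec_final_state = tf.nn.dynamic_rnn(
        dec_cell, dec_inp, initial_state=encoder_final_state, dtype=tf.float32)
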
JuniorPan commented 7 years ago

thank you for your help

JuniorPan commented 7 years ago

I am a bit confused about the loop_fn function:

    def loop_fn_transition(time, previous_output, previous_state, previous_loop_state):

        '''your code'''

        state = previous_state     # why does this just return previous_state?
        output = previous_output   # and this too?
        # print output.shape
        loop_state = None
        return (elements_finished,
                next_input,
                state,
                output,
                loop_state)
sayondutta commented 7 years ago

For all the steps, or only for the initial step?

JuniorPan commented 7 years ago

all the steps

sayondutta commented 7 years ago

Let me recheck it.

sayondutta commented 7 years ago

Actually, these two will give the same result, which is the state (here, the "output" inside the function is not the actual output but the state). For the actual output calculation, you can see that after raw_rnn has run over all the decoder steps, attention is applied again; inside the RNN, attention is applied only to obtain the next input.
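
To make that concrete, here is a minimal, self-contained sketch of the raw_rnn loop_fn pattern being described. It is not the repository's actual code: the dot-product attention, the fixed decoding length dec_len, and the names W_a, W_in and next_input_from_attention are illustrative assumptions. The point it shows is that the transition branch returns state and output unchanged (raw_rnn applies the cell itself), and attention there is used only to build the next input; the real outputs are post-processed after the loop.

    import tensorflow as tf  # TF 1.x

    batch_size, hidden_dim, enc_len, dec_len, emb_dim = 32, 64, 20, 15, 50

    # stand-ins for tensors the real model already has
    encoder_outputs = tf.placeholder(tf.float32, [batch_size, enc_len, 2 * hidden_dim])
    go_input = tf.zeros([batch_size, emb_dim])   # embedding of the <GO> token

    dec_cell = tf.contrib.rnn.LSTMCell(hidden_dim)
    W_a = tf.get_variable('W_a', [hidden_dim, 2 * hidden_dim])              # attention projection
    W_in = tf.get_variable('W_in', [2 * hidden_dim + hidden_dim, emb_dim])  # (context, output) -> next input

    def next_input_from_attention(previous_output):
        # previous_output: [batch, hidden_dim], used as the attention query
        query = tf.matmul(previous_output, W_a)                        # [batch, 2*hidden_dim]
        scores = tf.matmul(encoder_outputs, tf.expand_dims(query, 2))  # [batch, enc_len, 1]
        alphas = tf.nn.softmax(tf.squeeze(scores, [2]))                # [batch, enc_len]
        context = tf.reduce_sum(tf.expand_dims(alphas, 2) * encoder_outputs, axis=1)  # [batch, 2*hidden_dim]
        return tf.matmul(tf.concat([context, previous_output], 1), W_in)              # [batch, emb_dim]

    def loop_fn(time, previous_output, previous_state, previous_loop_state):
        elements_finished = time >= tf.fill([batch_size], dec_len)  # fixed-length decoding
        if previous_state is None:                                   # time == 0: initialization call
            return (elements_finished, go_input,
                    dec_cell.zero_state(batch_size, tf.float32), None, None)
        # transition: state and output pass through unchanged -- raw_rnn runs
        # the cell itself; attention only decides what the next input will be
        next_input = next_input_from_attention(previous_output)
        state = previous_state
        output = previous_output
        loop_state = None
        return (elements_finished, next_input, state, output, loop_state)

    decoder_outputs_ta, decoder_final_state, _ = tf.nn.raw_rnn(dec_cell, loop_fn)
    decoder_outputs = decoder_outputs_ta.stack()  # time-major stacked decoder outputs
    # the actual predictions (attention again + projection to the vocabulary)
    # are computed from decoder_outputs after the raw_rnn loop, as described above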