therne / dmn-tensorflow

Dynamic Memory Networks (https://arxiv.org/abs/1603.01417) in Tensorflow

softmax in def attention() is wrong? #6

Open · daxiongshu opened this issue 7 years ago

daxiongshu commented 7 years ago

Hi, first of all, thank you very much for the code, I learnt a lot!

In models/new/episode_module.py, line 59, l2 = tf.nn.softmax(l2) is applied over the batch (N) dimension, but in the paper (https://arxiv.org/pdf/1603.01417v1.pdf), equation (10) performs the softmax over the fact (F) dimension. Is that the case?
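
For illustration, here is a minimal NumPy sketch (my own example, not code from this repository; the [F, N] layout of the scores is an assumption) contrasting the two normalizations: softmax over the batch (N) axis, which is what applying softmax to each fact's scores amounts to, versus softmax over the fact (F) axis, which is what equation (10) specifies.

import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical raw attention scores z with shape [F, N]:
# F facts per story, N stories in the batch.
F, N = 4, 3
z = np.random.randn(F, N)

# Normalizing over the batch (N) axis: each fact's scores sum to 1
# across stories -- this is what per-fact softmax amounts to.
over_batch = softmax(z, axis=1)
print(over_batch.sum(axis=1))  # -> [1. 1. 1. 1.]

# Normalizing over the fact (F) axis: each story's gates sum to 1
# across its facts -- this is what equation (10) asks for.
over_facts = softmax(z, axis=0)
print(over_facts.sum(axis=0))  # -> [1. 1. 1.]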

Please let me know. Thank you!

Best regards,

daxiongshu commented 7 years ago

A fix could be to comment out line 59 (l2 = tf.nn.softmax(l2)) and revise new() as follows:

def new(self, memory):
        """ Creates new episode vector (will feed into Episodic Memory GRU)
        :param memory: Previous memory vector
        :return: episode vector
        """
        state = self.init_state
        memory = tf.transpose(memory)  # [N, D]
        with tf.variable_scope('AttnGate') as scope:
            # First pass: collect the raw (unnormalized) attention score for every fact.
            gs = []
            for f, f_t in zip(self.facts, self.facts_transposed):
                g = self.attention(f, memory)
                gs.append(g)
                scope.reuse_variables()  # share attention params across facts
            gs = tf.pack(gs)               # stack scores along a new leading fact (F) axis
            gs = tf.nn.softmax(gs, dim=0)  # normalize over the fact (F) axis, as in Eq. (10)
            gs = tf.unpack(gs)             # split back into a list of per-fact gates
        with tf.variable_scope('AttnGate_update') as scope:
            # Second pass: update the episode state with the normalized gates.
            for f, f_t, g in zip(self.facts, self.facts_transposed, gs):
                state = self.gru(f_t, state, g)
                scope.reuse_variables()  # share params
        return state
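
(Side note, not from the original comment: on newer TensorFlow releases, tf.pack/tf.unpack were renamed to tf.stack/tf.unstack, and tf.nn.softmax takes axis instead of dim, so the normalization step would look roughly like the sketch below; the rest of the method is assumed unchanged.)

# Rough equivalent of the normalization step with the renamed ops
gs = tf.stack(gs)               # stack per-fact scores along a new leading fact axis
gs = tf.nn.softmax(gs, axis=0)  # normalize over the fact (F) axis
gs = tf.unstack(gs)             # back to a list of per-fact gates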
shamanez commented 7 years ago

Yes, it is wrong. The paper says we should compute a normalized distribution over the attention scores at each output of the fusion layer.