richliao / textClassifier

Text classifier for Hierarchical Attention Networks for Document Classification
Apache License 2.0

The performance is worse #40

Open onlyff opened 5 years ago

onlyff commented 5 years ago

Hi, I tested the code on the 'labeledTrainData.tsv' dataset with the Theano backend, using 80% of the dataset as training data and 20% as validation data. However, the performance is poor. The results are as follows.


    Train on 20000 samples, validate on 5000 samples
    Epoch 1/10
    20000/20000 [==============================] - 104s - loss: 0.6974 - acc: 0.5082 - val_loss: 0.6938 - val_acc: 0.5058
    Epoch 2/10
    20000/20000 [==============================] - 103s - loss: 0.6982 - acc: 0.5025 - val_loss: 0.6930 - val_acc: 0.5124
    Epoch 3/10
    20000/20000 [==============================] - 104s - loss: 0.6959 - acc: 0.5128 - val_loss: 0.6950 - val_acc: 0.5058
    Epoch 4/10
    20000/20000 [==============================] - 104s - loss: 0.6978 - acc: 0.4936 - val_loss: 0.6939 - val_acc: 0.4942
    Epoch 5/10
    20000/20000 [==============================] - 103s - loss: 0.6983 - acc: 0.4958 - val_loss: 0.6934 - val_acc: 0.4954
    Epoch 6/10
    20000/20000 [==============================] - 103s - loss: 0.6994 - acc: 0.5002 - val_loss: 0.7012 - val_acc: 0.4944
    Epoch 7/10
    20000/20000 [==============================] - 104s - loss: 0.6992 - acc: 0.4973 - val_loss: 0.6931 - val_acc: 0.5054
    Epoch 8/10
    20000/20000 [==============================] - 103s - loss: 0.6977 - acc: 0.5032 - val_loss: 0.6931 - val_acc: 0.4940
    Epoch 9/10
    20000/20000 [==============================] - 103s - loss: 0.6966 - acc: 0.5070 - val_loss: 0.6937 - val_acc: 0.4942
    Epoch 10/10
    20000/20000 [==============================] - 103s - loss: 0.6961 - acc: 0.5068 - val_loss: 0.7287 - val_acc: 0.4942

The accuracy stays around 50%, nowhere near the 90.4% reported on your website https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-HATN/.
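In case the data preparation matters, this is roughly how I load the file and make the 80/20 split. It is only a minimal sketch: I use scikit-learn's train_test_split here instead of the manual shuffle-and-slice in textClassifierHATT.py, and I assume the standard tab-separated Kaggle file with 'id', 'sentiment' and 'review' columns.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Kaggle "Bag of Words Meets Bags of Popcorn" training file: tab-separated
    data_train = pd.read_csv('labeledTrainData.tsv', sep='\t')

    # 80% train / 20% validation -> 20000 / 5000 of the 25000 labelled reviews
    train_df, val_df = train_test_split(data_train, test_size=0.2, random_state=42)
    print(len(train_df), len(val_df))  # 20000 5000

The padded word-index arrays and one-hot labels are then built from these two frames exactly as in the repo's preprocessing.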

I wonder whether the attention layer code below is correct for the Theano backend. The implementation of the attention layer is as follows:

    from keras import backend as K
    from keras import initializers
    from keras.engine.topology import Layer

    class AttLayer(Layer):
        def __init__(self, attention_dim):
            self.init = initializers.get('normal')
            self.supports_masking = True
            self.attention_dim = attention_dim
            super(AttLayer, self).__init__()

        def build(self, input_shape):
            assert len(input_shape) == 3
            self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
            self.b = K.variable(self.init((self.attention_dim, )))
            self.u = K.variable(self.init((self.attention_dim, 1)))
            self.trainable_weights = [self.W, self.b, self.u]
            super(AttLayer, self).build(input_shape)

        def compute_mask(self, inputs, mask=None):
            return mask

        def call(self, x, mask=None):
            # x: [batch_size, sen_len, input_dim]
            # uit = tanh(x.W + b): [batch_size, sen_len, attention_dim]
            uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
            # ait = uit.u: [batch_size, sen_len, 1], squeezed to [batch_size, sen_len]
            ait = K.dot(uit, self.u)
            ait = K.squeeze(ait, -1)

            ait = K.exp(ait)

            if mask is not None:
                # Cast the mask to floatX to avoid float64 upcasting in theano
                ait *= K.cast(mask, K.floatx())
            # Normalize to attention weights that sum to 1 over the time axis
            ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
            ait = K.expand_dims(ait)
            weighted_input = x * ait
            # Attention-weighted sum over the time axis: [batch_size, input_dim]
            output = K.sum(weighted_input, axis=1)

            return output

        def compute_output_shape(self, input_shape):
            return (input_shape[0], input_shape[-1])
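
And this is how I plug the layer into the HAN model, as a minimal sketch following textClassifierHATT.py. The constants and the random embedding_matrix below are placeholders just for this sketch (normally the matrix is built from GloVe), and I set mask_zero=True on the Embedding since AttLayer propagates the mask; I am not sure whether that is relevant to Theano.

    import numpy as np
    from keras.layers import Input, Embedding, Dense, GRU, Bidirectional, TimeDistributed
    from keras.models import Model

    MAX_SENT_LENGTH = 100   # words per sentence
    MAX_SENTS = 15          # sentences per review
    MAX_NB_WORDS = 20000
    EMBEDDING_DIM = 100
    # placeholder embeddings for illustration only
    embedding_matrix = np.random.normal(size=(MAX_NB_WORDS + 1, EMBEDDING_DIM))

    # Word-level encoder: embeddings -> BiGRU -> word attention
    sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
    embedded_sequences = Embedding(MAX_NB_WORDS + 1, EMBEDDING_DIM,
                                   weights=[embedding_matrix],
                                   input_length=MAX_SENT_LENGTH,
                                   mask_zero=True)(sentence_input)
    l_lstm = Bidirectional(GRU(100, return_sequences=True))(embedded_sequences)
    l_att = AttLayer(100)(l_lstm)
    sentEncoder = Model(sentence_input, l_att)

    # Sentence-level encoder: sentence vectors -> BiGRU -> sentence attention
    review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
    review_encoder = TimeDistributed(sentEncoder)(review_input)
    l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
    l_att_sent = AttLayer(100)(l_lstm_sent)
    preds = Dense(2, activation='softmax')(l_att_sent)

    model = Model(review_input, preds)
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])

I then fit this model for 10 epochs on the padded (review, sentence, word) index arrays, which produces the training log above.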

Thanks a lot