What are the accuracy and recall of this code on the test set?

indiejoseph / cnn-text-classification-tf-chinese

CNN for Chinese Text Classification in Tensorflow


What are the accuracy and recall of this code on the test set? #10

Open butterluo opened 7 years ago

butterluo commented 7 years ago

As the title says. Thanks!

indiejoseph commented 7 years ago

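Accuracy is 98%. I've forgotten the recall; I think it was around 92%. Because of the dataset, though, I only drew 300 samples from it, so I'm not sure how reliable these numbers are.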
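For readers who want to check these two metrics on their own test split, here is a minimal sketch in plain Python. The function name `accuracy_and_recall` and the labels are made up for illustration; this is not the repository's evaluation code, and it assumes a binary setting where recall is measured for one chosen positive class.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for two equal-length label sequences.

    Recall is computed for the class given by `positive`.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)

    accuracy = correct / len(y_true)
    # Recall is undefined when the test set has no positive examples.
    recall = true_pos / actual_pos if actual_pos else float("nan")
    return accuracy, recall


# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(acc, rec)
```

With a test set as small as 300 samples, both numbers can shift noticeably from one sample draw to another, which matches the caution expressed above.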

paulcx commented 7 years ago

@indiejoseph How should I modify the code to do multi-class classification?

indiejoseph commented 7 years ago

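Multi-label? You mean outputting more than one class per example? Right now I use argmax so that the output picks only the class with the highest probability. For multi-label you need to remove the argmax, use a non-linearity as the output instead, and compute the loss with tf.nn.sigmoid_cross_entropy_with_logits.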
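The difference between the two output styles described above can be sketched in plain Python, without TensorFlow. The helper names and the 0.5 threshold are assumptions made for illustration; the thread does not say which threshold to use.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_single_label(scores):
    """Argmax: index of the single highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


def predict_multi_label(scores, threshold=0.5):
    """Apply a sigmoid to each class independently and keep every class
    whose probability reaches the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if sigmoid(s) >= threshold]


scores = [2.0, -1.0, 0.5]  # hypothetical logits for three classes
single = predict_single_label(scores)
multi = predict_multi_label(scores)
print(single, multi)
```

With argmax exactly one class is returned per example, while the per-class sigmoid can return zero, one, or several classes.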

paulcx commented 7 years ago

@indiejoseph Thanks. Also, what do you think of the current CNN-RNN approaches to text classification? Do they bring a large improvement?

indiejoseph commented 7 years ago

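A CNN is faster than an RNN: it can compute the loss without waiting to run through all the time steps, but a CNN's sentence length cannot vary. An RNN is better at extracting sequential features of a sentence and puts no limit on sentence length, and bi-RNNs have become even more mainstream. There are already papers confirming that RNNs outperform CNNs, so a CNN-RNN may not bring much improvement. In that setup the CNN acts as an n-gram feature layer whose output feeds the RNN, which raised accuracy by about 2% but at the cost of speed...

C-LSTM scores 1.4% higher than Bi-LSTM; see: https://arxiv.org/pdf/1511.08630.pdf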
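The fixed sentence length that the CNN requires, as mentioned above, is commonly handled by padding or truncating every tokenized sentence to one length before building the input tensor. The following is a minimal sketch under that assumption; `pad_sentences` and the `<PAD>` token are hypothetical names, not taken from the repository.

```python
def pad_sentences(sentences, max_len=None, pad_token="<PAD>"):
    """Pad or truncate tokenized sentences to a common length.

    If max_len is None, the length of the longest sentence is used.
    """
    if max_len is None:
        max_len = max(len(s) for s in sentences)

    padded = []
    for s in sentences:
        tokens = list(s)[:max_len]  # truncate sentences that are too long
        padded.append(tokens + [pad_token] * (max_len - len(tokens)))
    return padded


# Made-up tokenized sentences, for illustration only.
batch = pad_sentences([["我", "喜歡", "貓"], ["你", "好"]])
print(batch)
```

An RNN can instead consume each sentence at its own length, which is the flexibility the comment above refers to.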

paulcx commented 7 years ago

@indiejoseph I'm trying to change the code to multi-label classification. Are these the two parts that should be changed as you described above?
self.predictions = tf.argmax(self.scores, 1, name="predictions")
losses = tf.nn.softmax_cross_entropy_with_logits(self.scores, self.input_y)

indiejoseph commented 7 years ago

Yes. The main change is replacing softmax_cross_entropy_with_logits with sigmoid_cross_entropy_with_logits, and changing predictions from tf.argmax to sigmoid.
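To make the effect of that loss swap concrete, here is a plain-Python sketch of what the two losses compute for a single example. It does not call TensorFlow; the function names are illustrative. The sigmoid version uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), where x is a logit and z its 0/1 label.

```python
import math


def softmax_cross_entropy(logits, labels):
    """One loss for the whole example: classes compete, and the labels
    are expected to form a distribution (for example, one-hot)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(z * (x - log_sum) for x, z in zip(logits, labels))


def sigmoid_cross_entropy(logits, labels):
    """One loss per class: each class is an independent yes/no decision,
    so several labels may be 1 at the same time."""
    return [
        max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
        for x, z in zip(logits, labels)
    ]


logits = [0.0, 0.0]  # hypothetical scores for two classes
softmax_loss = softmax_cross_entropy(logits, [1.0, 0.0])    # one-hot target
sigmoid_losses = sigmoid_cross_entropy(logits, [1.0, 1.0])  # both classes on
print(softmax_loss, sigmoid_losses)
```

Because the sigmoid form returns one value per class, a multi-label model would typically sum or average these values before minimizing; that reduction step is an assumption here and is not spelled out in the thread.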

paulcx commented 7 years ago

@indiejoseph Thanks. I'd suggest migrating to tf 1.0; I put a lot of effort into it and still couldn't get some compatibility issues resolved.