yongzhuo / Keras-TextClassification

Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity (Chinese Text Classification with Keras NLP: multi-label or sentence classification, long or short); base classes for building character/word/sentence embedding layers (embeddings) and network graphs (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN
https://blog.csdn.net/rensihui
MIT License

Hello, according to the DCNN paper a convolution acts on every dimension of the word vector. Could you explain how Conv1D operates on each dimension of the word embedding? #36

Closed · 473951841 closed this issue 4 years ago

yongzhuo commented 4 years ago

I'm not quite sure what you mean. The wide convolution clearly operates per character/word, so it convolves over embed_size (e.g. 300). The fold layer averages adjacent dimensions of each word embedding, but it didn't bring any real improvement; a single dimension of a word vector probably has no meaning on its own (at least that's my view).
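
For reference, a minimal sketch of the folding idea described above (the name fold_average and this tf-based implementation are illustrative only; the repo's own layer is prem_fold):

import tensorflow as tf

def fold_average(x, embed_size):
    """Illustrative DCNN-style folding: pair adjacent embedding dimensions
    (0 with 1, 2 with 3, ...) and average each pair, halving the feature axis.
    x has shape (batch, seq_len, embed_size); embed_size must be even."""
    shape = tf.shape(x)
    paired = tf.reshape(x, tf.stack([shape[0], shape[1], embed_size // 2, 2]))
    return tf.reduce_mean(paired, axis=-1)  # (batch, seq_len, embed_size // 2)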

473951841 commented 4 years ago

Here's my situation. Suppose embed_size = 300. First, may I ask roughly what shape the embedding output has at the very beginning? My question about the fold layer is actually tied to wide_conv; I'm writing my own program based on your model code. In the paper, the convolution operates on each of the 300 dimensions. My embedding output is (batch_size, sentence_length, 300), but after wide_conv (whose core operation is Keras Conv1D) it becomes (batch_size, sentence_length_after_wide_conv, filter_num) and the 300 dimensions are gone, so the later fold ends up folding over filter_num. So I'd like to ask how the 300 dimensions are reflected in the code, thank you. One more question: in the dynamic-k selection, the sentence length seems to be fixed at the maximum length. Shouldn't it differ for each sentence? How should that be handled in the code? Thanks.

yongzhuo commented 4 years ago
  1. The embedding output shape is (None, 128, 768).
  2. The convolution kernel is (filter_size (e.g. 3, 4, 5), embed_size (768 here)); the fold step actually operates on filter_num. You can picture a conv+pool+conv+pool network, like the well-known ones in computer vision.
  3. The dynamic pooling dynamic_k_max_pooling is mainly based on tf.nn.top_k; just fix the top-k and it is independent of the text length (see the sketch below).
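
A minimal sketch of k-max pooling built on tf.nn.top_k, for illustration only (the name k_max_pooling is mine, not the repo's dynamic_k_max_pooling layer):

import tensorflow as tf

def k_max_pooling(x, k):
    """Keep the k largest activations along the sequence axis for every
    feature channel, preserving their original order.
    x has shape (batch, seq_len, channels)."""
    x_t = tf.transpose(x, [0, 2, 1])            # (batch, channels, seq_len)
    _, idx = tf.nn.top_k(x_t, k=k)              # indices of the k largest values
    idx = tf.sort(idx, axis=-1)                 # restore the original sequence order
    top_k = tf.gather(x_t, idx, batch_dims=2)   # (batch, channels, k)
    return tf.transpose(top_k, [0, 2, 1])       # (batch, k, channels)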

You can see the concrete network architecture in the summary output, for example:

Model: "model_1"


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) [(None, 128)] 0


embedding (Embedding) (None, 128, 768) 585216 input_1[0][0]


wide_convolution_0 (wide_convolution) (None, 137, 300) 0 embedding[0][0]


wide_convolution_1 (wide_convolution) (None, 133, 300) 0 embedding[0][0]


dynamic_k_max_pooling (dynamic_k_max_pooli (None, 85, 300) 0 wide_convolution_0[0][0]


dynamic_k_max_pooling_3 (dynamic_k_max_poo (None, 85, 300) 0 wide_convolution_1[0][0]


wide_convolution_0_0 (wide_convolution) (None, 91, 300) 0 dynamic_k_max_pooling[0][0]


wide_convolution_1_1 (wide_convolution) (None, 88, 300) 0 dynamic_k_max_pooling_3[0][0]


dynamic_k_max_pooling_1 (dynamic_k_max_poo (None, 42, 300) 0 wide_convolution_0_0[0][0]


dynamic_k_max_pooling_4 (dynamic_k_max_poo (None, 42, 300) 0 wide_convolution_1_1[0][0]


wide_convolution_0_0_0 (wide_convolution) (None, 46, 300) 0 dynamic_k_max_pooling_1[0][0]


wide_convolution_1_1_1 (wide_convolution) (None, 44, 300) 0 dynamic_k_max_pooling_4[0][0]


prem_fold (prem_fold) (None, 46, 150) 0 wide_convolution_0_0_0[0][0]


prem_fold_1 (prem_fold) (None, 44, 150) 0 wide_convolution_1_1_1[0][0]


dynamic_k_max_pooling_2 (dynamic_k_max_poo (None, 3, 150) 0 prem_fold[0][0]


dynamic_k_max_pooling_5 (dynamic_k_max_poo (None, 3, 150) 0 prem_fold_1[0][0]


concatenate (Concatenate) (None, 6, 150) 0 dynamic_k_max_pooling_2[0][0]
dynamic_k_max_pooling_5[0][0]


dropout (Dropout) (None, 6, 150) 0 concatenate[0][0]


flatten (Flatten) (None, 900) 0 dropout[0][0]


dense (Dense) (None, 17) 15317 flatten[0][0]

Total params: 600,533 Trainable params: 600,533 Non-trainable params: 0

473951841 commented 4 years ago

Hello, I've read the paper again and still think folding operates on the embedding dimensions: in Figure 3 of the original paper the input to folding has 4 dimensions and comes out with 2, so I'm still not sure how this shows up in the model. Also, in the paper the k of dynamic k-max pooling is given by a formula, k = max(s * (L - l) / L, k_top), where s is the input sentence length. In your model the select_k function appears to decide k, but s is fixed to len_max, defined in hyper_parameter. Isn't that inconsistent with what the paper describes? My feeling is that individual samples should not be padded to the same length before k-max pooling, but I'm not sure how to implement that. (If possible I'd like to discuss over QQ; my QQ number is my username. Thank you very much.)
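
For reference, a hypothetical select_k following the formula quoted above (the function name and the use of ceil here reflect the paper, not necessarily the repo's implementation):

import math

def select_k(s, layer, total_layers, k_top):
    """Dynamic k as described in the DCNN paper:
    k_l = max(k_top, ceil((L - l) / L * s)), where s is the (per-sample)
    sentence length, l the current conv layer and L the number of conv layers."""
    return max(k_top, math.ceil((total_layers - layer) / total_layers * s))

# e.g. with s = 128, L = 3, k_top = 3:  k_1 = 86, k_2 = 43, k_3 = 3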

yongzhuo commented 4 years ago

In the summary above, (batch, seq_len, embed_size) becomes (batch, seq_len[wide], filters_num) after the convolution; that filters_num is no longer the word-embedding dimension you have in mind. What you can do instead is use a 2D convolution with a kernel of (filter_size, 1) rather than (filter_size, embed_size); then embed_size stays unchanged and the result just gains an extra filters_num dimension. You can modify the Layer class wide_convolution in the layers file, for example:

### DCNN ################################
# Assumed imports: L = keras.layers and K = keras.backend, matching how they are used below.
import keras.layers as L
import keras.backend as K


class wide_convolution(L.Layer):
    """
        paper: http://www.aclweb.org/anthology/P14-1062
        paper title: "A Convolutional Neural Network for Modelling Sentences"
        Wide convolution: if s is the maximum sentence length and m the kernel size,
            the wide convolution output length is s + m - 1,
            while a normal (narrow) convolution outputs s - m + 1.
        For a Keras reference implementation see: https://github.com/AlexYangLi/TextClassification/blob/master/models/keras_dcnn_model.py
    """
    def __init__(self, filter_num=300, filter_size=3, **kwargs):
        self.filter_size = filter_size
        self.filter_num = filter_num
        super().__init__(**kwargs)

    def build(self, input_shape):
        super().build(input_shape)

    def call(self, inputs):
        # Original Conv1D version (kept for reference): convolves across the full
        # embed_size, so the last dimension becomes filter_num instead of embed_size.
        # x_input_pad = L.ZeroPadding1D((self.filter_size - 1, self.filter_size - 1))(inputs)
        # conv_1d = L.Conv1D(filters=self.filter_num,
        #                    kernel_size=self.filter_size,
        #                    strides=1,
        #                    padding="VALID",
        #                    kernel_initializer="normal",
        #                    activation="tanh")(x_input_pad)

        # Conv2D version with kernel (filter_size, 1): each embedding dimension is
        # convolved independently, so embed_size is preserved in the output.
        x_input_pad = L.ZeroPadding1D((self.filter_size - 1, self.filter_size - 1))(inputs)
        k_shape = K.int_shape(x_input_pad)
        # add a channel axis: (batch, seq_len_padded, embed_size, 1)
        inputs_reshape = L.Reshape((k_shape[1], k_shape[2], 1))(x_input_pad)
        conv_2d = L.Conv2D(filters=1,
                           kernel_size=(self.filter_size, 1),
                           padding='valid',
                           kernel_initializer='normal',
                           activation='tanh',
                           )(inputs_reshape)
        # drop the channel axis again: (batch, seq_len + filter_size - 1, embed_size)
        k_shape = K.int_shape(conv_2d)
        conv_1d = L.Reshape(target_shape=(k_shape[1], k_shape[2]))(conv_2d)
        return conv_1d

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[1] + self.filter_size - 1, input_shape[-1]

    def get_config(self):
        config = {"filter_size":self.filter_size,
                  "filter_num": self.filter_num}
        base_config = super(wide_convolution, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
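
A quick, hypothetical shape check for the modified layer (assuming standalone Keras imports as above; not part of the repo):

from keras.models import Model

inp = L.Input(shape=(128, 768))                              # (seq_len, embed_size)
out = wide_convolution(filter_num=300, filter_size=5)(inp)
Model(inp, out).summary()                                    # expected shape: (None, 132, 768), i.e. 128 + 5 - 1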

The resulting model structure is:

Model: "model_1"
____________________________________________________________________________________________________________________________________
Layer (type)                               Output Shape                 Param #         Connected to                                
====================================================================================================================================
input_1 (InputLayer)                       [(None, 128)]                0                                                           
____________________________________________________________________________________________________________________________________
embedding (Embedding)                      (None, 128, 768)             553728          input_1[0][0]                               
____________________________________________________________________________________________________________________________________
wide_convolution_0 (wide_convolution)      (None, 137, 768)             0               embedding[0][0]                             
____________________________________________________________________________________________________________________________________
wide_convolution_1 (wide_convolution)      (None, 133, 768)             0               embedding[0][0]                             
____________________________________________________________________________________________________________________________________
dynamic_k_max_pooling (dynamic_k_max_pooli (None, 85, 768)              0               wide_convolution_0[0][0]                    
____________________________________________________________________________________________________________________________________
dynamic_k_max_pooling_3 (dynamic_k_max_poo (None, 85, 768)              0               wide_convolution_1[0][0]                    
____________________________________________________________________________________________________________________________________
wide_convolution_0_0 (wide_convolution)    (None, 91, 768)              0               dynamic_k_max_pooling[0][0]                 
____________________________________________________________________________________________________________________________________
wide_convolution_1_1 (wide_convolution)    (None, 88, 768)              0               dynamic_k_max_pooling_3[0][0]               
____________________________________________________________________________________________________________________________________
dynamic_k_max_pooling_1 (dynamic_k_max_poo (None, 42, 768)              0               wide_convolution_0_0[0][0]                  
____________________________________________________________________________________________________________________________________
dynamic_k_max_pooling_4 (dynamic_k_max_poo (None, 42, 768)              0               wide_convolution_1_1[0][0]                  
____________________________________________________________________________________________________________________________________
wide_convolution_0_0_0 (wide_convolution)  (None, 46, 768)              0               dynamic_k_max_pooling_1[0][0]               
____________________________________________________________________________________________________________________________________
wide_convolution_1_1_1 (wide_convolution)  (None, 44, 768)              0               dynamic_k_max_pooling_4[0][0]               
____________________________________________________________________________________________________________________________________
prem_fold (prem_fold)                      (None, 46, 384)              0               wide_convolution_0_0_0[0][0]                
____________________________________________________________________________________________________________________________________
prem_fold_1 (prem_fold)                    (None, 44, 384)              0               wide_convolution_1_1_1[0][0]                
____________________________________________________________________________________________________________________________________
dynamic_k_max_pooling_2 (dynamic_k_max_poo (None, 3, 384)               0               prem_fold[0][0]                             
____________________________________________________________________________________________________________________________________
dynamic_k_max_pooling_5 (dynamic_k_max_poo (None, 3, 384)               0               prem_fold_1[0][0]                           
____________________________________________________________________________________________________________________________________
concatenate (Concatenate)                  (None, 6, 384)               0               dynamic_k_max_pooling_2[0][0]               
                                                                                        dynamic_k_max_pooling_5[0][0]               
____________________________________________________________________________________________________________________________________
dropout (Dropout)                          (None, 6, 384)               0               concatenate[0][0]                           
____________________________________________________________________________________________________________________________________
flatten (Flatten)                          (None, 2304)                 0               dropout[0][0]                               
____________________________________________________________________________________________________________________________________
dense (Dense)                              (None, 17)                   39185           flatten[0][0]                               
====================================================================================================================================
Total params: 592,913
Trainable params: 592,913
Non-trainable params: 0
____________________________________________________________________________________________________________________________________