Closed 4 years ago
Here's the situation. Suppose the word vectors have embed_size = 300. First, could you tell me roughly what shape the initial embedding output has? My question about the fold layer is actually tied to wide_conv: I'm writing my own program with your model code as a reference. In the paper, the convolution operates on each of the 300 dimensions individually. My embedding output is (batch_size, sentence_length, 300), but after wide_conv (whose core operation is Keras's Conv1D) it becomes (batch_size, widened_sentence_length, filter_num) and the 300 dimensions disappear, so the later fold step ends up folding over filter_num. So I'd like to ask how the 300 dimensions are reflected in the code, thank you. One more question: in the dynamic-k selection, the sentence length seems fixed to the maximum length. Shouldn't it differ for each sentence? How should that be handled in the code? Thanks.
You can inspect the exact network architecture from the summary() output, for example:
Model: "model_1"
input_1 (InputLayer) [(None, 128)] 0
embedding (Embedding) (None, 128, 768) 585216 input_1[0][0]
wide_convolution_0 (wide_convolution) (None, 137, 300) 0 embedding[0][0]
wide_convolution_1 (wide_convolution) (None, 133, 300) 0 embedding[0][0]
dynamic_k_max_pooling (dynamic_k_max_pooli (None, 85, 300) 0 wide_convolution_0[0][0]
dynamic_k_max_pooling_3 (dynamic_k_max_poo (None, 85, 300) 0 wide_convolution_1[0][0]
wide_convolution_0_0 (wide_convolution) (None, 91, 300) 0 dynamic_k_max_pooling[0][0]
wide_convolution_1_1 (wide_convolution) (None, 88, 300) 0 dynamic_k_max_pooling_3[0][0]
dynamic_k_max_pooling_1 (dynamic_k_max_poo (None, 42, 300) 0 wide_convolution_0_0[0][0]
dynamic_k_max_pooling_4 (dynamic_k_max_poo (None, 42, 300) 0 wide_convolution_1_1[0][0]
wide_convolution_0_0_0 (wide_convolution) (None, 46, 300) 0 dynamic_k_max_pooling_1[0][0]
wide_convolution_1_1_1 (wide_convolution) (None, 44, 300) 0 dynamic_k_max_pooling_4[0][0]
prem_fold (prem_fold) (None, 46, 150) 0 wide_convolution_0_0_0[0][0]
prem_fold_1 (prem_fold) (None, 44, 150) 0 wide_convolution_1_1_1[0][0]
dynamic_k_max_pooling_2 (dynamic_k_max_poo (None, 3, 150) 0 prem_fold[0][0]
dynamic_k_max_pooling_5 (dynamic_k_max_poo (None, 3, 150) 0 prem_fold_1[0][0]
concatenate (Concatenate) (None, 6, 150) 0 dynamic_k_max_pooling_2[0][0]
dynamic_k_max_pooling_5[0][0]
dropout (Dropout) (None, 6, 150) 0 concatenate[0][0]
flatten (Flatten) (None, 900) 0 dropout[0][0]
Total params: 600,533 Trainable params: 600,533 Non-trainable params: 0
Hello, I read that paper again, and I still think folding operates on the dimensions: in Figure 3 of the original paper the input to folding has 4 dimensions and comes out with 2, so I'm still unclear how this is reflected in the model. Also, in the paper, the k for dynamic k-max pooling is chosen by the formula k_l = max(ceil((L - l) / L * s), k_top), where s is the input sentence length. In your model it looks like the select_k function decides the k value, but s is fixed to len_max, defined in hyper_parameter; isn't that inconsistent with what the paper intends? My feeling is that the input samples should not first be padded to the same length before k-max pooling, but I'm not sure how to implement that; a rough sketch of what I have in mind follows below. (If possible I'd like to add you on QQ to discuss in detail; my QQ is my nickname. Thank you very much.)
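For concreteness, here is a minimal sketch of the pooling schedule I mean, written directly from the paper's formula (my own illustration assuming TensorFlow 2 Keras, not your select_k; the class name and arguments are made up for this example):

```python
import math

import tensorflow as tf
from tensorflow.keras import layers


class dynamic_k_max_pooling_sketch(layers.Layer):
    """k-max pooling along the time axis with the paper's schedule
    k_l = max(k_top, ceil((L - l) / L * s)); keeps the k largest
    activations per feature dimension in their original order."""

    def __init__(self, k_top=3, layer_idx=1, layer_total=3, sentence_len=128, **kwargs):
        super().__init__(**kwargs)
        self.k_top = k_top                  # k_top in the paper
        self.layer_idx = layer_idx          # l: index of the current conv layer
        self.layer_total = layer_total      # L: total number of conv layers
        self.sentence_len = sentence_len    # s: here still fixed to the padded len_max

    def call(self, inputs):                 # inputs: (batch, seq_len, embed_size)
        k = max(self.k_top,
                math.ceil((self.layer_total - self.layer_idx)
                          / self.layer_total * self.sentence_len))
        x = tf.transpose(inputs, [0, 2, 1])         # (batch, embed_size, seq_len)
        _, idx = tf.nn.top_k(x, k=k)                # positions of the k largest values
        idx = tf.sort(idx, axis=-1)                 # restore the temporal order
        out = tf.gather(x, idx, batch_dims=2)       # (batch, embed_size, k)
        return tf.transpose(out, [0, 2, 1])         # (batch, k, embed_size)
```

Note that with ceil and len_max = 128, L = 3, this gives k = 86 and 43 for the first two layers, whereas the summary above shows 85 and 42, i.e. truncation; and handling a true per-sample s would additionally require masking the padded positions before top_k, which is exactly the part I'm unsure about.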
In the summaries above, (batch, seq_len, embed_size) becomes (batch, seq_len[wide], filters_num) after the convolution, and filters_num is no longer the word-vector dimension you have in mind. What you can do instead is a 2D convolution with a kernel of (filter_size, 1) rather than (filter_size, embed_size); that way embed_size stays unchanged, and the result just gains an extra filters_num axis. You can modify the Layer class wide_convolution in the layers file, for example:
```python
# Imports assumed for this snippet; adjust to plain `keras` if that is
# what the project imports elsewhere.
from tensorflow.keras import backend as K
from tensorflow.keras import layers as L


### DCNN ################################
class wide_convolution(L.Layer):
    """
    paper: http://www.aclweb.org/anthology/P14-1062
    paper title: "A Convolutional Neural Network for Modelling Sentences"
    Wide convolution: if s is the maximum sentence length and m the kernel size,
    the wide convolution output length is s + m - 1,
    while a narrow (valid) convolution outputs s - m + 1.
    For a Keras implementation on GitHub, see:
    https://github.com/AlexYangLi/TextClassification/blob/master/models/keras_dcnn_model.py
    """
    def __init__(self, filter_num=300, filter_size=3, **kwargs):
        self.filter_size = filter_size
        self.filter_num = filter_num
        super().__init__(**kwargs)

    def build(self, input_shape):
        super().build(input_shape)

    def call(self, inputs):
        # Previous Conv1D version: mixes all embed_size channels into filter_num maps.
        # x_input_pad = L.ZeroPadding1D((self.filter_size - 1, self.filter_size - 1))(inputs)
        # conv_1d = L.Conv1D(filters=self.filter_num,
        #                    kernel_size=self.filter_size,
        #                    strides=1,
        #                    padding="VALID",
        #                    kernel_initializer="normal",
        #                    activation="tanh")(x_input_pad)

        # Pad (filter_size - 1) zeros on both ends so the "valid" conv becomes wide.
        x_input_pad = L.ZeroPadding1D((self.filter_size - 1, self.filter_size - 1))(inputs)
        k_shape = K.int_shape(x_input_pad)
        # Add a channel axis: (batch, seq, embed) -> (batch, seq, embed, 1).
        inputs_reshape = L.Reshape((k_shape[1], k_shape[2], 1))(x_input_pad)
        # Kernel (filter_size, 1): convolve along time only, one embedding
        # dimension at a time, so embed_size is preserved.
        conv_2d = L.Conv2D(filters=1,
                           kernel_size=(self.filter_size, 1),
                           padding="valid",
                           kernel_initializer="normal",
                           activation="tanh")(inputs_reshape)
        k_shape = K.int_shape(conv_2d)
        # Drop the channel axis: (batch, seq + m - 1, embed, 1) -> (batch, seq + m - 1, embed).
        conv_1d = L.Reshape(target_shape=(k_shape[1], k_shape[2]))(conv_2d)
        return conv_1d

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[1] + self.filter_size - 1, input_shape[-1]

    def get_config(self):
        config = {"filter_size": self.filter_size,
                  "filter_num": self.filter_num}
        base_config = super(wide_convolution, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
```
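A quick toy check that the embedding dimension survives (shapes only, weights untrained; note the inner Conv2D weights are created inside call() and so are not tracked, which is why the summaries show 0 params for wide_convolution):

```python
from tensorflow.keras import layers as L, models

# embed_size (e.g. 300) should pass through unchanged
x_in = L.Input(shape=(50, 300))                # (batch, sentence_length, embed_size)
x_out = wide_convolution(filter_size=3)(x_in)
models.Model(x_in, x_out).summary()            # expect (None, 52, 300): s + m - 1
```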
The model structure then becomes:
```
Model: "model_1"
Layer (type)                                 Output Shape        Param #   Connected to
=========================================================================================
input_1 (InputLayer)                         [(None, 128)]       0
embedding (Embedding)                        (None, 128, 768)    553728    input_1[0][0]
wide_convolution_0 (wide_convolution)        (None, 137, 768)    0         embedding[0][0]
wide_convolution_1 (wide_convolution)        (None, 133, 768)    0         embedding[0][0]
dynamic_k_max_pooling (dynamic_k_max_pooli   (None, 85, 768)     0         wide_convolution_0[0][0]
dynamic_k_max_pooling_3 (dynamic_k_max_poo   (None, 85, 768)     0         wide_convolution_1[0][0]
wide_convolution_0_0 (wide_convolution)      (None, 91, 768)     0         dynamic_k_max_pooling[0][0]
wide_convolution_1_1 (wide_convolution)      (None, 88, 768)     0         dynamic_k_max_pooling_3[0][0]
dynamic_k_max_pooling_1 (dynamic_k_max_poo   (None, 42, 768)     0         wide_convolution_0_0[0][0]
dynamic_k_max_pooling_4 (dynamic_k_max_poo   (None, 42, 768)     0         wide_convolution_1_1[0][0]
wide_convolution_0_0_0 (wide_convolution)    (None, 46, 768)     0         dynamic_k_max_pooling_1[0][0]
wide_convolution_1_1_1 (wide_convolution)    (None, 44, 768)     0         dynamic_k_max_pooling_4[0][0]
prem_fold (prem_fold)                        (None, 46, 384)     0         wide_convolution_0_0_0[0][0]
prem_fold_1 (prem_fold)                      (None, 44, 384)     0         wide_convolution_1_1_1[0][0]
dynamic_k_max_pooling_2 (dynamic_k_max_poo   (None, 3, 384)      0         prem_fold[0][0]
dynamic_k_max_pooling_5 (dynamic_k_max_poo   (None, 3, 384)      0         prem_fold_1[0][0]
concatenate (Concatenate)                    (None, 6, 384)      0         dynamic_k_max_pooling_2[0][0]
                                                                           dynamic_k_max_pooling_5[0][0]
dropout (Dropout)                            (None, 6, 384)      0         concatenate[0][0]
flatten (Flatten)                            (None, 2304)        0         dropout[0][0]
dense (Dense)                                (None, 17)          39185     flatten[0][0]
=========================================================================================
Total params: 592,913
Trainable params: 592,913
Non-trainable params: 0
```
I don't quite follow what you're getting at. The wide convolution clearly acts on each character or word, so what gets convolved spans embed_size (e.g. 300); as for fold, it averages adjacent dimensions of each word embedding, but it doesn't help much in practice. A single dimension of a word vector probably doesn't mean much on its own (that's my take, anyway).
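If you do want to reproduce the paper's folding on the embedding axis, a minimal sketch is below (my own illustration assuming TensorFlow 2 Keras, not the repo's prem_fold; the paper sums each pair of adjacent dimensions, while this averages them as described above, which differs only by a factor of 0.5):

```python
import tensorflow as tf
from tensorflow.keras import layers


class fold_sketch(layers.Layer):
    """Folding: merge every pair of adjacent embedding dimensions,
    e.g. 300 -> 150, matching the prem_fold rows in the summaries above."""

    def call(self, inputs):                            # (batch, seq_len, embed_size)
        embed_size = inputs.shape[-1]                  # must be even, e.g. 300
        shape = tf.shape(inputs)
        # Pair up adjacent dims: (batch, seq, embed) -> (batch, seq, embed / 2, 2).
        x = tf.reshape(inputs, (shape[0], shape[1], embed_size // 2, 2))
        return tf.reduce_mean(x, axis=-1)              # (batch, seq, embed / 2)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[1], input_shape[2] // 2
```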