Open meshiguge opened 6 years ago
Q1: In the label embedding loss function (9), what does y_k mean? Is it the label of the embedding or the sequence class?
A1: y_k is the one-hot vector representing the target (the sequence class). Also in (9), c_k is the corresponding label embedding that we are going to learn.
Q2: If there are K=3 classes in the dataset, does that mean there are 3 training samples for the labels?
A2: If there are K=3 classes in the dataset, there are 3 label embeddings to learn (i.e., "3 training samples for labels"), one for each label.
Q3: How should the label loss (9) be handled in each training batch?
A3: Note that label embeddings are associated with their corresponding labels, i.e. each c_k is associated with its own y_k.
Eq. (9) is a regularizer of the main loss (7). Therefore, for each text-label pair (x_n, y_nk), where n is the index of the data sample and k is the ground-truth label (i.e. the k-th label), c_k is trained to satisfy two objectives: (A) stay in the center of its own class manifold, as imposed in (9); and (B) attend to the text sequence representation z_n in (7).
Sorry for the abuse of notation. A more correct but ugly formulation of (9) goes like this: $\frac{1}{K} \sum_{k=1}^{K} \sum_{n \in \{n \mid y_n = k\}} \mathrm{CE}\left(y_{nk}, f_2(c_{nk})\right)$
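To make the per-batch handling concrete, here is a minimal NumPy sketch of the regularizer in its expanded form: for every sample in a batch, look up the label embedding of its ground-truth class, pass it through f_2, and take the cross-entropy against that class. This is only an illustration under assumptions, not the repository's implementation: f_2 is assumed to be a single linear layer followed by softmax, and the names `label_regularizer`, `C`, `W`, `b`, and `y_batch` are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def label_regularizer(C, W, b, y_batch):
    """Batch version of the expanded form of (9).

    C:       (K, d) label embeddings, one row c_k per class.
    W, b:    (d, K) weights and (K,) bias of f_2 (ASSUMED: linear + softmax).
    y_batch: (B,) integer ground-truth class index for each sample.
    """
    K = C.shape[0]
    c_batch = C[y_batch]                # (B, d): c_k paired with each sample's label
    log_probs = log_softmax(c_batch @ W + b)   # (B, K)
    # Cross-entropy against the one-hot target y_n picks out one log-probability.
    ce = -log_probs[np.arange(len(y_batch)), y_batch]
    return ce.sum() / K                 # 1/K * sum over samples in the batch


# Usage with K=3 classes and a batch of 4 samples.
rng = np.random.default_rng(0)
C = rng.normal(size=(3, 5))
W = rng.normal(size=(5, 3))
b = np.zeros(3)
y_batch = np.array([0, 2, 2, 1])
reg = label_regularizer(C, W, b, y_batch)
```

In training, this value would be added (with some weight) to the main loss (7) computed on the same batch, so a class that appears more often in the batch contributes more terms for its c_k.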
Hello, does the embedding loss function (9) mean that there are only K training samples? How is it incorporated into a training batch?