huggingface / pytorch-openai-transformer-lm

🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

In `transform_roc`, why do we need `xmb[:, :, :, 1]`? #12

Open FrankWork opened 6 years ago

FrankWork commented 6 years ago
```python
def transform_roc(X1, X2, X3):
    n_batch = len(X1)
    xmb = np.zeros((n_batch, 2, n_ctx, 2), dtype=np.int32)
    mmb = np.zeros((n_batch, 2, n_ctx), dtype=np.float32)
    start = encoder['_start_']
    delimiter = encoder['_delimiter_']
    for i, (x1, x2, x3) in enumerate(zip(X1, X2, X3)):
        x12 = [start] + x1[:max_len] + [delimiter] + x2[:max_len] + [clf_token]
        x13 = [start] + x1[:max_len] + [delimiter] + x3[:max_len] + [clf_token]
        l12 = len(x12)
        l13 = len(x13)
        xmb[i, 0, :l12, 0] = x12
        xmb[i, 1, :l13, 0] = x13
        mmb[i, 0, :l12] = 1
        mmb[i, 1, :l13] = 1
    xmb[:, :, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
    return xmb, mmb
```
rodgzilla commented 6 years ago

This part of `xmb` is used for the learned positional encoding.

The network associates an embedding vector with each position of the input; during the forward pass, this vector is added to the corresponding word embedding.

```python
def forward(self, x):
    # x carries token ids in channel 0 and position ids in channel 1 of its last axis
    x = x.view(-1, x.size(-2), x.size(-1))
    e = self.embed(x)   # look up both channels: shape (n, n_ctx, 2, n_embd)
    h = e.sum(dim=2)    # add the word embedding and the position embedding
    for block in self.h:
        h = block(h)
    return h
```

The line `h = e.sum(dim=2)` does this addition.
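To make this concrete, here is a minimal sketch, not the repo's actual `TransformerModel`, using toy sizes chosen arbitrarily. A single embedding matrix holds both token vectors and position vectors; because the position ids written by `transform_roc` start at `n_vocab + n_special`, looking up both channels and summing over that axis gives "word embedding + positional embedding":

```python
# Minimal sketch with made-up sizes and ids: one embedding table covering
# vocabulary, special symbols, and positions, indexed the same way as xmb.
import torch
import torch.nn as nn

n_vocab, n_special, n_ctx, n_embd = 10, 3, 5, 8   # arbitrary toy sizes
embed = nn.Embedding(n_vocab + n_special + n_ctx, n_embd)

# x mimics one row of xmb: channel 0 = token ids, channel 1 = position ids
x = torch.zeros(1, n_ctx, 2, dtype=torch.long)
x[0, :, 0] = torch.tensor([10, 4, 7, 11, 2])      # e.g. _start_, w1, w2, _delimiter_, w3 (made-up ids)
x[0, :, 1] = torch.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)

e = embed(x)        # shape (1, n_ctx, 2, n_embd): both channels looked up
h = e.sum(dim=2)    # shape (1, n_ctx, n_embd): token vector + position vector

# At position 1, h is exactly the embedding of token 4 plus the embedding of position 1
assert torch.allclose(h[0, 1], embed.weight[4] + embed.weight[n_vocab + n_special + 1])
```

This is also why `xmb[:, :, :, 1]` is filled with `np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)` rather than plain `0 … n_ctx - 1`: the position rows live after the word and special-token rows in the same embedding matrix.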

FrankWork commented 6 years ago

@rodgzilla Thank you! You saved my day!

guotong1988 commented 6 years ago

Mark