yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

Question: Have you tried an ELMo-based version of your code? #8

Closed: xiaoxiaoAurora closed this 5 years ago

xiaoxiaoAurora commented 5 years ago

Hi, have you tried an ELMo-based version of your code? I tried to do this, but I have some problems:

```
Traceback (most recent call last):
  File "run.py", line 42, in <module>
    args.func(args)
  File "/home/workspace/elmo_biaffineparser/parser1/commands/train.py", line 111, in __call__
  File "/home/workspace/elmo_biaffineparser/parser1/model.py", line 36, in __call__
    self.train(train_loader, trainwords)
  File "/home/workspace/elmo_biaffineparser/parser1/model.py", line 84, in train
    loss.backward()
  File "/home/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered
```

yzhangcs commented 5 years ago

Yes, and it works very well, but I didn't add it to this repo. Can you give me more details? What is `trainwords`?

yzhangcs commented 5 years ago

There is a usage example in this unfinished repo.

xiaoxiaoAurora commented 5 years ago

> Yes, and it works very well, but I didn't add it to this repo. Can you give me more details? What is `trainwords`?

This is one of my `trainwords`:

```
1 中国 _ _ _ _ 5 _ _ _
2 最大 _ _ _ _ 5 _ _ _
3 氨纶丝 _ _ _ _ 5 _ _ _
4 生产 _ _ _ _ 5 _ _ _
5 基地 _ _ _ _ 8 _ _ _
6 在 _ _ _ _ 7 _ _ _
7 连云港 _ _ _ _ 8 _ _ _
8 建成 _ _ _ _ 0 root _ _
```
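For reference, a minimal sketch (not from the repo) of reading the raw word forms of each sentence out of such a CoNLL-style file, so they can later be fed to ELMo's `batch_to_ids`; the file path and the 10-column layout are assumptions:

```python
def read_conll_sentences(path):
    """Collect the word forms (column 2) of each sentence in a CoNLL-style file."""
    sentences, words = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:              # a blank line ends the current sentence
                if words:
                    sentences.append(words)
                    words = []
            else:
                words.append(line.split()[1])
    if words:                         # the file may not end with a blank line
        sentences.append(words)
    return sentences

# e.g. read_conll_sentences('train.conllx')[0]
# -> ['中国', '最大', '氨纶丝', '生产', '基地', '在', '连云港', '建成']
```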

And the following is my parser.py:

```python
# (imports omitted in the original paste: torch, torch.nn as nn,
#  pack_padded_sequence / pad_packed_sequence from torch.nn.utils.rnn,
#  Elmo and batch_to_ids from allennlp.modules.elmo, and this repo's
#  LSTM, MLP, Biaffine, IndependentDropout and SharedDropout modules)
class BiaffineParser(nn.Module):

    def __init__(self, params, embeddings):
        super(BiaffineParser, self).__init__()

        self.params = params
        # the embedding layer
        self.pretrained = nn.Embedding.from_pretrained(embeddings)
        self.embed = nn.Embedding(num_embeddings=params['n_words'],
                                  embedding_dim=params['n_embed'])
        # add ELMo
        self.elmo = Elmo(params['options_file'], params['weight_file'], 1, dropout=0)

        self.tag_embed = nn.Embedding(num_embeddings=params['n_tags'],
                                      embedding_dim=params['n_tag_embed'])
        self.embed_dropout = IndependentDropout(p=params['embed_dropout'])

        # the word-lstm layer
        self.lstm = LSTM(input_size=params['n_embed'] + params['n_elmo_embed'],
                         hidden_size=params['n_lstm_hidden'],
                         num_layers=params['n_lstm_layers'],
                         dropout=params['lstm_dropout'],
                         bidirectional=True)
        self.lstm_dropout = SharedDropout(p=params['lstm_dropout'])

        # the MLP layers
        self.mlp_arc_h = MLP(n_in=params['n_lstm_hidden'] * 2,
                             n_hidden=params['n_mlp_arc'],
                             dropout=params['mlp_dropout'])
        self.mlp_arc_d = MLP(n_in=params['n_lstm_hidden'] * 2,
                             n_hidden=params['n_mlp_arc'],
                             dropout=params['mlp_dropout'])
        self.mlp_rel_h = MLP(n_in=params['n_lstm_hidden'] * 2,
                             n_hidden=params['n_mlp_rel'],
                             dropout=params['mlp_dropout'])
        self.mlp_rel_d = MLP(n_in=params['n_lstm_hidden'] * 2,
                             n_hidden=params['n_mlp_rel'],
                             dropout=params['mlp_dropout'])

        # the Biaffine layers
        self.arc_attn = Biaffine(n_in=params['n_mlp_arc'],
                                 bias_x=True,
                                 bias_y=False)
        self.rel_attn = Biaffine(n_in=params['n_mlp_rel'],
                                 n_out=params['n_rels'],
                                 bias_x=True,
                                 bias_y=True)
        self.pad_index = params['pad_index']
        self.unk_index = params['unk_index']

        self.reset_parameters()

    def reset_parameters(self):
        nn.init.zeros_(self.embed.weight)

    def forward(self, words, tags, trainwords):
        # get the mask and lengths of the given batch
        # print('forward words[0]: ', type(words), words[0], words[0].size)
        # print('trainwords[0]:', type(list(trainwords.items())[0][0]), list(trainwords.items())[0])
        mask = words.ne(self.pad_index)
        lens = mask.sum(dim=1)

        # get outputs from the ELMo layers:
        # look up the raw sentence corresponding to each batch row in trainwords
        sentences = []
        pad_items = []
        print('len(trainwords): %d, len(words): %d' % (len(trainwords), len(words)))
        items = list(trainwords.items())
        print('list(trainwords.items())[0]: ', type(items[0]), items[0], items[0][0], items[0][1])
        for i in range(len(words)):
            item = items[i]
            pad_item = nn.ConstantPad2d((item[0], words[i]), 0).padding[1]
            if torch.equal(pad_item, words[i]):
                sentence = trainwords[item[0]]
                sentences.append(sentence)
            else:
                print('Error!!!!!!!!!!!')
        print('len(sentences): %d, len(words): %d' % (len(sentences), len(words)))
        # print('sentences: ', len(sentences), type(sentences), sentences[0])
        characters_id = batch_to_ids(sentences)
        elmo_output = self.elmo(characters_id.cuda())
        elmo_embed = elmo_output['elmo_representations'][0]

        # get outputs from the embedding layers
        embed = self.pretrained(words)
        embed += self.embed(
            words.masked_fill_(words.ge(self.embed.num_embeddings),
                               self.unk_index)
        )
        tag_embed = self.tag_embed(tags)
        embed, tag_embed = self.embed_dropout(embed, tag_embed)
        # concatenate the word, ELMo and tag representations
        x = torch.cat((embed, elmo_embed, tag_embed), dim=-1)

        sorted_lens, indices = torch.sort(lens, descending=True)
        inverse_indices = indices.argsort()
        x = pack_padded_sequence(x[indices], sorted_lens, True)
        x = self.lstm(x)
        x, _ = pad_packed_sequence(x, True)
        x = self.lstm_dropout(x)[inverse_indices]

        # apply MLPs to the LSTM output states
        arc_h = self.mlp_arc_h(x)
        arc_d = self.mlp_arc_d(x)
        rel_h = self.mlp_rel_h(x)
        rel_d = self.mlp_rel_d(x)

        # get arc and rel scores from the bilinear attention
        # [batch_size, seq_len, seq_len]
        s_arc = self.arc_attn(arc_d, arc_h)
        # [batch_size, seq_len, seq_len, n_rels]
        s_rel = self.rel_attn(rel_d, rel_h).permute(0, 2, 3, 1)
        # set the scores that exceed the length of each sentence to -inf
        s_arc.masked_fill_((1 - mask).unsqueeze(1), float('-inf'))

        return s_arc, s_rel
```
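For reference, here is a self-contained sketch (not part of the repo) of the AllenNLP ELMo calls used in the code above, with placeholder paths for the options and weight files; it shows the tensor shapes that end up being concatenated in `forward`:

```python
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholder paths; the real options/weights files have to be downloaded separately.
options_file = 'elmo_options.json'
weight_file = 'elmo_weights.hdf5'
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

# batch_to_ids expects tokenized sentences (lists of word strings).
sentences = [['中国', '最大', '氨纶丝', '生产', '基地'],
             ['在', '连云港', '建成']]
character_ids = batch_to_ids(sentences)          # [batch, max_len, 50]

output = elmo(character_ids)
elmo_embed = output['elmo_representations'][0]    # [batch, max_len, 1024]
mask = output['mask']                             # [batch, max_len]
print(elmo_embed.shape, mask.shape)

# elmo_embed can then be concatenated with the word/tag embeddings along dim=-1,
# provided the paddings line up and the sentence order matches the batch order.
```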

xiaoxiaoAurora commented 5 years ago

> Yes, and it works very well, but I didn't add it to this repo. Can you give me more details? What is `trainwords`?

And I want to `torch.cat((elmo_embed, word_embed))`, where `elmo_embed` comes from a pretrained ELMo.

yzhangcs commented 5 years ago

I didn't use ELMo as a PyTorch module, but pretrained it and read the ELMo embeddings from an HDF5 file. Loading ELMo through the DataLoader seems more consistent with the original code.
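Not from this repo, but a rough sketch of that HDF5 route, assuming the embeddings were dumped with AllenNLP's `elmo` command using `--all` (one dataset per input sentence with shape [3, n_tokens, 1024]); the key scheme and the layer averaging here are assumptions:

```python
import h5py
import torch

def load_elmo_layers(path, sentence_index):
    """Read the precomputed ELMo layers of one sentence from an HDF5 file."""
    with h5py.File(path, 'r') as f:
        # assumed key scheme: one dataset per input line, keyed by its index
        layers = torch.from_numpy(f[str(sentence_index)][()])   # [3, n_tokens, 1024]
    return layers

# e.g. inside a Dataset.__getitem__, collapse the three layers (a plain mean here,
# a learned scalar mix in general) and return the result together with the word/tag
# index tensors so the DataLoader batches everything consistently.
elmo_embed = load_elmo_layers('train.elmo.hdf5', 0).mean(dim=0)  # [n_tokens, 1024]
```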

xiaoxiaoAurora commented 5 years ago

> Loading ELMo through the DataLoader

I have some questions:

1. Loading ELMo through the DataLoader: the DataLoader converts each word into a word index, but is the ELMo vocabulary the same as the DataLoader vocabulary?
2. Does the added ELMo embedding need to be reduced in dimension to match the word embedding (1024 -> 300)?
yzhangcs commented 5 years ago
1. From the tutorial, the ELMo embeddings will be saved in an HDF5 file; each token corresponds to a representation with shape [3, 1024].
2. There is no need to reduce the dimension (a small sketch follows this list).
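A small illustration of point 2 (again not from the repo): the per-token [3, 1024] representation can be collapsed, here with a plain mean, and concatenated with a 300-dimensional word embedding; the BiLSTM's `input_size` is enlarged instead of projecting ELMo down:

```python
import torch

n_tokens = 8
elmo_layers = torch.randn(3, n_tokens, 1024)  # per-token ELMo output read from the HDF5 file
word_embed = torch.randn(n_tokens, 300)       # ordinary word embedding

elmo_embed = elmo_layers.mean(dim=0)          # [n_tokens, 1024], no projection to 300
x = torch.cat((word_embed, elmo_embed), dim=-1)
print(x.shape)                                # torch.Size([8, 1324])
# the downstream LSTM then takes input_size = 300 + 1024 rather than a reduced ELMo
```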
xiaoxiaoAurora commented 5 years ago
> 1. From the tutorial, the ELMo embeddings will be saved in an HDF5 file; each token corresponds to a representation with shape [3, 1024].
> 2. There is no need to reduce the dimension.

OK, I will think about how to verify this. Thank you very much for your patience and your answers. Have a nice day!