text-machine-lab / entity-coref

Entity co-reference task, from CoNLL-2012

Class CorefTagger means what? #4

Open smallsmallwood opened 5 years ago

smallsmallwood commented 5 years ago

I find the output is a 1×2 vector in class CorefTagger, but the final output y is a 1×3 vector in your paper. Are there any differences?

Another question: did you test on ".auto_conll" in your paper (COLING 2018)?

ylmeng commented 5 years ago

Sorry for the confusion. For a triad (a, b, c), we only use the outputs for (a, c) and (b, c) in the current version. So a triad has three pairwise outputs, but we use two of them for the final predictions. It is more efficient and often a bit more accurate. However, inside the neural network nothing changed from the original version: all three pairs still go through the layers. If you use all three pairs, the scores should be very similar, maybe a little lower.
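A minimal sketch of that selection step, assuming one score per pair stacked along the last dimension (the tensor layout and names are illustrative, not the repo's actual API):

```python
import torch

# Hypothetical sketch: the network scores all three pairs of a triad
# (a,b), (a,c), (b,c), but only the (a,c) and (b,c) scores feed the
# final prediction. Layout and names below are assumptions.
triad_scores = torch.rand(8, 3)   # (batch, 3): columns = (a,b), (a,c), (b,c)
score_ac = triad_scores[:, 1]     # keep (a, c)
score_bc = triad_scores[:, 2]     # keep (b, c)
# the (a, b) column is still computed inside the network but dropped here
```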

The test set does not have gold_conll files, so we used auto_conll only. We had a bug in the evaluation program for our COLING paper, so the scores there are not as good as the current ones. Specifically, separate parts of an article have no coreference between them, but we had assumed coreference could occur across parts, which made the task more difficult. After we fixed the bug the scores got better, as you can see in the arXiv paper. Please refer to the arXiv paper, which corrects some errors. (We tried to update the COLING paper too, but that process takes longer.)
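A minimal sketch of the constraint the fix enforces, assuming each mention record carries a part id (the field name is illustrative, not the repo's actual data structure):

```python
# Hypothetical sketch of the fixed evaluation constraint: mentions from
# different parts of the same document are never candidate coreference pairs.
def candidate_pairs(mentions):
    pairs = []
    for i, m1 in enumerate(mentions):
        for m2 in mentions[i + 1:]:
            if m1["part_id"] == m2["part_id"]:  # no cross-part coreference
                pairs.append((m1, m2))
    return pairs
```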

smallsmallwood commented 5 years ago

Thanks for your patience. I also don't understand the role of torch.max() in the following code (I didn't find any analysis of it in your new paper):

```python
word_repr_0, _ = self.Attention(word_lstm_0, torch.cat([word_lstm_1, word_lstm_2], 1))
word_repr_0, _ = torch.max(word_repr_0, dim=1, keepdim=False)  # (batch, feature)

word_repr_1, _ = self.Attention(word_lstm_1, torch.cat([word_lstm_0, word_lstm_2], 1))
word_repr_1, _ = torch.max(word_repr_1, dim=1, keepdim=False)  # (batch, feature)

word_repr_2, _ = self.Attention(word_lstm_2, torch.cat([word_lstm_0, word_lstm_1], 1))
word_repr_2, _ = torch.max(word_repr_2, dim=1, keepdim=False)  # (batch, feature)
```

Also, with gold mentions I got a score around 60, which is lower than your results tested with gold mentions in your new paper.

ylmeng commented 5 years ago

Sorry for the delay. torch.max() just does max-pooling, which is widely used for RNN-based models. Instead of using the output of the last time step, or the average over time steps, we use the max value over time steps to represent the sequence.
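For comparison, a minimal sketch of the three pooling choices on a dummy LSTM output (shapes are assumed, not taken from the repo):

```python
import torch

# Pooling a sequence representation of shape (batch, time, feature)
# down to a single (batch, feature) vector, three common ways.
batch, time, feature = 4, 7, 16
word_lstm = torch.randn(batch, time, feature)

last_step = word_lstm[:, -1, :]            # output of the last time step
mean_pool = word_lstm.mean(dim=1)          # average over time steps
max_pool, _ = torch.max(word_lstm, dim=1)  # max over time steps, as in the code above
# all three yield a (batch, feature) vector representing the sequence
```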