neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

bert-crf #29

Open Astudnew opened 3 years ago

Astudnew commented 3 years ago

Hello. In this part of your code in the bertcrf class (forward fn), the comment says that only the first subtoken of each word is passed to the CRF, but I don't understand how this happens. The lengths of seq_logits and seq_labels do not seem to change: they look the same as the full sequence length, including subtokens, [CLS] and [SEP].

"for seq_logits, seq_labels, seq_mask in zip(logits, labels, mask):
    # Index logits and labels using prediction mask to pass only the
    # first subtoken of each word to CRF.
    seq_logits = seq_logits[seq_mask].unsqueeze(0)
    seq_labels = seq_labels[seq_mask].unsqueeze(0)
    loss -= self.crf(seq_logits, seq_labels,
                     reduction='token_mean')"

fabiocapsouza commented 3 years ago

Hi @Phd-Student2018, I'm not sure I understood your question, but here is an example of this indexing.

Suppose we have the following words, tokens and labels:


words = ["My", "name", "is", "Fabio"]
tokens = ["[CLS]", "My", "name", "is", "Fa", "##bio", "[SEP]"]
label_tags = ["X", "O", "O", "O", "B-PERSON", "X", "X"]  # X marks positions to ignore
labels = [-100, 0, 0, 0, 1, -100, -100]   # label tags converted to int ids
seq_mask = [False, True, True, True, True, False, False]   # False for special tokens and word continuations ("##")

# The CRF layer must receive only the logits and labels of the tokens ["My", "name", "is", "Fa"]
# B = batch size
# S = sequence length
# C = number of classes/tags
# logits.shape == (B, S, C)
# labels.shape == (B, S)
# After zip:
# seq_logits.shape == (S, C)
# seq_labels.shape == (S,)

# The indexing of seq_logits and seq_labels by seq_mask will produce:
# seq_logits.shape == (P, C)
# seq_labels.shape == (P,)
# The unsqueeze adds back the batch dimension: (1, P, C) and (1, P)

Here P is the number of words produced by basic whitespace and punctuation tokenization, i.e. P = seq_mask.sum(), which is 4 in this example.
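The shapes listed above can be checked with a minimal PyTorch sketch for a single sequence. Random logits stand in for the model output, and C = 2 is an assumption based on the two tags ("O" and "B-PERSON") in the example; neither comes from the repository code.

```python
import torch

# Labels and mask from the example above (one sequence, S = 7 tokens).
labels = torch.tensor([-100, 0, 0, 0, 1, -100, -100])
seq_mask = torch.tensor([False, True, True, True, True, False, False])

S = labels.shape[0]
C = 2  # assumed number of tags, for illustration only

# Random values stand in for the per-token logits of one sequence: shape (S, C).
seq_logits = torch.randn(S, C)

# Boolean indexing keeps only the rows where seq_mask is True: shape (P, C).
masked_logits = seq_logits[seq_mask]
masked_labels = labels[seq_mask]
P = int(seq_mask.sum())

# unsqueeze(0) adds back a batch dimension of size 1: (1, P, C) and (1, P).
crf_logits = masked_logits.unsqueeze(0)
crf_labels = masked_labels.unsqueeze(0)

print(seq_logits.shape, "->", crf_logits.shape)
print(crf_labels.tolist())
```

The tensors before indexing still have length S, which is why the lengths look unchanged until the mask is applied inside the loop.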

Hope it helps

Astudnew commented 3 years ago

Yes, it is very helpful. Thank you very much.

Astudnew commented 3 years ago

Please, another question about testing: to compare the list of predictions with the original label list (y-true), how can we get y-true?
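One possible approach, sketched here under the same mask convention as the example above, is to filter the label ids with the same mask used for the CRF input and map them back to tag strings. The `id2tag` mapping and the `gold_tags` helper are hypothetical names for illustration and are not taken from the repository's evaluation code.

```python
# Hypothetical id-to-tag mapping, based on the two tags in the example above.
id2tag = {0: "O", 1: "B-PERSON"}


def gold_tags(labels, mask, id2tag):
    """Keep labels at positions where mask is True and map ids to tag strings."""
    return [id2tag[label] for label, keep in zip(labels, mask) if keep]


labels = [-100, 0, 0, 0, 1, -100, -100]
seq_mask = [False, True, True, True, True, False, False]

# One gold tag per word, aligned with the positions the CRF sees.
y_true = gold_tags(labels, seq_mask, id2tag)
print(y_true)
```

Because the mask drops special tokens and word continuations, the ignored id -100 never reaches the mapping, and y_true has one entry per word, matching the length of a prediction list decoded from the masked sequence.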