yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/

pad_index #46

Closed attardi closed 3 years ago

attardi commented 3 years ago

The code uses two values for padding: 0, the default, for nn.Embedding features such as the char and tag feature types, and feat_pad_index for the CharLSTM and BERT embeddings. A value for pad_index is stored as self.pad_index in BiaffineDependencyModel and is also saved among the parameters in the model file. But BERT models use a different value, usually distinct from both feat_pad_index and pad_index. In BiaffineDependencyModel.forward(), self.pad_index is used to compute a mask:

mask = words.ne(self.pad_index)

but that might not be the right value. I ran into this problem when using the French model camembert-base, which uses a pad index of 1. Besides, in my experiments I also do away with the word_embed feature.
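
For illustration (not part of the original report), a quick check with the Hugging Face transformers package shows that the pad token id is tied to the pretrained vocab, so a hard-coded default cannot be assumed:

    from transformers import AutoTokenizer

    # Minimal sketch: the pad token id depends on the pretrained vocab,
    # so a fixed pad_index can disagree with the BERT feature field.
    for name in ('bert-base-cased', 'camembert-base'):
        tokenizer = AutoTokenizer.from_pretrained(name)
        print(name, tokenizer.pad_token, tokenizer.pad_token_id)
    # bert-base-cased  [PAD]  0
    # camembert-base   <pad>  1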

What I propose is to compute the mask from feats, using the feature embedding's own pad index, rather than from words:

    word_feats = feats[:, :, 0]  # drop the subpiece dimension
    batch_size, seq_len = word_feats.shape
    # get the mask and lengths of the given batch
    mask = word_feats.ne(self.feat_embed.pad_index)
    lens = mask.sum(dim=1)
yzhangcs commented 3 years ago

Since the vocab for BERT is not built from scratch, the pad_index comes from the BERT vocab rather than a default value. https://github.com/yzhangcs/parser/blob/f002b2ca32b021afd75c9dc4f360b82b5b96bf2b/supar/utils/field.py#L112
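
As a rough sketch of what that means on the building side (simplified and hedged; details may differ from the linked code), the BERT feature field is constructed from the pretrained tokenizer, so its pad index is looked up in the BERT vocab rather than defaulting to 0:

    from transformers import AutoTokenizer
    from supar.utils.field import SubwordField

    # Simplified sketch: the feature field borrows pad/unk/bos from the
    # pretrained tokenizer, so feat_pad_index follows the model's own vocab.
    tokenizer = AutoTokenizer.from_pretrained('camembert-base')
    FEAT = SubwordField('bert',
                        pad=tokenizer.pad_token,
                        unk=tokenizer.unk_token,
                        bos=tokenizer.cls_token or tokenizer.bos_token,
                        fix_len=20,
                        tokenize=tokenizer.tokenize)
    FEAT.vocab = tokenizer.get_vocab()  # pad_index is then resolved against this vocab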

attardi commented 3 years ago

Sounds good. I forgot to say that in modules/bert.py I do this:

self.pad_index = self.bert.config.pad_token_id                 

It should be the same value, provided that pad is passed correctly to __init__(), which I wouldn't rely on. In fact, I would eliminate the optional parameter.
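
Along the lines of dropping that parameter, a minimal sketch (illustrative names only, not the repository's actual BertEmbedding) would read the pad id straight from the loaded config, so there is nothing to pass in:

    import torch.nn as nn
    from transformers import AutoModel

    # Illustrative sketch, not supar's BertEmbedding: take the pad id from the
    # pretrained config instead of accepting it as an optional argument.
    class PretrainedEmbedding(nn.Module):
        def __init__(self, model_name, n_out):
            super().__init__()
            self.bert = AutoModel.from_pretrained(model_name)
            # single source of truth for padding; no `pad` parameter to mistrust
            self.pad_index = self.bert.config.pad_token_id
            self.projection = nn.Linear(self.bert.config.hidden_size, n_out)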