yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

How to add new features for LSTM #96

Closed · MinionAttack closed this 2 years ago

MinionAttack commented 2 years ago

Hi,

I'm trying to add new features to use with the LSTM encoder. After reading the documentation and looking at the code, I've tried to replicate how it's done for tag, char and lemma. I want to use the 7th and 8th columns of the CoNLL-U format (HEAD and DEPREL) during training to see if I get better models.

For now, I've added the features I want to use (supar/cmds/biaffine_sdp.py):

subparser.add_argument('--feat', '-f', choices=['tag', 'char', 'lemma', 'head', 'deprel', 'elmo', 'bert'], nargs='+', help='features to use')

Then, following the way it's done with tag, char or lemma I've added the code to use the new features (supar/parsers/sdp.py):

TAG, CHAR, LEMMA, ELMO, BERT, HEAD, DEPREL = None, None, None, None, None, None, None
.....
if 'tag' in args.feat:
    TAG = Field('tags', bos=BOS)
if 'char' in args.feat:
    CHAR = SubwordField('chars', pad=PAD, unk=UNK, bos=BOS, fix_len=args.fix_len)
if 'lemma' in args.feat:
    LEMMA = Field('lemmas', pad=PAD, unk=UNK, bos=BOS, lower=True)
if 'head' in args.feat:
    HEAD = Field('heads', bos=BOS)
if 'deprel' in args.feat:
    DEPREL = Field('relations', bos=BOS)
if 'elmo' in args.feat:
    from allennlp.modules.elmo import batch_to_ids
    ELMO = RawField('elmo')
    ELMO.compose = lambda x: batch_to_ids(x).to(WORD.device)
.....
transform = CoNLL(FORM=(WORD, CHAR, ELMO, BERT), LEMMA=LEMMA, CPOS=TAG, HEAD=HEAD, DEPREL=DEPREL, PHEAD=LABEL)

train = Dataset(transform, args.train)
if args.encoder != 'bert':
    WORD.build(train, args.min_freq, (Embedding.load(args.embed, args.unk) if args.embed else None))
    if TAG is not None:
        TAG.build(train)
    if CHAR is not None:
        CHAR.build(train)
    if LEMMA is not None:
        LEMMA.build(train)
    if HEAD is not None:
        HEAD.build(train)
    if DEPREL is not None:
        DEPREL.build(train)
LABEL.build(train)
args.update({
    'n_words': len(WORD.vocab) if args.encoder == 'bert' else WORD.vocab.n_init,
    'n_labels': len(LABEL.vocab),
    'n_tags': len(TAG.vocab) if TAG is not None else None,
    'n_chars': len(CHAR.vocab) if CHAR is not None else None,
    'char_pad_index': CHAR.pad_index if CHAR is not None else None,
    'n_lemmas': len(LEMMA.vocab) if LEMMA is not None else None,
    'n_heads': len(HEAD.vocab) if HEAD is not None else None,
    'n_relations': len(DEPREL.vocab) if DEPREL is not None else None,
    'bert_pad_index': BERT.pad_index if BERT is not None else None,
    'pad_index': WORD.pad_index,
    'unk_index': WORD.unk_index,
    'bos_index': WORD.bos_index
})

I also add them in supar/models/model.py:

.....
def __init__(self,
     n_words,
     n_tags=None,
     n_chars=None,
     n_lemmas=None,
     n_heads=None,
     n_relations=None,
     encoder='lstm',
     feat=['char'],
.....
if 'tag' in feat:
    self.tag_embed = nn.Embedding(num_embeddings=n_tags, embedding_dim=n_feat_embed)
    n_input += n_feat_embed
if 'char' in feat:
    self.char_embed = CharLSTM(n_chars=n_chars, n_embed=n_char_embed, n_hidden=n_char_hidden, n_out=n_feat_embed, pad_index=char_pad_index, dropout=char_dropout)
    n_input += n_feat_embed
if 'lemma' in feat:
    self.lemma_embed = nn.Embedding(num_embeddings=n_lemmas, embedding_dim=n_feat_embed)
    n_input += n_feat_embed
if 'head' in feat:
    self.head_embed = nn.Embedding(num_embeddings=n_heads, embedding_dim=n_feat_embed)
    n_input += n_feat_embed
if 'deprel' in feat:
    self.deprel_embed = nn.Embedding(num_embeddings=n_relations, embedding_dim=n_feat_embed)
    n_input += n_feat_embed
.....
feat_embeds = []
if 'tag' in self.args.feat:
    feat_embeds.append(self.tag_embed(feats.pop()))
if 'char' in self.args.feat:
    feat_embeds.append(self.char_embed(feats.pop(0)))
if 'elmo' in self.args.feat:
    feat_embeds.append(self.elmo_embed(feats.pop(0)))
if 'bert' in self.args.feat:
    feat_embeds.append(self.bert_embed(feats.pop(0)))
if 'lemma' in self.args.feat:
    feat_embeds.append(self.lemma_embed(feats.pop(0)))
if 'head' in self.args.feat:
    feat_embeds.append(self.head_embed(feats.pop(0)))
if 'deprel' in self.args.feat:
    feat_embeds.append(self.deprel_embed(feats.pop(0)))
word_embed, feat_embed = self.embed_dropout(word_embed, torch.cat(feat_embeds, -1))

When I try to train a model to see if these changes are an improvement, I get this output:

2022-02-22 10:11:01 INFO Building the fields
2022-02-22 10:11:25 INFO CoNLL(
 (words): Field(pad=<pad>, unk=<unk>, bos=<bos>, lower=True)
 (chars): SubwordField(pad=<pad>, unk=<unk>, bos=<bos>)
 (tags): Field(bos=<bos>)
 (heads): Field(bos=<bos>)
 (relations): Field(bos=<bos>)
 (labels): ChartField()
)
2022-02-22 10:11:25 INFO Building the model
2022-02-22 10:11:52 INFO BiaffineSemanticDependencyModel(
  (word_embed): Embedding(225, 300)
  (tag_embed): Embedding(17, 100)
  (char_embed): CharLSTM(87, 50, n_out=100, pad_index=0)
  (head_embed): Embedding(64, 100)
  (deprel_embed): Embedding(31, 100)
  (embed_dropout): IndependentDropout(p=0.2)
  (encoder): VariationalLSTM(825, 600, num_layers=3, bidirectional=True, dropout=0.33)
  (encoder_dropout): SharedDropout(p=0.33, batch_first=True)
  (edge_mlp_d): MLP(n_in=1200, n_out=600, dropout=0.25)
  (edge_mlp_h): MLP(n_in=1200, n_out=600, dropout=0.25)
  (label_mlp_d): MLP(n_in=1200, n_out=600, dropout=0.33)
  (label_mlp_h): MLP(n_in=1200, n_out=600, dropout=0.33)
  (edge_attn): Biaffine(n_in=600, n_out=2, bias_x=True, bias_y=True)
  (label_attn): Biaffine(n_in=600, n_out=4, bias_x=True, bias_y=True)
  (criterion): CrossEntropyLoss()
  (pretrained): Embedding(178861, 300)
  (embed_proj): Linear(in_features=300, out_features=125, bias=True)
)

2022-02-22 10:11:55 INFO Loading the data
  0%|                  | 0/32 [00:00<?, ?it/s]
2022-02-22 10:11:58 INFO
train: Dataset(n_sentences=1063, n_batches=32, n_buckets=32)
dev:   Dataset(n_sentences=152, n_batches=32, n_buckets=32)
test:  Dataset(n_sentences=152, n_batches=32, n_buckets=32)

But I get an error when the training is about to start:

2022-02-22 10:11:58 INFO Epoch 1 / 5000:
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [10,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [10,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[... the same assertion repeated for many more blocks and threads ...]
Traceback (most recent call last):
  File "/home/iago/.local/share/JetBrains/IntelliJIdea2021.3/python/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/iago/.local/share/JetBrains/IntelliJIdea2021.3/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/cmds/biaffine_sdp.py", line 47, in <module>
    main()
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/cmds/biaffine_sdp.py", line 43, in main
    parse(parser)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/cmds/cmd.py", line 29, in parse
    parser.train(**args)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/parsers/sdp.py", line 52, in train
    return super().train(**Config().update(locals()))
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/parsers/parser.py", line 74, in train
    self._train(train.loader)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/parsers/sdp.py", line 141, in _train
    s_edge, s_label = self.model(words, feats)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/models/sdp.py", line 180, in forward
    x = self.encode(words, feats)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/models/model.py", line 166, in encode
    x = pack_padded_sequence(self.embed(words, feats), words.ne(self.args.pad_index).sum(1).tolist(), True, False)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/supar/models/model.py", line 145, in embed
    feat_embeds.append(self.tag_embed(feats.pop()))
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/venv/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/iago/Escritorio/SuPar_Pre-finetuning/venv/lib/python3.9/site-packages/torch/nn/functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered

The error is located at:

if 'tag' in self.args.feat:
    feat_embeds.append(self.tag_embed(feats.pop()))
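
For reference, this assert is the CUDA-side version of an out-of-range embedding lookup. A minimal reproduction outside SuPar (the sizes are taken from the model printout above; the script itself is illustrative):

import torch

# tag_embed was built as Embedding(17, 100), while the char vocabulary has 87
# symbols (see CharLSTM(87, ...) in the model printout). If the chars tensor is
# popped first and fed to tag_embed, indices up to 86 hit a table of size 17.
tag_embed = torch.nn.Embedding(17, 100)
bad_indices = torch.tensor([0, 5, 86])  # 86 is a plausible char index, not a tag index
tag_embed(bad_indices)                  # CPU: IndexError; CUDA: device-side assert
# Tip: running on CPU, or with CUDA_LAUNCH_BLOCKING=1, points at the failing call.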

I have tried to find out what is happening, but I am not able to find the cause of the problem. What am I doing wrong or what have I forgotten to modify to make it work?

Regards.

yzhangcs commented 2 years ago

@MinionAttack Please note that the features in feats strictly follow the order in which you initialized the fields,

CoNLL(FORM=(WORD, CHAR, ELMO, BERT), LEMMA=LEMMA, CPOS=TAG, HEAD=HEAD, DEPREL=DEPREL, PHEAD=LABEL)

so the order in which the feature vectors are fetched must correspond strictly:

# these look a bit misplaced
if 'head' in self.args.feat:
    feat_embeds.append(self.head_embed(feats.pop(0)))
if 'deprel' in self.args.feat:
    feat_embeds.append(self.deprel_embed(feats.pop(0)))

Sorry that these aspects are missing from the documentation.
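
To make the ordering concrete, here is a toy sketch of the mismatch in plain Python (not SuPar internals; strings stand in for the feature tensors):

# The transform flattens fields in declaration order:
# FORM=(WORD, CHAR, ...), CPOS=TAG, HEAD=HEAD, DEPREL=DEPREL
declared = ['chars', 'tags', 'heads', 'relations']

# feats.pop() with no index takes the LAST element; with heads/relations
# declared after tags, that is 'relations', not 'tags':
feats = list(declared)
assert feats.pop() == 'relations'

# Popping front-to-back in declaration order keeps tensors and embeddings aligned:
feats = list(declared)
for expected in ('chars', 'tags', 'heads', 'relations'):
    assert feats.pop(0) == expected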

MinionAttack commented 2 years ago

@yzhangcs thanks for the answer, but what do you mean by "# these look a bit misplaced"?

Also I'm passing the features in this order:

--feat tag char head deprel

yzhangcs commented 2 years ago

@MinionAttack

but what do you mean by "# these look a bit misplaced"?

With --feat tag char head deprel, the order in which the features are fetched should be char -> tag -> head -> deprel.

MinionAttack commented 2 years ago

With --feat tag char head deprel, the order in which the features are fetched should be char -> tag -> head -> deprel.

Oh, I think I understand it now. You mean that the order to follow is this one:

CoNLL(FORM=(WORD, CHAR, ELMO, BERT), LEMMA=LEMMA, CPOS=TAG, HEAD=HEAD, DEPREL=DEPREL, PHEAD=LABEL)

So I've changed the order and set this:

if 'tag' in self.args.feat:
    feat_embeds.append(self.tag_embed(feats.pop()))
if 'char' in self.args.feat:
    feat_embeds.append(self.char_embed(feats.pop(0)))
if 'elmo' in self.args.feat:
    feat_embeds.append(self.elmo_embed(feats.pop(0)))
if 'bert' in self.args.feat:
    feat_embeds.append(self.bert_embed(feats.pop(0)))
if 'lemma' in self.args.feat:
    feat_embeds.append(self.lemma_embed(feats.pop(0)))

to this:

if 'char' in self.args.feat:
    feat_embeds.append(self.char_embed(feats.pop(0)))
if 'elmo' in self.args.feat:
    feat_embeds.append(self.elmo_embed(feats.pop(0)))
if 'bert' in self.args.feat:
    feat_embeds.append(self.bert_embed(feats.pop(0)))
if 'lemma' in self.args.feat:
    feat_embeds.append(self.lemma_embed(feats.pop(0)))
if 'tag' in self.args.feat:
    feat_embeds.append(self.tag_embed(feats.pop(0)))
if 'head' in self.args.feat:
    feat_embeds.append(self.head_embed(feats.pop(0)))
if 'deprel' in self.args.feat:
    feat_embeds.append(self.deprel_embed(feats.pop(0)))

So now they will always be popped in the right order. Am I right?

yzhangcs commented 2 years ago

@MinionAttack Yeah, exactly.

MinionAttack commented 2 years ago

Hello, I am reopening this topic so as not to duplicate it. After managing to train an LSTM model with these added features, I am unable to get it to predict files. The failure happens when it tries to get head and deprel:

        if 'tag' in self.args.feat:
            feat_embeds.append(self.tag_embed(feats.pop(0)))
        if 'head' in self.args.feat:
            feat_embeds.append(self.head_embed(feats.pop(0)))
        if 'deprel' in self.args.feat:
            feat_embeds.append(self.deprel_embed(feats.pop(0)))
        word_embed, feat_embed = self.embed_dropout(word_embed, torch.cat(feat_embeds, -1))

I'm getting an error here:

    ....
    feat_embeds.append(self.head_embed(feats.pop(0)))
IndexError: pop from empty list

I have tried to debug the code but I can't find the place where the feats variable is initialised or loaded with the data. I have tried to follow the code flow, so:

At line 137 in supar/parsers/parser.py, the _predict method receives a loader, which is used to extract the batches to iterate over at line 183 in supar/parsers/sdp.py. On the next line, the feats variable is unpacked from batch:

words, *feats = batch

Where the batch is:

batch = Batch(words, chars, tags)

and its content is:

fields = ['words', 'chars', 'tags']
sentences = [ ... ]
transformed = { 'words': Tensor, 'chars': Tensor, 'tags': Tensor}

That means that feats is a list of 2 Tensors, which I think correspond to char and tag, instead of 4 Tensors for char, tag, head and deprel.
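
(As a sanity check, something like this hypothetical snippet, mirroring the unpacking in _predict on a built Dataset, confirms the count:)

# Hypothetical debugging code, not part of SuPar:
for batch in dataset.loader:
    words, *feats = batch
    print(len(feats), [tuple(f.shape) for f in feats])  # prints 2 tensors, not 4
    break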

Finally, on line 134 of supar/models/model.py:

feat_embeds = []
if 'char' in self.args.feat:
    feat_embeds.append(self.char_embed(feats.pop(0)))
if 'elmo' in self.args.feat:
    feat_embeds.append(self.elmo_embed(feats.pop(0)))
if 'bert' in self.args.feat:
    feat_embeds.append(self.bert_embed(feats.pop(0)))
if 'lemma' in self.args.feat:
    feat_embeds.append(self.lemma_embed(feats.pop(0)))
if 'tag' in self.args.feat:
    feat_embeds.append(self.tag_embed(feats.pop(0)))
if 'head' in self.args.feat:
    feat_embeds.append(self.head_embed(feats.pop(0)))
if 'deprel' in self.args.feat:
    feat_embeds.append(self.deprel_embed(feats.pop(0)))
word_embed, feat_embed = self.embed_dropout(word_embed, torch.cat(feat_embeds, -1))
# concatenate the word and feat representations
embed = torch.cat((word_embed, feat_embed), -1)

return embed

I can see that the previously saved model (self) contains the features:

...
char_embed = CharLSTM(87, 50, n_out=100, pad_index=0)
...
deprel_embed = Embedding(31, 100)
...
head_embed = Embedding(64, 100)
...
tag_embed = Embedding(17, 100)
word_embed = Embedding(225, 300)

Can you give me any hints or guidance on where feats is loaded and initialised, so I can see why it has a length of 2 instead of 4?

Regards.

yzhangcs commented 2 years ago

@MinionAttack Oh, that's weird. Could you print the saved transform? I suppose it should contain 9 fields, as you saved them.

MinionAttack commented 2 years ago

While debugging, if I inspect the value of the transform variable inside dataset, I get:

CoNLL(
 (words): Field(pad=<pad>, unk=<unk>, bos=<bos>, lower=True)
 (chars): SubwordField(pad=<pad>, unk=<unk>, bos=<bos>)
 (tags): Field(bos=<bos>)
)

I attach an image for you to see it better.


I noticed that flattened_fields shows words, chars and tags, but not the ones I added (head and deprel). Could this be related?

yzhangcs commented 2 years ago

@MinionAttack Do the two fields exist during training?

MinionAttack commented 2 years ago

Yes. I have trained a new model and set a breakpoint at line 70 in supar/parsers/parser.py, right before the model starts to train. Inspecting the self variable, which is an instance of BiaffineSemanticDependencyParser, I can see that the value of the transform attribute is:

CoNLL(
 (words): Field(pad=<pad>, unk=<unk>, bos=<bos>, lower=True)
 (chars): SubwordField(pad=<pad>, unk=<unk>, bos=<bos>)
 (tags): Field(bos=<bos>)
 (heads): Field(bos=<bos>)
 (relations): Field(bos=<bos>)
 (labels): ChartField()
)

(relations is the name given to DEPREL in the build method of supar/parsers/sdp.py)

And the value of flattened_fields is:

flattened_fields = [(words): Field(pad=<pad>, unk=<unk>, bos=<bos>, lower=True), (chars): SubwordField(pad=<pad>, unk=<unk>, bos=<bos>), (tags): Field(bos=<bos>), (heads): Field(bos=<bos>), (relations): Field(bos=<bos>), (labels): ChartField()]

I attach again an image for you to see it better:


So I think the fields exist in the training phase, am I right?

EDIT 1:

I have also debugged what gets saved after the training phase. In the save method of supar/parsers/parser.py:

state = {'name': self.NAME,
         'args': args,
         'state_dict': state_dict,
         'pretrained': pretrained,
         'transform': self.transform}
torch.save(state, path, pickle_module=dill)

The value of self.transform is:

CoNLL(
 (words): Field(pad=<pad>, unk=<unk>, bos=<bos>, lower=True)
 (chars): SubwordField(pad=<pad>, unk=<unk>, bos=<bos>)
 (tags): Field(bos=<bos>)
 (heads): Field(bos=<bos>)
 (relations): Field(bos=<bos>)
 (labels): ChartField()
)

And the flattened_fields attribute has both heads (HEAD) and relations (DEPREL) fields.

flattened_fields = [(words): Field(pad=<pad>, unk=<unk>, bos=<bos>, lower=True), (chars): SubwordField(pad=<pad>, unk=<unk>, bos=<bos>), (tags): Field(bos=<bos>), (heads): Field(bos=<bos>), (relations): Field(bos=<bos>), (labels): ChartField()]
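
For completeness, the checkpoint can also be inspected offline; the path below is illustrative, and the keys come from the save snippet above:

import dill
import torch

# Load the file the same way save() wrote it (torch.save(..., pickle_module=dill)):
state = torch.load('model.pt', map_location='cpu', pickle_module=dill)
print(state['transform'])  # prints the CoNLL(...) summary, incl. heads/relations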

EDIT 2:

After seeing in EDIT 1 that the values are stored in the model, I debugged the code that loads the model for prediction. In the load method of supar/parsers/parser.py:

model.to(args.device)
transform = state['transform']
parser = cls(args, model, transform)
parser.checkpoint_state_dict = state['checkpoint_state_dict'] if args.checkpoint else None
return parser

The value of transform contains the heads and relations fields, and both are present in flattened_fields too. But here is what I found: when parser.predict(**args) is called in supar/cmds/cmd.py:

If I set a breakpoint at the line self.transform.eval(), the heads and relations fields are present, but after this line is executed they are gone.

Why does changing the status to self.train(False) remove the values? Reading the documentation, it says:

Attributes: training (bool): Sets the object in training mode. If False, some data fields not required for predictions won't be returned. Default: True.

So is this what makes the flattened_fields property in supar/utils/transform.py ignore them?

@property
def flattened_fields(self):
    flattened = []
    for field in self:
        if field not in self.src and field not in self.tgt:
            continue
        if not self.training and field in self.tgt:
            continue
        if not isinstance(field, Iterable):
            field = [field]
        for f in field:
            if f is not None:
                flattened.append(f)
    return flattened

Because they are stored in self.tgt. I see that the train and evaluate commands set self.transform.train().

So I guess this is why they are missing. Can this behaviour be changed or would it affect the operation of the parser?

yzhangcs commented 2 years ago

@MinionAttack I see, only fields in transform.src will be loaded during prediction. https://github.com/yzhangcs/parser/blob/09f37241ba5ad2b4bf971ae1e59185d7c172aa58/supar/utils/transform.py#L134-L140 So you may need to change the above lines to:

@property
def src(self):
    return self.FORM, self.LEMMA, self.CPOS, self.POS, self.FEATS, self.DEPREL, self.PHEAD

@property
def tgt(self):
    return self.HEAD, self.PDEPREL
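
A toy re-enactment of the flattened_fields filter quoted above shows the effect of the move (plain Python, with field names as strings for illustration):

def flattened(fields, src, tgt, training):
    # same filtering logic as Transform.flattened_fields, on plain strings
    return [f for f in fields
            if (f in src or f in tgt) and (training or f not in tgt)]

fields = ['words', 'chars', 'tags', 'heads', 'relations', 'labels']
# Before: heads/relations counted as targets, so they are dropped at prediction:
print(flattened(fields, src=['words', 'chars', 'tags'],
                tgt=['heads', 'relations', 'labels'], training=False))
# -> ['words', 'chars', 'tags']
# After the change: heads/relations are sources, so prediction keeps them:
print(flattened(fields, src=['words', 'chars', 'tags', 'heads', 'relations'],
                tgt=['labels'], training=False))
# -> ['words', 'chars', 'tags', 'heads', 'relations']
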
MinionAttack commented 2 years ago

Ah ok, I didn't realise that. Thanks again for the help.