own-pt / glosstag

Semantically Tagged PWN glosses

inconsistency in pre-POS and lemmas #23

Closed: arademaker closed this issue 1 year ago

arademaker commented 3 years ago
  1. Tokens without lemmas, as below. It looks like all of them are punctuation. We could add a lemma for them, the same as the form.
     `{'form': ';', 'kind': ['wf'], 'meta': {'pos': ':', 'type': 'punc'}, 'tag': 'ignore', 'begin': 46, 'end': 47}`
  2. Tokens with lemmas without %N:

    1. some with `meta`: `{'form': 'to', 'kind': ['wf'], 'lemmas': ['to'], 'meta': {'pos': 'TO'}, 'tag': 'ignore', 'begin': 50, 'end': 52}`
    2. some without `meta`: `{'form': 'of', 'kind': ['wf'], 'lemmas': ['of'], 'tag': 'ignore', 'begin': 15, 'end': 17}`
    3. some proper nouns: `{'form': 'Edmund', 'kind': ['wf'], 'lemmas': ['Edmund'], 'tag': 'un', 'begin': 41, 'end': 47}`
    4. some words missing from PWN: `{'glob': 'man', 'kind': ['glob', 'c'], 'lemmas': ['flour_moths'], 'tag': 'un'}`
    5. some annotated with senses: `{'glob': 'man', 'kind': ['glob', 'b'], 'lemmas': ['appellate_court'], 'senses': ['appellate_court%1:14:00::'], 'tag': 'man'}`
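The suggestion for punctuation tokens (use the form as the lemma) and the cases listed in this comment can be sketched in Python. This is a minimal sketch, assuming tokens are plain dicts shaped like the examples quoted here; `fill_missing_lemmas`, `classify`, `EXAMPLES`, and the bucket names are hypothetical and not part of the glosstag code. The order of the checks in `classify` is a heuristic guess based only on the keys visible in the examples.

```python
from typing import Any

Token = dict[str, Any]

# The example tokens quoted in this issue, copied verbatim.
EXAMPLES: list[Token] = [
    {'form': ';', 'kind': ['wf'], 'meta': {'pos': ':', 'type': 'punc'}, 'tag': 'ignore', 'begin': 46, 'end': 47},
    {'form': 'to', 'kind': ['wf'], 'lemmas': ['to'], 'meta': {'pos': 'TO'}, 'tag': 'ignore', 'begin': 50, 'end': 52},
    {'form': 'of', 'kind': ['wf'], 'lemmas': ['of'], 'tag': 'ignore', 'begin': 15, 'end': 17},
    {'form': 'Edmund', 'kind': ['wf'], 'lemmas': ['Edmund'], 'tag': 'un', 'begin': 41, 'end': 47},
    {'glob': 'man', 'kind': ['glob', 'c'], 'lemmas': ['flour_moths'], 'tag': 'un'},
    {'glob': 'man', 'kind': ['glob', 'b'], 'lemmas': ['appellate_court'],
     'senses': ['appellate_court%1:14:00::'], 'tag': 'man'},
]


def fill_missing_lemmas(tokens: list[Token]) -> list[Token]:
    """Hypothetical helper: return copies where lemma-less tokens get lemmas == [form]."""
    fixed = []
    for tok in tokens:
        tok = dict(tok)  # shallow copy so the caller's tokens are not mutated
        if "lemmas" not in tok and "form" in tok:
            tok["lemmas"] = [tok["form"]]
        fixed.append(tok)
    return fixed


def classify(tok: Token) -> str:
    """Hypothetical heuristic bucket for a token, mirroring the cases in this issue."""
    if "lemmas" not in tok:
        return "no-lemma"          # case 1: e.g. punctuation
    if "senses" in tok:
        return "annotated"         # has sense keys
    if "glob" in tok:
        return "glob-unannotated"  # e.g. words missing from PWN
    if tok.get("tag") == "un":
        return "wf-unannotated"    # e.g. proper nouns
    if "meta" in tok:
        return "with-meta"
    return "without-meta"
```

Checking `senses` before `glob` keeps annotated globs, such as the `appellate_court` example, out of the unannotated buckets. After `fill_missing_lemmas`, the `;` token moves from `no-lemma` to `with-meta`, since it has a `meta` entry.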
arademaker commented 1 year ago

Over the years, we learned that tokens whose lemmas do not contain %N are the ones that may not exist in WordNet 3.0. This is not an inconsistency. There is nothing to do here, so I will close this issue.
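The rule above can be turned into a quick filter for tokens that are candidates for being absent from WordNet 3.0. This is a minimal sketch, assuming that a lemma attested in WordNet 3.0 ends in `%` followed by digits (for example `court%1`); `SENSE_SUFFIX`, `possibly_missing_from_wn30`, and `filter_possibly_missing` are hypothetical names, not glosstag functions.

```python
import re
from typing import Any

# Assumption: the "%N" suffix is "%" followed by one or more digits at the
# end of the lemma string (e.g. "court%1").
SENSE_SUFFIX = re.compile(r"%\d+$")


def possibly_missing_from_wn30(tok: dict[str, Any]) -> bool:
    """True when the token has lemmas and none of them carries a %N suffix."""
    lemmas = tok.get("lemmas", [])
    return bool(lemmas) and not any(SENSE_SUFFIX.search(lemma) for lemma in lemmas)


def filter_possibly_missing(tokens: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only tokens whose lemmas all lack a %N suffix."""
    return [tok for tok in tokens if possibly_missing_from_wn30(tok)]
```

Tokens with no `lemmas` key at all (such as the punctuation case) are deliberately not flagged, since the rule only concerns lemmas that are present but lack %N.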