This paper investigates how well token representations from pretrained language models (PLMs) capture lexical semantics.
Lexical tasks:
LSIM: word-pair similarity, scored by Spearman correlation between model similarities and human ratings;
WA: word analogy (a : b :: c : d, solved by vector arithmetic);
BLI: bilingual lexicon induction (mapping word pairs across languages);
CLIR: cross-lingual information retrieval;
RELP: lexical relation classification for word pairs, e.g., whether "state" is a subset of "country".
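As a toy illustration of the LSIM and WA protocols (my sketch, not the paper's code; the vectors and helper names are made up): LSIM correlates model cosine similarities with gold ratings via Spearman's rho, and WA picks the vocabulary word closest to b - a + c.

```python
import math

def cos(u, v):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    # Spearman's rho via rank correlation; assumes no ties for simplicity
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def analogy(a, b, c, emb):
    # WA: answer a : b :: c : ? by nearest neighbor to b - a + c,
    # excluding the three query words themselves
    target = [bb - aa + cc for aa, bb, cc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(target, emb[w]))

# toy 2-d "embeddings" (invented for illustration)
emb = {"man": [1.0, 0.0], "woman": [1.0, 1.0],
       "king": [2.0, 0.0], "queen": [2.0, 1.0], "apple": [0.0, 3.0]}
```

With these toy vectors, `analogy("man", "woman", "king", emb)` returns `"queen"`, since woman - man + king lands exactly on queen's vector.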
They compare the following setups:
monolingual vs. multilingual models;
encoding a word in isolation vs. averaging its contextual encodings across different contexts;
including special tokens in the encoding or not;
averaging over layers.
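A minimal sketch of the "average over contexts" and "layer average" strategies (toy lists stand in for actual PLM hidden states; the function names and the layer-selection convention are my assumptions):

```python
def mean_vec(vectors):
    # element-wise mean of a list of equal-length vectors
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def word_embedding(context_layer_vecs, num_layers=None):
    # context_layer_vecs: one entry per context in which the word occurs;
    # each entry is a list of per-layer hidden-state vectors for that occurrence.
    # Step 1: within each context, average the chosen bottom layers
    # (all layers if num_layers is None).
    # Step 2: average the resulting vectors across contexts.
    per_context = []
    for layers in context_layer_vecs:
        chosen = layers if num_layers is None else layers[:num_layers]
        per_context.append(mean_vec(chosen))
    return mean_vec(per_context)
```

Setting `num_layers` restricts the average to the lower layers, which is one way to realize the paper's finding that layer averaging helps on some tasks but not others.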
They find that monolingual models outperform multilingual ones; context matters; special tokens do not help; and layer averaging is effective, though not for all tasks.