Description
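When the corpus contains misspelled or foreign words and phrases, the top-ranked MWEs are often those very rare, misspelled expressions. This is a known weakness of PMI (pointwise mutual information): it overestimates the association between low-frequency words, so a misspelling that occurs only once or twice next to the same neighbors can outrank genuine, frequent expressions.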
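The effect can be illustrated with a small sketch. The counts below are made up for illustration and do not come from any real corpus, and the `pmi` helper is a hypothetical function, not part of the extraction code discussed here.

```python
import math

def pmi(pair_count, left_count, right_count, total):
    """PMI in bits for a word pair, computed from raw counts.

    `total` is the number of observations in the corpus; all counts
    passed in below are invented for illustration.
    """
    p_xy = pair_count / total
    p_x = left_count / total
    p_y = right_count / total
    return math.log2(p_xy / (p_x * p_y))

N = 1_000_000  # assumed corpus size

# A genuine, frequent expression: both words are common and co-occur often.
common = pmi(pair_count=500, left_count=5000, right_count=2000, total=N)

# Two misspelled tokens that each occur exactly once, and only together.
rare = pmi(pair_count=1, left_count=1, right_count=1, total=N)

print(f"frequent pair PMI: {common:.2f}")
print(f"one-off pair PMI:  {rare:.2f}")
```

Under these assumed counts the one-off pair scores about log2(N), the maximum possible for this corpus size, which is why such pairs tend to float to the top of a PMI ranking.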
To Reproduce
Steps to reproduce the behavior: Run MWE extraction on a corpus that contains misspelled or foreign words and check the top-ranked results.
Expected behavior
Top MWE results should be common expressions consisting of correct words.
Examples
Light Verb Constructions: LOCK THE DOOOOR
Possible Solutions
The proposed solution is to check each component of a candidate MWE against a lexicon of the selected language and to discard candidates that contain tokens that are not actual words.
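A minimal sketch of such a lexicon check is shown below. It assumes candidates are plain whitespace-separated strings and that the lexicon is available as a collection of words; `filter_by_lexicon` and the tiny lexicon are hypothetical names and data introduced only for this example.

```python
def filter_by_lexicon(candidates, lexicon):
    """Keep only candidates whose every token appears in the lexicon.

    Matching is case-insensitive; tokenization is a plain whitespace
    split, which is an assumption made for this sketch.
    """
    known = {word.lower() for word in lexicon}
    return [
        mwe for mwe in candidates
        if all(token.lower() in known for token in mwe.split())
    ]

# Toy lexicon for illustration only; a real one would cover the whole language.
toy_lexicon = {"lock", "the", "door", "take", "a", "walk"}
candidates = ["LOCK THE DOOOOR", "take a walk", "lock the door"]

print(filter_by_lexicon(candidates, toy_lexicon))
```

A strict lexicon check also drops legitimate out-of-vocabulary items such as names or domain terms, so it may be worth making the filter optional or combining it with a minimum frequency threshold.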