Description
In many cases, the top-ranked extracted MWEs are very uncommon expressions. This happens because the MWE and some or all of its components have very low frequencies, which inflates the PMI.
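To make the effect concrete, here is a worked bound for the bigram case. It assumes PMI is estimated from raw counts in a corpus of N tokens, with c(x), c(y) and c(x, y) the counts of the two words and of the pair; the actual scoring used by the extractor may differ (for example for longer MWEs).

$$
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)} \approx \log \frac{c(x, y)\,N}{c(x)\,c(y)}
$$

Since c(x, y) cannot exceed either c(x) or c(y), this estimate is at most log N, and that maximum is reached exactly when c(x) = c(y) = c(x, y) = 1. A pair whose words each occur 1000 times, always together, only reaches log(N / 1000). So under this estimate, a pair seen once outranks every frequent expression.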
To Reproduce
Steps to reproduce the behavior: run MWE extraction and inspect the top-ranked results.
Expected behavior
Top MWE results should be common expressions, not very rare or unknown ones.
Examples
whip these ninjas
Possible Solutions
Add a frequency threshold (as a parameter) that defaults to 1. MWE candidates observed fewer times than this threshold are discarded before ranking.
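A minimal sketch of how such a threshold could behave, restricted to bigrams and using a simple count-based PMI. The function name `rank_bigrams_by_pmi` and the parameter name `min_count` are hypothetical and are not taken from the project's code.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=1):
    """Rank adjacent word pairs by PMI, skipping pairs seen fewer than min_count times.

    Hypothetical helper for illustration only; not the project's API.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # discard candidates observed below the threshold
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), math.log(p_pair / (p_w1 * p_w2))))

    # Highest PMI first
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Toy corpus: one frequent expression plus one pair that occurs only once.
tokens = ("new york is big . " * 5 + "whip ninjas").split()

default_ranking = rank_bigrams_by_pmi(tokens)                # min_count=1 keeps everything
filtered_ranking = rank_bigrams_by_pmi(tokens, min_count=2)  # drops pairs seen once
```

With the default of 1, nothing is filtered, so current behavior is unchanged and the one-off pair ranks first in this toy corpus. Raising the threshold removes it before ranking.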