maria-antoniak / little-mallet-wrapper

A Python wrapper around the topic modeling functions of MALLET.
GNU General Public License v3.0

Short words always removed #13

Open maria-antoniak opened 1 month ago

maria-antoniak commented 1 month ago

By default, MALLET removes short words (1-2 characters) during import: its regex-based tokenizer only matches tokens of three or more characters, so shorter words never reach the topic model.
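
To illustrate the behavior, here is a minimal sketch in plain Python. It assumes MALLET's default token regex is `\p{L}[\p{L}\p{P}]+\p{L}` (a letter, one or more letters or punctuation marks, then a letter). Python's `re` module has no `\p{L}` or `\p{P}`, so the pattern below is only an approximation, and `tokenize` and `TOKEN_RE` are hypothetical names, not part of little-mallet-wrapper or MALLET.

```python
import re

# Approximation of the assumed MALLET default token regex
# \p{L}[\p{L}\p{P}]+\p{L}. Python's re lacks \p{L} and \p{P}, so
# [^\W\d_] stands in for "any letter" and a small ASCII set
# (apostrophe, hyphen) stands in for "any punctuation".
LETTER = r"[^\W\d_]"
PUNCT = r"'\-"
TOKEN_RE = re.compile(rf"{LETTER}(?:{LETTER}|[{PUNCT}])+{LETTER}")


def tokenize(text):
    """Return the tokens matched by the approximated default regex."""
    return TOKEN_RE.findall(text)


# The pattern needs at least three characters per token, so every
# 1-2 character word is silently dropped.
print(tokenize("I am at an AI lab in NY"))  # ['lab']
```

If short words matter for a corpus (abbreviations such as "AI" or "NY", for example), this suggests the place to look is the token regex used at import time rather than the stop word list.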