[x] Remove uncommon tags, i.e. tags with fewer occurrences than a chosen threshold (e.g. 1000; the right value depends on the size of the dataset and the domain)
[ ] Aggregate tags that refer to the same term: 1) group tags that differ only by special characters such as "-" or " "; 2) group morphologically similar tags (e.g. singular and plural forms, different verb tenses).
[x] Remove irrelevant tags (e.g. HD Porn).
[ ] Use stemming (e.g. Porter's algorithm) or lemmatization to merge morphologically similar tags
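
The checklist items above could be combined roughly as in the sketch below. Everything here is an assumption made for illustration: the function names (`normalize`, `crude_stem`, `clean_tags`), the input shape (a mapping from tag to occurrence count), and in particular the stemmer, which is a deliberately crude suffix stripper and not Porter's algorithm. A real pipeline would likely swap in a proper stemmer or lemmatizer.

```python
import re
from collections import Counter, defaultdict


def normalize(tag: str) -> str:
    """Lowercase and drop separators so 'Sci-Fi', 'sci fi' and 'scifi' collide."""
    return re.sub(r"[\W_]+", "", tag.lower())


def crude_stem(word: str) -> str:
    """Strip one common English suffix. Illustrative only, NOT Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tags(tag_counts, min_count=1000, blocklist=frozenset()):
    """Merge tag variants, drop blocklisted tags, then drop rare groups.

    tag_counts: mapping of raw tag -> number of occurrences (assumed input shape).
    min_count:  hypothetical threshold; tune it to the dataset and domain.
    blocklist:  raw tags considered irrelevant.
    """
    blocked = {crude_stem(normalize(tag)) for tag in blocklist}
    merged = Counter()               # group key -> total occurrences
    variants = defaultdict(Counter)  # group key -> surface forms and their counts

    for tag, count in tag_counts.items():
        key = crude_stem(normalize(tag))
        if not key or key in blocked:
            continue
        merged[key] += count
        variants[key][tag] += count

    # Label each surviving group with its most frequent surface form.
    return {
        variants[key].most_common(1)[0][0]: total
        for key, total in merged.items()
        if total >= min_count
    }


example = {
    "sci-fi": 900, "Sci Fi": 300, "scifi": 50,
    "cat": 700, "cats": 400,
    "rare": 10, "HD Porn": 5000,
}
cleaned = clean_tags(example, min_count=1000, blocklist={"HD Porn"})
print(cleaned)
```

One design choice worth noting: the sketch merges variants before applying the frequency threshold, so that spellings which are individually rare (such as "cat" and "cats" in the example) are not discarded before they can be grouped. Filtering first is equally possible if the raw counts are what matter.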
The following strategies can be used: