koaning/tokenwiser
Bag of, not words, but tricks!
https://koaning.github.io/tokenwiser/
Apache License 2.0
68 stars, 7 forks
Methods of re-weighing embeddings based on tokens.
#8 (Closed; koaning closed 3 years ago)
koaning commented 3 years ago:
[ ] filter the text beforehand using stopword lists (sklearn, spacy)
[ ] reweigh based on probability
[ ] reweigh based on token length
[ ] reweigh based on TF-IDF
[ ] reweigh based on number of syllables
[ ] reweigh based on islands method (this is described here; it favors "important" words that are surrounded by "unimportant" words)
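
To make a few of these ideas concrete, here is a minimal sketch of what token-based reweighing could look like: a sentence embedding computed as a weighted average of token vectors, with the weight function swappable (token length, smoothed IDF) and an optional stopword filter applied beforehand. This is not tokenwiser's API. The names `TOY_VECTORS`, `STOPWORDS`, `length_weight`, `make_idf_weight`, and `weighted_embedding` are all hypothetical, the vectors are made-up toy values, and the IDF formula is one common smoothed variant chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; a real setup would load vectors from a trained model.
TOY_VECTORS = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "sat": [0.5, 0.5],
    "elephant": [0.0, 2.0],
}

# Stand-in for a stopword list taken from sklearn or spaCy (assumption).
STOPWORDS = frozenset({"the", "a", "an", "of"})


def length_weight(token):
    """Weight a token by its character length."""
    return float(len(token))


def make_idf_weight(corpus):
    """Build a smoothed IDF weight function from a list of tokenized documents."""
    n_docs = len(corpus)
    # Count in how many documents each token appears at least once.
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))

    def idf_weight(token):
        # Smoothed IDF: unseen tokens get the largest weight, never a division by zero.
        return math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0

    return idf_weight


def weighted_embedding(tokens, weight_fn, vectors=TOY_VECTORS, stopwords=frozenset()):
    """Weighted average of token vectors; stopwords and unknown tokens are skipped."""
    kept = [t for t in tokens if t in vectors and t not in stopwords]
    weights = [weight_fn(t) for t in kept]
    total = sum(weights)
    dim = len(next(iter(vectors.values())))
    if total == 0:
        # Nothing left to average (empty input or everything filtered out).
        return [0.0] * dim
    return [
        sum(w * vectors[t][i] for t, w in zip(kept, weights)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    sentence = ["the", "cat", "sat"]
    idf = make_idf_weight([["the", "cat"], ["the", "sat"]])
    print(weighted_embedding(sentence, length_weight))
    print(weighted_embedding(sentence, idf, stopwords=STOPWORDS))
```

Keeping the weight function as a plain callable means the other items on the list (probability, syllable count) could be tried by swapping in a different function without touching the averaging step. The islands method would need context from neighbouring tokens, so it would not fit this per-token signature as written.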