xdvom03 closed 4 years ago
Unknown words get a completely arbitrary probability of 0.4 (which is basically never applied, because we only look at the most interesting words). I don't do that: unknown words get 0.5 (in pair fights). Obviously, there are no word interdependencies.
It seems that that's it. Should have done this sooner.
Dividing by document count, not word count, was also a mistake, but wasn't found. :(
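
For reference, here is a rough Python sketch of how the unknown-word default interacts with the "interestingness" selection described in the linked essay. It is only an illustration under stated assumptions, not the code from this project: the names `most_interesting`, `combine`, `UNKNOWN_ESSAY` and `UNKNOWN_NEUTRAL` are made up here, and the per-word probabilities are passed in as a plain dict rather than computed from a corpus.

```python
# Hypothetical names throughout; a sketch, not this project's implementation.
UNKNOWN_ESSAY = 0.4    # default for never-seen words in the essay
UNKNOWN_NEUTRAL = 0.5  # neutral default, as used here instead


def most_interesting(tokens, word_probs, unknown, n=15):
    """Return the n probabilities farthest from the neutral value 0.5."""
    # dict.fromkeys de-duplicates tokens while keeping first-seen order.
    probs = [word_probs.get(t, unknown) for t in dict.fromkeys(tokens)]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return probs[:n]


def combine(probs):
    """Naive Bayes combination, treating the words as independent."""
    spam = 1.0
    ham = 1.0
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)


if __name__ == "__main__":
    word_probs = {"viagra": 0.99, "hello": 0.2}  # made-up values
    tokens = ["viagra", "hello", "zzz"]          # "zzz" is unknown
    for unknown in (UNKNOWN_ESSAY, UNKNOWN_NEUTRAL):
        picked = most_interesting(tokens, word_probs, unknown)
        print(unknown, picked, combine(picked))
```

With a default of 0.5, an unknown word contributes the same factor to both products and cancels out, so it never moves the result. With 0.4 it pulls the result slightly toward "not spam", but only if it makes it into the top `n`, which is why the value rarely matters in practice.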
http://paulgraham.com/spam.html
There are at least these questionable parts: