xdvom03 / klaus

Bayesian text classification of websites in a nested class system
Creative Commons Zero v1.0 Universal

The pair explainer contains too much information #109

Open xdvom03 opened 3 years ago

xdvom03 commented 3 years ago

Sites often have ~100 keywords. That may be too many anyway, since it shifts the focus from keyword strength to sheer word count (one group will devolve into random words). Choosing words more carefully (#73, #74) is the way forward. For example, this should end up as food, not physics; maybe we need word pairs here, in which case we must recognize the word pair as a single token.
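A minimal sketch of what treating a word pair as a single token could look like (the function and the example pair are hypothetical, not taken from klaus):

```python
def tokenize_with_pairs(text, known_pairs):
    """Split text into words, then merge adjacent words that form a
    known pair (e.g. "dark matter") into one token "dark_matter"."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in known_pairs:
            tokens.append(words[i] + "_" + words[i + 1])
            i += 2  # consume both words of the pair
        else:
            tokens.append(words[i])
            i += 1
    return tokens

# The pair ("dark", "matter") is then counted and scored as one keyword.
print(tokenize_with_pairs("dark matter recipes", {("dark", "matter")}))
# -> ['dark_matter', 'recipes']
```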

However, it's very hard to make any educated guesses here because the interface is too debug-oriented. We don't usually need the word counts, just the scores, and even listing the words might be too lengthy. Maybe a graph would be better, with individual words visible on mouseover or via a toggle. This would show whether the evidence sits at the beginning or at the end of the distribution.
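A rough sketch of the kind of graph meant here, assuming the per-word scores are already available as a dict (the names and the log-odds scale are assumptions, not the project's actual API):

```python
import matplotlib.pyplot as plt

def plot_word_scores(word_scores):
    """Plot per-word evidence sorted by strength, so it is visible at a
    glance whether the evidence is concentrated in a few strong words
    (start of the curve) or spread across many weak ones (long tail)."""
    ranked = sorted(word_scores.items(), key=lambda kv: kv[1], reverse=True)
    words, scores = zip(*ranked)
    plt.bar(range(len(scores)), scores)
    plt.xlabel("word rank")
    plt.ylabel("evidence score (log-odds)")
    plt.title("Per-word evidence, strongest first")
    # Mouseover / toggle labels for individual words would go here
    # in an interactive UI; this static version shows only the shape.
    plt.show()
```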

xdvom03 commented 3 years ago

Semi-related speculation: keyword diversity must be accounted for. Misclassifications typically come from a bunch of closely related terms that aren't really important in a broader context (such as fiction here). Diversity might mean "belonging to multiple subclasses", since that's a decent proxy for the meaning of a word. This is of course predicated on a good class structure and enough subclasses; in particular, it will not work if the subclasses don't share many keywords.
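One way the "belonging to multiple subclasses" idea could be made concrete, sketched under the assumption that each subclass has a known keyword set (both function names are hypothetical):

```python
def keyword_diversity(word, subclass_keywords):
    """Diversity of a keyword = fraction of subclasses whose keyword set
    contains it. A term that only shows up in one narrow subclass
    (e.g. a cluster of fiction-specific words) gets a low score."""
    containing = sum(1 for kws in subclass_keywords.values() if word in kws)
    return containing / len(subclass_keywords)

def diversity_weighted_score(word, raw_score, subclass_keywords):
    """Down-weight the raw score of narrow, low-diversity terms."""
    return raw_score * keyword_diversity(word, subclass_keywords)
```

As noted above, this only behaves sensibly when the class structure is good and the subclasses actually share keywords; with disjoint keyword sets every word would get the minimum diversity.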