xdvom03 / klaus

Bayesian text classification of websites in a nested class system
Creative Commons Zero v1.0 Universal

Fix PageRank #47

Closed: xdvom03 closed this issue 3 years ago

xdvom03 commented 3 years ago

It appears that simply multiplying all the individual probabilities (that is, adding their natural logarithms) works better than the current, somewhat convoluted system. Try it, but keep this issue as a log in case the algorithm ever needs to be changed back.
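A minimal sketch of the proposed combination rule, assuming each class is scored from a list of independent probabilities. The function name `combined_log_score`, the class names, and the probability values are hypothetical and only illustrate why adding logarithms is used in place of a direct product: the ranking is the same, but the log sum does not underflow.

```python
import math

def combined_log_score(probabilities):
    """Multiply independent probabilities by summing their natural logs.

    The class ranking is identical to ranking by the raw product, because
    ln is monotonic, but the sum stays finite where the product underflows.
    """
    return sum(math.log(p) for p in probabilities)

# Hypothetical per-comparison probabilities for two candidate classes.
scores = {
    "science": combined_log_score([0.9, 0.8, 0.95]),
    "news": combined_log_score([0.6, 0.7, 0.5]),
}
best = max(scores, key=scores.get)
print(best)  # science

# Many small probabilities: the direct product underflows to 0.0,
# while the log sum remains a usable, comparable number.
tiny = [1e-5] * 1000
print(math.prod(tiny))            # 0.0
print(combined_log_score(tiny))   # about -11512.9
```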

This works because introducing an irrelevant class (say, boilerplate) only adds very strong wins for every other class, so their relative scores shift by less than one percent. It does not distort the corpus.
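One way to check this argument numerically, assuming (hypothetically) that each class's score is the product of its pairwise win probabilities against every other class, computed as a sum of logs and then normalized into relative shares. The pairwise probabilities and helper names below are invented for illustration; they are not taken from the klaus codebase.

```python
import math

def class_scores(win_prob):
    """Score each class as the sum of ln P(class beats opponent) over all opponents.

    win_prob[a][b] is a hypothetical probability that class a beats class b.
    """
    return {
        a: sum(math.log(p) for p in opponents.values())
        for a, opponents in win_prob.items()
    }

def normalize(log_scores):
    """Convert log scores into relative shares summing to 1 (a stable softmax)."""
    top = max(log_scores.values())
    exps = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

real = {
    "science": {"news": 0.7},
    "news": {"science": 0.3},
}
before = normalize(class_scores(real))

# Add an irrelevant "boilerplate" class that every real class beats decisively.
with_boilerplate = {
    "science": {"news": 0.7, "boilerplate": 0.995},
    "news": {"science": 0.3, "boilerplate": 0.99},
    "boilerplate": {"science": 0.005, "news": 0.01},
}
after = normalize(class_scores(with_boilerplate))

for c in before:
    print(c, round(before[c], 4), round(after[c], 4))
```

Because each real class gains only a factor close to 1 (here 0.995 or 0.99) from beating the new class, the shares of "science" and "news" move by roughly a tenth of a percentage point, and the boilerplate class itself receives a negligible share.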

xdvom03 commented 3 years ago

Solved as per commit