Simply multiplying all the individual probabilities (that is, summing their natural logs) appears to be a better idea than today's somewhat convoluted system. Try it, but keep this issue as a log in case the algorithm ever needs to be reverted.
This works because introducing an irrelevant class (say, boilerplate) only produces very strong wins for all the other classes, so their relative scores shift by less than a percent. It does not mess with the corpus.
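A minimal sketch of the idea, not the project's actual code. It assumes the "individual probabilities" are pairwise win probabilities between classes, which is only one reading of this issue. The class names other than boilerplate, the numbers, and the helpers `log_score`, `class_scores`, and `add_irrelevant_class` are all hypothetical.

```python
import math


def log_score(probabilities):
    """Sum of natural logs, which equals ln of the product of the probabilities."""
    return sum(math.log(p) for p in probabilities)


def class_scores(wins):
    """Score each class by the summed ln of its win probabilities."""
    return {name: log_score(row.values()) for name, row in wins.items()}


def add_irrelevant_class(wins, name, win_prob=0.999):
    """Return a copy of `wins` with an extra class that every other class beats strongly."""
    extended = {a: dict(row) for a, row in wins.items()}
    for a in wins:
        extended[a][name] = win_prob
    extended[name] = {a: 1.0 - win_prob for a in wins}
    return extended


# Hypothetical data: wins[a][b] is the assumed probability that class a beats class b.
wins = {
    "news": {"blog": 0.7, "forum": 0.8},
    "blog": {"news": 0.3, "forum": 0.6},
    "forum": {"news": 0.2, "blog": 0.4},
}

before = class_scores(wins)
after = class_scores(add_irrelevant_class(wins, "boilerplate"))

# Each original class gains the same extra term, ln(win_prob), so their ranking is unchanged.
shifts = {a: after[a] - before[a] for a in before}
ranking_before = sorted(before, key=before.get, reverse=True)
ranking_after = [a for a in sorted(after, key=after.get, reverse=True) if a in before]
```

With a win probability of 0.999 against the irrelevant class, each original score moves by ln(0.999), roughly -0.001, and the ordering of the original classes stays the same. If the real win probabilities against the extra class differ from one class to the next, the shifts would no longer be identical, so this only illustrates the strong-win case described above.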