CodingDoug opened this issue 4 months ago
Hi Doug, thanks for the details.
The formatting issue in the confusion matrix printed in the training logs has been fixed. The fix will be included in an upcoming release (the next one or the one after).
Note that `model.evaluate` computes a non-weighted evaluation. An upcoming release introduces a `weighted` argument to `model.evaluate` to enable weighted evaluations.
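For reference, a minimal sketch of the two evaluation paths, assuming the `ydf` Python API discussed in this issue; the `category`/`weight` column names and file paths are hypothetical, and the `weighted` argument is the not-yet-released feature mentioned above:

```python
import pandas as pd
import ydf  # pip package "ydf"; API names assumed from this discussion

# Hypothetical CSV files with a "category" label and a precomputed "weight" column.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

model = ydf.RandomForestLearner(label="category", weights="weight").train(train_df)

# Current behavior: model.evaluate ignores the example weights,
# even though the model was trained with them.
evaluation = model.evaluate(test_df)
print(evaluation.accuracy)

# Once the "weighted" argument mentioned above is released, a weighted
# evaluation is expected to look roughly like:
# evaluation = model.evaluate(test_df, weighted=True)
```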
After some exploration of this case, it seems the model's predictions and evaluation (i.e., programmatic access) are correct; it is only a display issue.
> If I run predictions against the model using the same training dataset, I compute only 186 of 15405 incorrect predictions (1.2%).
This is possible.
If a training dataset is small, the model's self-evaluation will be noisy. Having example weights (for training or evaluation) further increases this noise.
If you use gradient boosted trees (GBT), the self-evaluation is computed on the validation dataset (which is extracted from the training dataset if not provided). So, if the training dataset is small, the validation dataset is also small, and a discrepancy between the self-evaluation and the evaluation on a test set is expected.
If you use Random Forests, the self-evaluation is computed using an out-of-bag (OOB) evaluation. The out-of-bag evaluation is a conservative estimate of the model quality. If the dataset is small, this estimate can be poor. In addition, if the model contains a small number of trees, the out-of-bag evaluation can be biased (in the conservative direction). Note that in this case, the `winner_take_all` Random Forest learner hyper-parameter can help, but using more trees is generally preferable. A short sketch illustrating both points follows.
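A minimal sketch of both suggestions, assuming the `ydf` Python API, hypothetical train/validation/test CSV files, and that the learner's `train` accepts an explicit `valid=` dataset:

```python
import pandas as pd
import ydf  # pip package "ydf"; API names assumed from this discussion

# Hypothetical train/validation/test splits with a "category" label column.
train_df = pd.read_csv("train.csv")
valid_df = pd.read_csv("valid.csv")
test_df = pd.read_csv("test.csv")

# GBT: pass an explicit validation dataset rather than letting the learner
# carve one out of an already small training dataset.
gbt = ydf.GradientBoostedTreesLearner(label="category").train(train_df, valid=valid_df)

# Random Forest: more trees make the out-of-bag self-evaluation less noisy
# and less biased.
rf = ydf.RandomForestLearner(
    label="category",
    num_trees=1000,  # more trees -> better OOB estimate
    # winner_take_all=...,  # the hyper-parameter mentioned above, if needed
).train(train_df)

# Compare the (conservative) self-evaluation against a held-out test set.
print(rf.self_evaluation().accuracy)
print(rf.evaluate(test_df).accuracy)
```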
I'm using `RandomForestLearner` to train a 10-class categorization model using roughly 15000 examples and 12 features. My example set is imbalanced in terms of category distribution, so I need to use class-based weighting to boost the under-represented classes.
I'm post-processing my dataset with weights computed from the entire set:
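As a minimal sketch (not the exact code from the issue), one common way to compute such class-based weights with pandas, assuming a hypothetical `category` label column and a `weight` column:

```python
import pandas as pd

# Hypothetical dataset with a "category" label column.
df = pd.read_csv("train.csv")

# Weight each example inversely to its class frequency so that every class
# contributes roughly equally overall.
counts = df["category"].value_counts()
df["weight"] = df["category"].map(len(df) / (len(counts) * counts))

# The weight column is then referenced by name when building the learner, e.g.:
# learner = ydf.RandomForestLearner(label="category", weights="weight")
```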
The resulting model is effective, but the confusion matrix is confusing. Here is part of the output from `model.describe()`:

(confusion matrix output omitted)

Basically unreadable. Here it is again from `model.self_evaluation()`:

(confusion matrix output omitted)

Without weights, the confusion matrix prints integers, as I would expect. With weights, it's these floating-point numbers that don't make much sense. Also, I believe the accuracy number is incorrect. If I run predictions against the model using the same training dataset, I compute only 186 of 15405 incorrect predictions (1.2%).
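For reference, a minimal sketch of how such a manual check could be computed with the `ydf` Python API; the `category`/`weight` column names, the file path, and the use of `label_classes()` to recover the class order are assumptions, not code taken from the issue:

```python
import numpy as np
import pandas as pd
import ydf  # pip package "ydf"; API names assumed

# Hypothetical training data with "category" label and "weight" columns.
train_df = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="category", weights="weight").train(train_df)

# For a multi-class model, predict() returns per-class probabilities;
# label_classes() is assumed to list the classes in the same column order.
proba = model.predict(train_df)
pred = np.asarray(model.label_classes())[np.argmax(proba, axis=1)]

errors = int((pred != train_df["category"].astype(str)).sum())
print(f"{errors} of {len(train_df)} incorrect predictions ({errors / len(train_df):.1%})")
```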