Classifier evaluation and validation

ibnesayeed commented 7 years ago

It is still a work in progress as per #71...

Ch4s3 commented 7 years ago

Looks pretty good so far.

ibnesayeed commented 7 years ago

We are now generating a k-fold validation accuracy report that looks like this:

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       486        14   0.97200
  2       500       500         0   1.00000
  3       500       499         1   0.99800
  4       500       495         5   0.99000
  5       500       500         0   1.00000
  6       500       499         1   0.99800
  7       500       498         2   0.99600
  8       500       499         1   0.99800
  9       500       498         2   0.99600
 10       500       498         2   0.99600
-------------------------------------------
All      5000      4972        28   0.99440
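
For readers unfamiliar with how such per-run figures come about, here is a minimal sketch of k-fold cross-validation. It is not the code from this PR: `k_fold_accuracy` and `WordOverlapClassifier` are hypothetical names, and the only assumption about a classifier is that it responds to `train(label, text)` and `classify(text)`.

```ruby
# Hypothetical helper, not the classifier-reborn API: split the samples into
# k folds, train on k-1 of them, test on the held-out fold, record accuracy.
# `build` is any callable that returns a fresh, untrained classifier.
def k_fold_accuracy(samples, k, build)
  raise ArgumentError, "k must be between 1 and samples.size" unless k.between?(1, samples.size)

  fold_size = samples.size / k # integer division; leftover samples are dropped
  folds = samples.first(fold_size * k).each_slice(fold_size).to_a

  folds.each_with_index.map do |test_fold, i|
    classifier = build.call
    folds.each_with_index do |fold, j|
      next if i == j # hold out the test fold
      fold.each { |label, text| classifier.train(label, text) }
    end
    correct = test_fold.count { |label, text| classifier.classify(text) == label }
    { run: i + 1, total: test_fold.size, correct: correct,
      incorrect: test_fold.size - correct,
      accuracy: correct.to_f / test_fold.size }
  end
end

# Toy stand-in used only to exercise the sketch: predicts the label whose
# training texts share the most words with the input.
class WordOverlapClassifier
  def initialize
    @words = Hash.new { |hash, label| hash[label] = Hash.new(0) }
  end

  def train(label, text)
    text.downcase.split.each { |word| @words[label][word] += 1 }
  end

  def classify(text)
    tokens = text.downcase.split
    @words.max_by { |_label, counts| tokens.sum { |word| counts[word] } }&.first
  end
end

samples = [
  ["Ham", "lunch at noon"],       ["Spam", "win free money"],
  ["Ham", "meeting at noon"],     ["Spam", "free money now"],
  ["Ham", "lunch meeting today"], ["Spam", "win money now"]
]
report = k_fold_accuracy(samples, 3, -> { WordOverlapClassifier.new })
report.each { |r| puts format("%3d %9d %9d %9d   %.5f", *r.values) }
```

Because the sketch uses integer division for the fold size, a sample count that is not divisible by k leaves a few samples out of every run; other splitting strategies are equally valid.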

ibnesayeed commented 7 years ago

The confusion matrix is now printed over the total sample set, along with the accuracy stats for the accumulated and individual runs of the k-fold validation. The code can also print the confusion matrix of each individual run, but that output would be overwhelming. Many methods are defined so that they produce meaningful output whether one or several conf_mat objects are passed.

The code is written with multi-class analysis in mind (not just binary). That is why we only print the confusion matrix and not the confusion table (the one with TP/TN/FP/FN stats): the table requires binary classes and some way to tell which class is considered positive. Perhaps we can provide a parameter that lets the user name the positive class for binary data, so that we can conditionally generate more statistics. However, we can still calculate precision and recall for each class without any supplementary information (that will be my next task).

$ rake validate

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       486        14   0.97200
  2       500       499         1   0.99800
  3       500       499         1   0.99800
  4       500       498         2   0.99600
  5       500       496         4   0.99200
  6       500       499         1   0.99800
  7       500       500         0   1.00000
  8       500       499         1   0.99800
  9       500       497         3   0.99400
 10       500       499         1   0.99800
-------------------------------------------
All      5000      4972        28   0.99440

---------------- Confusion Matrix -----------------
Predicted ->          Ham         Spam        Total
---------------------------------------------------
Ham                  4307           20         4327
Spam                    8          665          673
---------------------------------------------------
Total                4315          685         5000
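
As an illustration of why a confusion matrix needs no notion of a "positive" class, here is a minimal sketch that tallies actual/predicted pairs for any number of labels. `confusion_matrix` and `print_confusion_matrix` are hypothetical helpers, not the methods added in this PR, and the nested-hash layout is an assumption. It relies on `Array#to_h` with a block, so Ruby 2.6 or later.

```ruby
# Hypothetical helper (not the library's API): tally [actual, predicted]
# pairs into a nested hash, matrix[actual][predicted] => count.
def confusion_matrix(pairs)
  labels = pairs.flatten.uniq.sort
  matrix = labels.to_h { |actual| [actual, labels.to_h { |predicted| [predicted, 0] }] }
  pairs.each { |actual, predicted| matrix[actual][predicted] += 1 }
  matrix
end

# Print the matrix with row and column totals, one row per actual class.
def print_confusion_matrix(matrix)
  labels = matrix.keys
  line = ->(name, cells) { format("%-12s", name) + cells.map { |c| format("%13s", c) }.join }

  puts line.call("Predicted ->", labels + ["Total"])
  labels.each do |actual|
    row = labels.map { |predicted| matrix[actual][predicted] }
    puts line.call(actual, row + [row.sum])
  end
  column_totals = labels.map { |predicted| labels.sum { |actual| matrix[actual][predicted] } }
  puts line.call("Total", column_totals + [column_totals.sum])
end

# Rebuild the counts reported above from synthetic pairs.
pairs = [["Ham", "Ham"]] * 4307 + [["Ham", "Spam"]] * 20 +
        [["Spam", "Ham"]] * 8 + [["Spam", "Spam"]] * 665
print_confusion_matrix(confusion_matrix(pairs))
```

Nothing in the sketch is specific to two classes: a third label simply adds a row and a column.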

ibnesayeed commented 7 years ago

I think I have an idea: we can report stats for each class, treating it as the positive class. This is a one-versus-all situation repeated for all classes.

Ch4s3 commented 7 years ago

I'll defer to your judgement here, as this is a bit out of my wheelhouse.

ibnesayeed commented 7 years ago

Now reporting the confusion matrix with various derived stats for each class, treated as the positive class one at a time. The code is refactored so that it can be reused if one knows the positive class and wants to generate a report only for that class.

$ rake validate

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       485        15   0.97000
  2       500       497         3   0.99400
  3       500       497         3   0.99400
  4       500       497         3   0.99400
  5       500       499         1   0.99800
  6       500       497         3   0.99400
  7       500       500         0   1.00000
  8       500       499         1   0.99800
  9       500       499         1   0.99800
 10       500       498         2   0.99600
-------------------------------------------
All      5000      4968        32   0.99360

---------------- Confusion Matrix -----------------
Predicted ->          Ham         Spam        Total
---------------------------------------------------
Ham                  4305           22         4327
Spam                   10          663          673
---------------------------------------------------
Total                4315          685         5000

# Positive class: Ham
Total population   : 5000
Condition positive : 4327
Condition negative : 673
True positive      : 4305
True negative      : 663
False positive     : 10
False negative     : 22
Prevalence         : 0.8654
Specificity        : 0.9851411589895989
Recall             : 0.9949156459440721
Precision          : 0.9976825028968713
Accuracy           : 0.9936
F1 score           : 0.9962971534367044

# Positive class: Spam
Total population   : 5000
Condition positive : 673
Condition negative : 4327
True positive      : 663
True negative      : 4305
False positive     : 22
False negative     : 10
Prevalence         : 0.1346
Specificity        : 0.9949156459440721
Recall             : 0.9851411589895989
Precision          : 0.9678832116788321
Accuracy           : 0.9936
F1 score           : 0.9764359351988218
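
The one-versus-all derivation behind these per-class blocks can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `one_vs_all_stats` is a hypothetical name, and the matrix is assumed to be a nested hash of the form `matrix[actual][predicted] => count`.

```ruby
# Hypothetical helper: treat `positive` as the positive class and every other
# class as negative, then derive the usual binary statistics.
def one_vs_all_stats(matrix, positive)
  labels = matrix.keys
  total = labels.sum { |actual| matrix[actual].values.sum }
  tp = matrix[positive][positive]
  fn = matrix[positive].values.sum - tp                      # positives predicted as another class
  fp = labels.sum { |actual| matrix[actual][positive] } - tp # other classes predicted as positive
  tn = total - tp - fn - fp
  precision = tp.to_f / (tp + fp)
  recall = tp.to_f / (tp + fn)
  {
    tp: tp, tn: tn, fp: fp, fn: fn,
    prevalence: (tp + fn).to_f / total,
    specificity: tn.to_f / (tn + fp),
    recall: recall,
    precision: precision,
    accuracy: (tp + tn).to_f / total,
    f1: 2 * precision * recall / (precision + recall)
  }
end

# Counts taken from the confusion matrix in this comment.
matrix = {
  "Ham"  => { "Ham" => 4305, "Spam" => 22 },
  "Spam" => { "Ham" => 10,   "Spam" => 663 }
}
matrix.each_key do |label|
  puts "# Positive class: #{label}"
  one_vs_all_stats(matrix, label).each { |name, value| puts format("%-12s: %s", name, value) }
end
```

A class that is never predicted or never present makes some denominators zero, in which case the float divisions in this sketch yield `NaN`; a real report would probably want to handle that explicitly.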

ibnesayeed commented 7 years ago

---------------- Confusion Matrix -----------------
Predicted ->          Ham         Spam        Total
---------------------------------------------------
Ham                  4307           20         4327
Spam                    6          667          673
---------------------------------------------------
Total                4313          687         5000

The confusion matrix now also reports class-wise precision and recall, in the last row and last column respectively. Although not tested yet, everything implemented so far should work equally well on multi-class datasets.

----------------------- Confusion Matrix -----------------------
Predicted ->          Ham         Spam        Total       Recall
----------------------------------------------------------------
Ham                  4307           20         4327      0.99538
Spam                    6          667          673      0.99108
----------------------------------------------------------------
Total                4313          687         5000
Precision         0.99861      0.97089
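
For reference, the extra row and column follow directly from the matrix cells. Writing $M_{ij}$ for the number of samples of actual class $i$ predicted as class $j$ (notation introduced here for illustration, not taken from the code), the figures above can be reproduced as:

```latex
\[
\mathrm{Recall}_c = \frac{M_{cc}}{\sum_{j} M_{cj}},
\qquad
\mathrm{Precision}_c = \frac{M_{cc}}{\sum_{i} M_{ic}}
\]
\[
\mathrm{Recall}_{\mathrm{Ham}} = \frac{4307}{4327} \approx 0.99538,
\qquad
\mathrm{Recall}_{\mathrm{Spam}} = \frac{667}{673} \approx 0.99108
\]
\[
\mathrm{Precision}_{\mathrm{Ham}} = \frac{4307}{4313} \approx 0.99861,
\qquad
\mathrm{Precision}_{\mathrm{Spam}} = \frac{667}{687} \approx 0.97089
\]
```

Recall divides by a row total (all samples that actually belong to the class), while precision divides by a column total (all samples predicted as the class), which is why the two land in the last column and last row.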

ibnesayeed commented 7 years ago

This is what a typical validation task result now looks like.

$ rake validate
/usr/local/bin/ruby -w -I"lib:lib" -I"/usr/local/bundle/gems/rake-12.0.0/lib" "/usr/local/bundle/gems/rake-12.0.0/lib/rake/rake_test_loader.rb" "test/validators/classifier_validation.rb" 

# ClassifierValidation

===================== lsi_classifier_5_fold_cross_validate =====================
TODO: LSI is not validatable until all of the [:train, :classify, :categories] methods are implemented!
--------------------------------------------------------------------------------

================ bayes_classifier_10_fold_cross_validate_memory ================

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       484        16   0.96800
  2       500       489        11   0.97800
  3       500       490        10   0.98000
  4       500       484        16   0.96800
  5       500       487        13   0.97400
  6       500       489        11   0.97800
  7       500       489        11   0.97800
  8       500       488        12   0.97600
  9       500       488        12   0.97600
 10       500       491         9   0.98200
-------------------------------------------
All      5000      4879       121   0.97580

----------------------- Confusion Matrix -----------------------
Predicted ->          Ham         Spam        Total       Recall
----------------------------------------------------------------
Ham                  4230           97         4327      0.97758
Spam                   24          649          673      0.96434
----------------------------------------------------------------
Total                4254          746         5000
Precision         0.99436      0.86997

# Positive class: Ham
Total population   : 5000
Condition positive : 4327
Condition negative : 673
True positive      : 4230
True negative      : 649
False positive     : 24
False negative     : 97
Prevalence         : 0.8654
Specificity        : 0.9643387815750372
Recall             : 0.9775826207534088
Precision          : 0.9943582510578279
Accuracy           : 0.9758
F1 score           : 0.9858990793613798

# Positive class: Spam
Total population   : 5000
Condition positive : 673
Condition negative : 4327
True positive      : 649
True negative      : 4230
False positive     : 97
False negative     : 24
Prevalence         : 0.1346
Specificity        : 0.9775826207534088
Recall             : 0.9643387815750372
Precision          : 0.8699731903485255
Accuracy           : 0.9758
F1 score           : 0.9147286821705426

--------------------------------------------------------------------------------

================= bayes_classifier_3_fold_cross_validate_redis =================

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1      1666      1630        36   0.97839
  2      1666      1622        44   0.97359
  3      1666      1611        55   0.96699
-------------------------------------------
All      4998      4863       135   0.97299

----------------------- Confusion Matrix -----------------------
Predicted ->          Ham         Spam        Total       Recall
----------------------------------------------------------------
Ham                  4212          113         4325      0.97387
Spam                   22          651          673      0.96731
----------------------------------------------------------------
Total                4234          764         4998
Precision          0.9948      0.85209

# Positive class: Ham
Total population   : 4998
Condition positive : 4325
Condition negative : 673
True positive      : 4212
True negative      : 651
False positive     : 22
False negative     : 113
Prevalence         : 0.8653461384553821
Specificity        : 0.9673105497771174
Recall             : 0.9738728323699422
Precision          : 0.9948039678790742
Accuracy           : 0.9729891956782714
F1 score           : 0.9842271293375394

# Positive class: Spam
Total population   : 4998
Condition positive : 673
Condition negative : 4325
True positive      : 651
True negative      : 4212
False positive     : 113
False negative     : 22
Prevalence         : 0.13465386154461784
Specificity        : 0.9738728323699422
Recall             : 0.9673105497771174
Precision          : 0.8520942408376964
Accuracy           : 0.9729891956782714
F1 score           : 0.906054279749478

--------------------------------------------------------------------------------

Finished in 24.51805s

ibnesayeed commented 7 years ago

I feel it is quite full-featured now. We still need unit tests for the individual methods of the module, RDoc, and user documentation, but those can be handled in a separate PR. @Ch4s3, please feel free to merge it.

@marciovicente, could you please have a look at the reports in the last message and see if anything important is missing or wrong?

ibnesayeed commented 7 years ago

@Ch4s3 I consider this one done from my side. I have added exhaustive user documentation (#145), so RDoc is less important here, though we can add it in a separate PR. Unit tests will also be added separately, as this has already become a big pile of commits and file changes.

marciovicente commented 7 years ago

@ibnesayeed It's a nice report! Seems like a Weka output 👏 Looks awesome to me! ✅

jekyll / classifier-reborn

Classifier evaluation and validation #142