sethjuarez / numl

Machine Learning for .NET
http://numl.net
MIT License

Corrected LearnerPrediction and Prediction tests for Supervised Tests… #79

Closed rold2007 closed 6 years ago

rold2007 commented 6 years ago

…. They were failing once in a while.

Corrected exception when the test samples don't contain all potential labels. Corrected typos.

rold2007 commented 6 years ago

I was trying to fix an issue in NaiveBayesGenerator and noticed that some tests were failing intermittently. That made it hard to tell whether a failure came from my changes or from something else, so I decided to fix the failures separately.

Here is a longer description of the changes I made. Note that there were many ways to prevent the failures. Let me know if you would have preferred another approach.

LearnerPrediction and Prediction failures: These were mainly caused by the test datasets being quite small. Since the train/test samples are selected randomly, an unlucky split sometimes produced an accuracy below the threshold. With a 0.8 train/test ratio on a 14-sample dataset, only 3 test samples remain, so the only reachable accuracies are 0, 1/3, 2/3 and 1. The previous minimum accuracy of 0.75 therefore did not allow even a single misprediction. I reduced it to 0.66 and increased the repeat count from 10 to 100. Another recurring failure was that the best model couldn't always predict the hardcoded Iris sample, even when the model accuracy was high. Looking at the scatterplot for Iris, I noticed that the sample's SepalLength and SepalWidth were a little off from what we would expect of Iris-Setosa, so I adjusted these values.
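To make the threshold arithmetic above concrete, here is a minimal sketch in Python rather than numl's C#. The function name `max_failures_allowed` is hypothetical and not part of numl; it only counts how many mispredictions a given minimum accuracy tolerates for a given number of test samples.

```python
from fractions import Fraction


def max_failures_allowed(n_test, min_accuracy):
    """Largest number of wrong predictions that still meets min_accuracy.

    Hypothetical helper for illustration only; not part of numl.
    """
    allowed = 0
    for wrong in range(n_test + 1):
        # Accuracy is the exact fraction of correctly predicted test samples.
        if Fraction(n_test - wrong, n_test) >= min_accuracy:
            allowed = wrong
    return allowed


# 3 test samples, as described in the comment above.
old_threshold = max_failures_allowed(3, Fraction("0.75"))
new_threshold = max_failures_allowed(3, Fraction("0.66"))
print(old_threshold, new_threshold)
```

With 3 test samples, a single wrong prediction gives an accuracy of 2/3, which is below 0.75 but above 0.66, so only the lowered threshold tolerates it.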

Index out of range exception: The GenerateModel() method in Learner.cs was extracting the potential labels from the test set. With so few test samples, the test set often did not include every label seen during training. When the model (wrongly) predicted a label that was absent from the test set, the result was an invalid descriptor label index. I added a parameter to prevent the labels list from being updated. Another way to fix this would have been to clone the descriptor from the train set, but I found no clone method for Descriptor.
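The failure mode described above can be reproduced in a few lines. This is a simplified Python sketch, not the actual Learner.cs code: `build_label_table` and `encode` are hypothetical names, the label lists are made-up toy data, and in Python the failed lookup surfaces as a `KeyError` rather than an index-out-of-range exception.

```python
def build_label_table(labels):
    """Map each distinct label to an index (hypothetical helper, not numl)."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def encode(prediction, table):
    """Look up the index of a predicted label; fails if the label is unknown."""
    return table[prediction]


# Toy data: the training set has three classes, the small test set only two.
train_labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
test_labels = ["Iris-setosa", "Iris-versicolor"]

# Building the table from the test set loses the third class...
table_from_test = build_label_table(test_labels)
try:
    encode("Iris-virginica", table_from_test)
    lookup_failed = False
except KeyError:
    lookup_failed = True

# ...while a table built from the training set covers every possible prediction.
table_from_train = build_label_table(train_labels)
index_from_train = encode("Iris-virginica", table_from_train)
```

Keeping the label list tied to the training data, which is roughly what the added parameter achieves, means every label the model can predict has a valid index.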

Note that some numl tests are still failing, but they are unrelated to my changes. I might take a look at them once I have fixed NaiveBayesGenerator...

rold2007 commented 6 years ago

Hey @sethjuarez, I would like to submit another pull request, but it depends on this one. Would it be possible for you to merge it? Is there anything you would like me to improve first?

Thanks