While taking a more thorough look into the Linear Regression implementation, I'm seeing that Accuracy tends to report as 0%. Here is the code currently used in Learner.cs (dev branch):
```csharp
// testing
object[] test = GetTestExamples(testingSlice, examples);
double accuracy = 0;
for (int j = 0; j < test.Length; j++)
{
    // item under test
    object o = test[j];
    // get truth
    var truth = Ject.Get(o, descriptor.Label.Name);
    // if truth is a string, sanitize
    if (descriptor.Label.Type == typeof(string))
        truth = StringHelpers.Sanitize(truth.ToString());
    // make prediction
    var features = descriptor.Convert(o, false).ToVector();
    var p = model.Predict(features);
    var pred = descriptor.Label.Convert(p);
    // assess accuracy
    if (truth.Equals(pred))
        accuracy += 1;
}
// get percentage correct
accuracy /= test.Length;
```
Then this is consumed later in Learner.Best:
```csharp
var q = from m in models
        where m.Accuracy == (models.Select(s => s.Accuracy).Max())
        select m;
return q.FirstOrDefault();
```
So basically, it iterates through the testing slice, makes a prediction for each example, and then assesses that prediction against the truth. But currently there is only one implementation of assessment: `truth.Equals(pred)`. `Learner.Best()` then consumes this by taking the model with the highest (max) value of `Accuracy`.
This approach means that unless two doubles are exactly equal (unlikely except for possibly trivial data), LinearRegression will always report 0% Accuracy.
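To make the problem concrete, here is a minimal sketch (not the numl API; the helper name and `tolerance` parameter are invented for illustration) of what a tolerance-based accuracy for regression could look like, counting a prediction as correct when it is within a relative tolerance of the truth:

```csharp
using System;

// Hypothetical sketch: score regression predictions by relative tolerance
// instead of exact double equality.
static class RegressionAccuracy
{
    // "tolerance" is an assumed knob, not an existing Learner parameter.
    public static double Score(double[] truths, double[] preds, double tolerance = 0.05)
    {
        if (truths.Length != preds.Length)
            throw new ArgumentException("truths and preds must be the same length");

        int correct = 0;
        for (int i = 0; i < truths.Length; i++)
        {
            // scale the comparison by the truth's magnitude (guarding against zero)
            double scale = Math.Max(Math.Abs(truths[i]), 1e-12);
            if (Math.Abs(truths[i] - preds[i]) / scale <= tolerance)
                correct++;
        }
        return (double)correct / truths.Length;
    }
}
```

With something like this, a prediction of 101 against a truth of 100 counts as correct at the default 5% tolerance, where `truth.Equals(pred)` would score it 0.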
I wanted to abstract this out, but I wanted to get thoughts on how to approach this, as there are a lot of possible routes forward.
We could:

- Pass in more parameters, with or without creating overloads for convenience.
- Create a simple enum and pass it in as a single parameter. This avoids getting ridiculous with ever more parameters.
- Create some kind of `TestOption` object/hierarchy and pass that in. The current implementation would become a descendant like `TruthEqualsPredictionTestOption`, which would also be the default to avoid breaking changes.
- Change the `Learner` implementation from a static class to a singleton instance, in which case we could subclass `Learner` with overrides for different methods.
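For concreteness, here is a minimal sketch of what the `TestOption` idea might look like (names and signatures are hypothetical, nothing here exists in the library today). Because assessment is polymorphic, the testing loop just calls `IsCorrect` and never needs to know which option it was handed:

```csharp
using System;

// Hypothetical sketch of the TestOption hierarchy.
abstract class TestOption
{
    public abstract bool IsCorrect(object truth, object prediction);
}

// Mirrors the current behavior and would be the default, avoiding breaking changes.
class TruthEqualsPredictionTestOption : TestOption
{
    public override bool IsCorrect(object truth, object prediction)
        => truth.Equals(prediction);
}

// A regression-friendly option: correct when within an absolute epsilon of the truth.
class WithinEpsilonTestOption : TestOption
{
    public double Epsilon { get; set; } = 0.05;

    public override bool IsCorrect(object truth, object prediction)
        => Math.Abs(Convert.ToDouble(truth) - Convert.ToDouble(prediction)) <= Epsilon;
}
```

The testing loop would then replace `truth.Equals(pred)` with `option.IsCorrect(truth, pred)`.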
I personally waver between the TestOption approach and the Learner changes. Each has its pros and cons.
With the `TestOption` approach, we can easily avoid breaking changes. But `Learner.Best()` would then have to branch on what the options instance is, and we end up with a switch statement or, worse, an if-then-else chain.
With the `Learner` singleton changes, we could more cleanly address the various capabilities of the `Learner` class, but this would probably entail breaking changes. I could write an `ILearnerThing` interface whose default implementation uses the current static class as-is, which would avoid breaking changes; however, going forward we would have a fragmented approach to using the library. It would also possibly (probably?) mean introducing DI of some sort, which brings more design decisions, i.e. complexity.
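A minimal sketch of what that interface route could look like, with entirely illustrative names (a real default implementation would delegate to the existing static `Learner` rather than the stub shown here):

```csharp
using System;

// Hypothetical sketch of the interface idea; all names are placeholders.
interface ILearnerThing
{
    // Returns 1.0 when a prediction counts as correct, 0.0 otherwise.
    double Assess(object truth, object prediction);
}

// Default implementation preserving today's exact-equality behavior,
// so existing callers see no change.
class DefaultLearner : ILearnerThing
{
    public double Assess(object truth, object prediction)
        => truth.Equals(prediction) ? 1.0 : 0.0;
}

// An override suited to regression: correct within an absolute epsilon.
class RegressionLearner : ILearnerThing
{
    private readonly double _epsilon;
    public RegressionLearner(double epsilon) => _epsilon = epsilon;

    public double Assess(object truth, object prediction)
        => Math.Abs(Convert.ToDouble(truth) - Convert.ToDouble(prediction)) <= _epsilon
            ? 1.0 : 0.0;
}
```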
So, those are my thoughts. The goal is simply to get meaningful accuracy out of LinearRegression, and to do it in such a way that if we get a good statistician personage on board (or maybe one of you already is one), they have easy access to a more robust assessment of accuracy without getting too YAGNI.