sethjuarez / numl

Machine Learning for .NET
http://numl.net
MIT License

Please add support for decimal properties #13

Closed DashNY closed 9 years ago

DashNY commented 9 years ago

Hi there,

My model consists of decimal properties labeled as [Feature] and [Label], and I'm getting the following exception in Ject.cs. It appears DoubleConverter.CanConvertTo(typeof(decimal)) returns false, causing this exception.

As a workaround I've switched all properties to Double, but I'm wondering if anything can be done about it.

System.InvalidCastException was unhandled by user code
  HResult=-2147467262
  Message=Cannot convert 20 to Decimal
  Source=numl
  StackTrace:
    at numl.Utils.Ject.Convert(Double val, Type t) in z:\Builds\work\6fc28cb662d1e0f0\numl\Utils\Ject.cs:line 287
    at numl.Model.Property.Convert(Double val) in z:\Builds\work\6fc28cb662d1e0f0\numl\Model\Property.cs:line 79
    at numl.Supervised.DecisionTree.DecisionTreeGenerator.BuildLeafNode(Double val) in z:\Builds\work\6fc28cb662d1e0f0\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 243
    at numl.Supervised.DecisionTree.DecisionTreeGenerator.BuildTree(Matrix x, Vector y, Int32 depth, List`1 used) in z:\Builds\work\6fc28cb662d1e0f0\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 172
    at numl.Supervised.DecisionTree.DecisionTreeGenerator.Generate(Matrix x, Vector y) in z:\Builds\work\6fc28cb662d1e0f0\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 91
    at numl.Learner.GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable`1 examples, Double trainingPct) in z:\Builds\work\6fc28cb662d1e0f0\numl\Learner.cs:line 143
    at numl.Learner.<>cDisplayClasse.bd(Int32 i) in z:\Builds\work\6fc28cb662d1e0f0\numl\Learner.cs:line 110
    at System.Threading.Tasks.Parallel.<>cDisplayClassf`1.bc()
  InnerException:
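
For readers following along, the behaviour described above can be sketched outside numl. The helper below is hypothetical (the class name `DecimalConversionSketch` and the method `ConvertDouble` are assumptions made for illustration, not numl's actual `Ject.Convert`). It only shows that `System.ComponentModel.DoubleConverter` accepts CLR primitive targets such as `int` but not `decimal`, and that a fallback through `System.Convert.ToDecimal` is one possible way to handle that case.

```csharp
using System;
using System.ComponentModel;

// Illustrative only: a stand-in for a double-to-target conversion routine.
// This is NOT numl's code; names here are hypothetical.
public static class DecimalConversionSketch
{
    public static object ConvertDouble(double val, Type t)
    {
        var converter = new DoubleConverter();

        // DoubleConverter reports support for string and CLR primitive
        // targets (int, bool, ...). decimal is a struct, not a primitive,
        // so this check fails for typeof(decimal).
        if (converter.CanConvertTo(t))
            return converter.ConvertTo(val, t);

        // Assumed fallback: handle decimal explicitly.
        if (t == typeof(decimal))
            return System.Convert.ToDecimal(val);

        throw new InvalidCastException($"Cannot convert {val} to {t.Name}");
    }
}
```

Calling `DecimalConversionSketch.ConvertDouble(20d, typeof(decimal))` goes through the fallback branch and returns a boxed `decimal`, whereas a routine that relies on `CanConvertTo` alone would reach the `InvalidCastException`.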

sethjuarez commented 9 years ago

Uh oh, can you post the offending row (or object data) + types? EDIT: OK, I understand (having read through it again). I will check the code.

DashNY commented 9 years ago

Sure thing. It appears to happen only on properties marked with [Label]. Here's your modified Tennis class.

public class Tennis
{
    [Feature]
    public Outlook Outlook { get; set; }
    [Feature]
    public Temperature Temperature { get; set; }
    [Feature]
    public bool Windy { get; set; }
    //[Label]
    public bool Play { get; set; }

    //[Feature]
    [Label]
    public decimal TestDecimal { get; set; }

    public static Tennis[] GetData()
    {
        return new[]
        {
            new Tennis {Play = true, Outlook = Outlook.Sunny, Temperature = Temperature.Low, Windy = true, TestDecimal = 2},
            new Tennis {Play = false, Outlook = Outlook.Sunny, Temperature = Temperature.High, Windy = true, TestDecimal = 3},
            new Tennis {Play = false, Outlook = Outlook.Sunny, Temperature = Temperature.High, Windy = false, TestDecimal = 2},
            new Tennis {Play = true, Outlook = Outlook.Overcast, Temperature = Temperature.Low, Windy = true, TestDecimal = 4},
            new Tennis {Play = true, Outlook = Outlook.Overcast, Temperature = Temperature.High, Windy = false, TestDecimal = 2},
            new Tennis {Play = true, Outlook = Outlook.Overcast, Temperature = Temperature.Low, Windy = false, TestDecimal = 2},
            new Tennis {Play = false, Outlook = Outlook.Rainy, Temperature = Temperature.Low, Windy = true, TestDecimal = 5},
            new Tennis {Play = true, Outlook = Outlook.Rainy, Temperature = Temperature.Low, Windy = false, TestDecimal = 2}
        };
    }
}

sethjuarez commented 9 years ago

Yes, once I read the whole thing over I immediately understood. It is an easy fix. Looks like I forgot a unit test. Will add it and the fix shortly. Thanks for calling it out!

sethjuarez commented 9 years ago

I wanted to clarify something about this fix. The reason this was never caught before is that I have not really focused on regression yet for this library. If the label is a continuous value, I'm not sure that a DT (or any of the other classification algorithms) will work correctly.

DashNY commented 9 years ago

Thanks for the quick fix, Seth. This is a truly great project.

DashNY commented 9 years ago

Seth, do you mean to say that we should generally use Boolean values as the output of predictions with numl?

sethjuarez commented 9 years ago

Not necessarily. Most of the models (except Perceptron and KPerceptron) will handle the multi-class case (if I remember right). If you are looking for a continuous output then you can use regression (I think I implemented a linear regression but have not tested it yet). It isn't really a numl limitation, as it isn't too hard to add other regression algorithms (like logistic regression et al.).

normanhh3 commented 9 years ago

@DashNY you guys make the Dash App by any chance?

DashNY commented 9 years ago

Nope, not me.