nok / sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
BSD 3-Clause "New" or "Revised" License

Naive Bayes predicting the same label #21

Closed alonsopg closed 6 years ago

alonsopg commented 6 years ago

I am trying to figure out the best machine learning approach for a pedometer project. So far I have gyroscope and accelerometer data for walking and not walking. When I train and test a Naive Bayes model on my machine, I get nearly 70% accuracy. However, when I port it to Java, add it to my Android app, and start using the implementation, it just predicts the same label every time. Several questions arise from this: Why is this happening? Do I need an online learning algorithm for this scenario? Is the balance of my classes wrong?

nok commented 6 years ago

Hello, did you test the accuracy of the transpiled estimator?

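from sklearn_porter import Porter

# clf is the fitted estimator; X is the feature matrix used for the check
porter = Porter(clf, language='java')
accuracy = porter.predict_test(X)
print(accuracy)

alonsopg commented 6 years ago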

Thanks for the help nok, I got this:

0.0

nok commented 6 years ago

0.0 is the worst-case scenario. The question is why the transpiled estimator always returns the same label. Can you post the transpiled output? Did you preprocess the data?

alonsopg commented 6 years ago

The transpiled output? What is that? My data is just numbers. The weird thing is that when I train without sklearn-porter, I get 70% accuracy when I calculate it with acc_score.

nok commented 6 years ago

print(Porter(clf, language='java').export()) prints the transpiled estimator. The method predict_test compares the predictions of the original estimator in Python with the predictions of the estimator in your target programming language, so the accuracy of your trained estimator doesn't matter for this check.
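The comparison described above can also be repeated by hand on the Java side. The following is a minimal sketch, assuming the labels predicted in Python have been copied over for a handful of samples. AgreementCheck and its agreement method are hypothetical names, not part of sklearn-porter, and the stub predictor and labels in main are made up for the demo.

```java
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of sklearn-porter.
class AgreementCheck {

    // Fraction of samples for which the ported predictor returns the same
    // label that the Python estimator returned for that sample.
    static double agreement(ToIntFunction<double[]> predictor,
                            double[][] samples, int[] pythonLabels) {
        if (samples.length == 0 || samples.length != pythonLabels.length) {
            throw new IllegalArgumentException(
                "samples and labels must be non-empty and of equal length");
        }
        int matches = 0;
        for (int i = 0; i < samples.length; i++) {
            if (predictor.applyAsInt(samples[i]) == pythonLabels[i]) {
                matches++;
            }
        }
        return (double) matches / samples.length;
    }

    public static void main(String[] args) {
        // Stand-in predictor for illustration; in the app it would be Brain::predict.
        ToIntFunction<double[]> stub = atts -> atts[0] > 0 ? 1 : 0;
        double[][] samples = {
            {1.0, 0.0, 0.0}, {-1.0, 0.0, 0.0}, {2.0, 0.0, 0.0}, {-3.0, 0.0, 0.0}
        };
        int[] pythonLabels = {1, 0, 0, 0}; // made-up labels for the demo
        System.out.println(agreement(stub, samples, pythonLabels)); // 3 of 4 match
    }
}
```

A result of 1.0 would mean the port reproduces the Python predictions on those samples; anything lower points at the port or at the values being fed into it, not at the quality of the model itself.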

alonsopg commented 6 years ago

This is the transpiled estimator:

class Brain {

    public static int predict(double[] atts) {
        if (atts.length != 3) {
            return -1;
        }
        int i, j;

        double[] priors = {0.72410256410256413, 0.27589743589743587};
        double[][] sigmas = {{7777.674409056468, 2946.6863403262582, 1470.3790926152326}, {20368.196235029616, 7677.4899510244086, 3792.4066470154612}};
        double[][] thetas = {{2.1022332039660077, -1.1895209475920669, 2.4095972967422088}, {1.1423813271375469, -4.613826715613385, 6.9545192825278832}};
        double[] likelihoods = new double[2];

        for (i = 0; i < 2; i++) {
            double sum = 0.;
            for (j = 0; j < 3; j++) {
                sum += Math.log(2. * Math.PI * sigmas[i][j]);
            }
            double nij = -0.5 * sum;
            sum = 0.;
            for (j = 0; j < 3; j++) {
                sum += Math.pow(atts[j] - thetas[i][j], 2.) / sigmas[i][j];
            }
            nij -= 0.5 * sum;
            likelihoods[i] = Math.log(priors[i]) + nij;
        }

        double highestLikeli = Double.NEGATIVE_INFINITY;
        int classIndex = -1;
        for (i = 0; i < 2; i++) {
            if (likelihoods[i] > highestLikeli) {
                highestLikeli = likelihoods[i];
                classIndex = i;
            }
        }
        return classIndex;
    }

    public static void main(String[] args) {
        if (args.length == 3) {
            double[] atts = new double[args.length];
            for (int i = 0, l = args.length; i < l; i++) {
                atts[i] = Double.parseDouble(args[i]);
            }
            System.out.println(Brain.predict(atts));
        }
    }
}

nok commented 6 years ago

Okay, that looks good. Can you post some samples which you used for the training (.fit())?
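One way to probe the posted parameters is to evaluate the per-class scores on a few inputs. The sketch below copies the priors, sigmas and thetas from the transpiled Brain class and applies the same formula. GaussianNbCheck is a hypothetical name, and the two inputs are made up, not taken from the training data.

```java
import java.util.Arrays;

// Illustrative only: re-evaluates the posted parameters on made-up inputs.
class GaussianNbCheck {

    // Values copied from the transpiled Brain class in this thread.
    static final double[] PRIORS = {0.72410256410256413, 0.27589743589743587};
    static final double[][] SIGMAS = {
        {7777.674409056468, 2946.6863403262582, 1470.3790926152326},
        {20368.196235029616, 7677.4899510244086, 3792.4066470154612}};
    static final double[][] THETAS = {
        {2.1022332039660077, -1.1895209475920669, 2.4095972967422088},
        {1.1423813271375469, -4.613826715613385, 6.9545192825278832}};

    // Log prior plus Gaussian log-likelihood per class, as in Brain.predict.
    static double[] logJoint(double[] atts) {
        if (atts.length != 3) {
            throw new IllegalArgumentException("expected 3 features");
        }
        double[] scores = new double[PRIORS.length];
        for (int i = 0; i < PRIORS.length; i++) {
            double logNorm = 0.0;
            double quad = 0.0;
            for (int j = 0; j < atts.length; j++) {
                logNorm += Math.log(2.0 * Math.PI * SIGMAS[i][j]);
                double d = atts[j] - THETAS[i][j];
                quad += d * d / SIGMAS[i][j];
            }
            scores[i] = Math.log(PRIORS[i]) - 0.5 * logNorm - 0.5 * quad;
        }
        return scores;
    }

    // Ties go to class 0, matching the strict '>' comparison in Brain.predict.
    static int predict(double[] atts) {
        double[] scores = logJoint(atts);
        return scores[1] > scores[0] ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] inputs = {{0.0, 0.0, 0.0}, {0.0, 0.0, 150.0}}; // made-up inputs
        for (double[] atts : inputs) {
            System.out.println(Arrays.toString(atts) + " -> class " + predict(atts));
        }
    }
}
```

With these parameters the input (0, 0, 0) is assigned class 0 and (0, 0, 150) class 1, so the exported model can return both labels. Class 0 has the larger prior and the smaller variances, so inputs close to the class means fall into class 0, and class 1 wins only for values far from them. If the values reaching Brain.predict in the app were on a different scale or in different units than the training data, a constant label would be one possible outcome; that is a hypothesis to check, not a confirmed diagnosis.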

alonsopg commented 6 years ago

@nok Here is the data I am using to train the model. I am still having issues on the Java side and don't understand why I always predict the same label. Here's how I am using the prediction method:

    values[0] = rollValue;
    values[1] = pitchValue;
    values[2] = yawValue;
    //values[3] = gxValue;
    //values[4] = gyValue;
    //values[5] = gzValue;

    int y_pred = Brain.predict(values);

    ClassifierLog.setText(Integer.toString(y_pred));

    System.out.println("pred: " + Integer.toString(y_pred));

    int counter = 0;

    //Simple threshold
    if (rollValue < -40 && rollValue > -80
      && pitchValue < 0 && yawValue > 0
      && gxValue < 1000 && gxValue > -2000
      && gyValue < 500 && gyValue > -1000
      && gzValue < 1000 && gzValue > -2000) {
        ClassifierLog.setText("N");                                                 
        System.out.println("COUNTER: " + counter);

    } else {
        counter++;                                         
        ClassifierLog.setText("W");                                               
        System.out.println("COUNTER: " + counter);
    }
}

nok commented 6 years ago

Hello @alonsopg,

First of all, sorry for my late response. Could you please share your data again? And I assume you are using the default parameters for the GaussianNB classifier?

Best, Darius