mljs / libsvm

LIBSVM for the browser and nodejs :fire:
https://mljs.github.io/libsvm/
BSD 3-Clause "New" or "Revised" License

The predict function is giving the same values for any test case #14

Closed rguntha closed 6 years ago

rguntha commented 6 years ago

Please help me...

I am working on blood pressure prediction using SVM regression. I have tried both the epsilon and nu regression types, along with various values of gamma, nu, cost and epsilon.

======================================================= The Problem:

I get the same predictions no matter what input data I give.

======================================================= Code:

var fs = require("fs");

async function runTest() {
    var trainDataString = fs.readFileSync('../src/assets/bp/trainaims.csv')+"";
    var testDataString = fs.readFileSync('../src/assets/bp/testaims.csv')+"";
    testLibSVM(trainDataString.split('\r\n').splice(1,16),testDataString.split('\r\n').splice(1,5));
}

async function testLibSVM(trainData,testData){
    const SVM = await require('libsvm-js');
    var svmSys = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,gamma:0.01,nu:[0.01,0.125,0.5,1],cost:1,epsilon:0.1});
    var svmDia = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,gamma:0.01,nu:[0.01,0.125,0.5,1],cost:1,epsilon:0.1});
    let dataArray = [];
    let sysValues = [];
    let diaValues = [];

    for(let i=0;i<trainData.length;i++){
      let line = trainData[i];
      let trainingRecord = line.split(",").map(data => parseFloat(data));
      if(trainingRecord.length === 14){
        let inputParams = trainingRecord.splice(0,12);
        dataArray.push(inputParams);
        sysValues.push(trainingRecord[0]);
        diaValues.push(trainingRecord[1]);
      }
    }
    svmSys.train(dataArray,sysValues);
    svmDia.train(dataArray,diaValues);
    testData.forEach(element => {
        if(element.length > 0){
            console.log(element);
            let dataArrayStr = element.split(",");
            let testDataArray = dataArrayStr.map(data => parseFloat(data));
            let values = [];
            values.push(svmSys.predictOne(testDataArray));
            values.push(svmDia.predictOne(testDataArray));
            console.log(values);
        }
    });
  }
runTest().then(() => console.log('done!'));

======================================================= Logs:

trying binaryen method: native-wasm
asynchronously preparing wasm
binaryen method succeeded.
done!
optimization finished, #iter = 8
nu = 0.925000
obj = -146.760000, rho = -121.500000
nSV = 16, nBSV = 14
optimization finished, #iter = 8
nu = 1.000000
obj = -97.400000, rho = -68.000000
nSV = 16, nBSV = 16
2,42,84.8456,0.5617,0.1455,0.7072,3.97E+04,536.6846,-143.0407,146.9154,164.2148,130.1103
[ 121.5, 68 ]
2,29,87.7407,0.543,0.1408,0.6838,8.10E+04,1.20E+03,-297.5299,297.771,345.1783,252.506
[ 121.5, 68 ]
1,28,75.1024,0.64,0.1589,0.7989,4.41E+04,552.0304,-133.7681,172.2191,189.3553,156.166
[ 121.5, 68 ]
2,28,77.4648,0.6695,0.1051,0.7745,4.50E+04,869.1954,-137.7776,127.5441,138.6392,117.5474
[ 121.5, 68 ]
1,25,96.8411,0.5049,0.1147,0.6196,1.37E+05,2.39E+03,-540.3023,557.6924,533.8753,582.0348
[ 121.5, 68 ]

======================================================= Input Data

Input data can be found in the attached zip folder. The training data file contains 16 rows; its last two columns are the two label values. The testing file contains 5 rows. The logs show that every test row produces the same values.

trainaims.zip

stropitek commented 6 years ago

Hi, I haven't had the opportunity to test the regression on a real-case scenario, only with the demo website's unidimensional example.

Have you tried to do exactly the same with the original libsvm library? Are the results any different? If you could post here the results you get with it that would help a lot!

Thanks

rguntha commented 6 years ago

Hi, thanks very much for the quick reply.

I have not tried the original libsvm library. I have tried it in R and it gave varying results.

I have even tried with the body fat dataset published on the libsvm site: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/bodyfat I used the last 19 records as test records and the rest as training records. Below is the result. Note that all 19 records produced the same '1.05195' as the result.

== * libsvm.js:1 optimization finished, #iter = 0 libsvm.js:1 nu = 0.000000 libsvm.js:1 obj = 0.000000, rho = -1.051950 libsvm.js:1 nSV = 0, nBSV = 0 libsvm.js:1 [[1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195]] fat-libsvm.js:37 done

The code is below. The data files are attached.

var fs = require("fs");

async function runTest() {
    var trainDataString = fs.readFileSync('./src/assets/bp/trainfat.csv')+"";
    var testDataString = fs.readFileSync('./src/assets/bp/testfat.csv')+"";
    testLibSVM(trainDataString.split('\n'),testDataString.split('\n'));
}

async function testLibSVM(trainData,testData){
    const SVM = await require('libsvm-js/asm');
    var svmFat = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,gamma:0.01,nu:[0.01,0.125,0.5,1],cost:1,epsilon:0.1});
    let dataArray = [];
    let fatValues = [];

    for(let i=0;i<trainData.length;i++){
      let line = trainData[i];
      let trainingRecord = line.split(",").map(data => parseFloat(data));
      if(trainingRecord.length === 15){
        let inputParams = trainingRecord.splice(1);
        dataArray.push(inputParams);
        fatValues.push(trainingRecord[0]);
      }
    }
    let testArrays = [];
    testData.forEach(element => {
        if(element.length > 0){
            // console.log(element);
            let dataArrayStr = element.split(",");
            let testDataArray = dataArrayStr.map(data => parseFloat(data));
            testArrays.push(testDataArray.splice(1));
        }
    });
    let values = [];
    svmFat.free();
    svmFat.train(dataArray,fatValues);
    values.push(svmFat.predict(testArrays));
    console.log(JSON.stringify(values));
}
runTest().then(() => console.log('done!'));

testfat.zip

rguntha commented 6 years ago

I have tested the same fat files in R as well. Below are the commands and the final result.

fatTest<-read.csv("C:\Code\WearableVitals\src/assets/bp/testfat.csv")
fat<-read.csv("C:\Code\WearableVitals\src/assets/bp/trainfat.csv")
fatFitRadialEsp<-svm(Fat~.,data=fat,type="eps-regression",kernel="radial")
predFat<-predict(fatFitRadialEsp,fatTest)
       1        2        3        4        5        6        7
1.042609 1.042513 1.053037 1.043323 1.031655 1.068895 1.033243
       8        9       10       11       12       13       14
1.064847 1.026723 1.032486 1.036302 1.032156 1.064235 1.036778
      15       16       17       18       19
1.069251 1.033492 1.034420 1.044378 1.038295

stropitek commented 6 years ago

I find it suspicious that the nu parameter is an Array. Can you try with a number instead? Unlike other libraries, you cannot grid-search hyperparameters in libsvm-js.
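
Since each option has to be a single number, a search over several values would have to be written as an outer loop around training. The sketch below is only an illustration of that idea: `expandGrid`, `gridSearch` and the `evaluate` callback are hypothetical names, not part of libsvm-js, and the scoring function is a stand-in so the snippet runs without a dataset.

```javascript
// Build every combination of the given hyperparameter values.
// `grid` maps an option name to the list of values to try.
function expandGrid(grid) {
  return Object.entries(grid).reduce(
    (combos, [name, values]) =>
      combos.flatMap(combo => values.map(value => ({ ...combo, [name]: value }))),
    [{}]
  );
}

// `evaluate` is a hypothetical callback: it should train one model with the
// given options (each a single number) and return a validation error.
function gridSearch(grid, evaluate) {
  let best = null;
  for (const options of expandGrid(grid)) {
    const error = evaluate(options);
    if (best === null || error < best.error) {
      best = { options, error };
    }
  }
  return best;
}

// Stand-in scoring function (not a real SVM) so the sketch is runnable.
const demo = gridSearch(
  { nu: [0.01, 0.125, 0.5, 1], cost: [1, 10] },
  ({ nu, cost }) => Math.abs(nu - 0.5) + Math.abs(cost - 10)
);
console.log(demo); // the combination with the lowest stand-in error
```

In real use, `evaluate` would create a new SVM with those options, train it, and score it on held-out rows.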

stropitek commented 6 years ago

Actually, you are using the epsilon regression, so it shouldn't matter.

I'll try to have a look at it soon.

rguntha commented 6 years ago

I have tried without the array, without any parameters and with various options, and it has no effect.

I also tried all the classifier and regression types. Each type gives a different prediction, but it is the same prediction for any input.

Even if I give all 1s as the test input, the prediction is still the same.

Thanks very much for looking into it.

stropitek commented 6 years ago

Hello, your epsilon value is too high. I tried with 0.001 and it gives something that looks similar to your R result. Have a look at https://mljs.github.io/libsvm/#/SVR to see how the epsilon value affects the regression.
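
One way to see why a large epsilon flattens the output: epsilon-SVR does not penalise errors smaller than epsilon, so if a single constant already lies within epsilon of every training label, a flat model with no support vectors is optimal. That would be consistent with the `nSV = 0` line in the body fat log. The check below is a minimal sketch of that condition; the function name and the target values are made up for illustration and are not taken from the dataset.

```javascript
// Returns a constant prediction that fits every target within +/- epsilon,
// or null if no such constant exists. The midpoint is just one valid choice;
// the solver may settle on a different constant inside the allowed interval.
function constantFitWithinTube(targets, epsilon) {
  const lo = Math.min(...targets);
  const hi = Math.max(...targets);
  return (hi - lo) / 2 <= epsilon ? (lo + hi) / 2 : null;
}

// Made-up targets on roughly the same scale as the body fat labels.
const targets = [1.03, 1.04, 1.05, 1.06, 1.07];

// With epsilon = 0.1 a constant fits, so a flat model is enough.
console.log(constantFitWithinTube(targets, 0.1));

// With epsilon = 0.001 no constant fits, so the model has to use the features.
console.log(constantFitWithinTube(targets, 0.001));
```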

I'm closing this issue. Feel free to reopen if something still seems wrong.

rguntha commented 6 years ago

Wonderful. Thanks so much. Sorry for my ignorance in these matters. I spent a lot of time learning about these techniques but missed the part about adjusting epsilon.

Thanks,
Ramesh

rguntha commented 6 years ago

Hello, your suggestion worked well for the fat data, but it's not working for my bp data. The results given by libsvm-js and R do not match at all, even though I am using the same parameters. The difference is very large.

I would really appreciate your help. I am not sure if there are any other parameters I am not considering.

R - Commands (Linear Kernel with Epsilon=0.001)

bp<-read.csv("C:/Code/WearableVitals/WearableVitalsApp/src/assets/bp/trainaims.csv")
bptest<-read.csv("C:/Code/WearableVitals/WearableVitalsApp/src/assets/bp/testaims.csv")
traindata<-bp[1:13]
testdata<-bptest[1:12]
bpFitLinearEsp<-svm(bpsys~.,data=traindata,type="eps-regression",kernel="linear",epsilon=0.001)
predBp<-predict(bpFitLinearEsp,testdata)
predBp
       1        2        3        4        5
149.2531 139.0186 118.4807 136.4002 130.3284

libsvm-js (Linear Kernel with Epsilon=0.001)

epsilon = 0.001;
let values = [];
var svmSys = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,epsilon:epsilon,kernel:SVM.KERNEL_TYPES.LINEAR});
var svmDia = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,epsilon:epsilon,kernel:SVM.KERNEL_TYPES.LINEAR});
svmSys.free();
svmSys.train(dataArray,sysValues);
values.push(svmSys.predict(testArrays).map(x => math.round(x)));
successes.push(JSON.stringify(values));
console.log("Successes:"+successes.join("\n"));

Results:
optimization finished, #iter = 10000000
nu = 0.875000
obj = -2410.522562, rho = 200.890164
nSV = 16, nBSV = 10
Successes:[[543,-624,539,-32,-2012]]

R - Commands (Linear Kernel with Epsilon=1.5)

bpFitLinearEsp<-svm(bpsys~.,data=traindata,type="eps-regression",kernel="linear",epsilon=1.5)
predBp<-predict(bpFitLinearEsp,testdata)
predBp
       1        2        3        4        5
122.1728 122.8121 119.0912 119.9891 125.0957

libsvm-js (Linear Kernel with Epsilon=1.5)

Results:
optimization finished, #iter = 10000000
nu = 0.228737
obj = -48.633458, rho = 16.987639
nSV = 9, nBSV = 1
Successes:[[379,648,415,-94,1244]]

stropitek commented 6 years ago

@rguntha I compared the output of the epsilon-SVR in libsvm-js with the output from the original library (C implementation), and it's exactly the same when all the parameters are explicitly set. However, I noticed a bug in how the default value for the gamma parameter was chosen: it is supposed to be 1/num_features but was actually hardcoded to 0.1. I fixed that and released a new version of libsvm-js.
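
To avoid depending on a library default when comparing implementations, gamma can be computed and passed explicitly. The snippet below is a small sketch of the 1/num_features rule; `defaultGamma` is a hypothetical helper, the placeholder row only stands in for a 12-feature record, and the option names are the ones used earlier in this thread with illustrative values.

```javascript
// gamma = 1 / num_features, the default described above.
function defaultGamma(features) {
  return 1 / features[0].length;
}

// One placeholder row with 12 features, as in the blood pressure data.
const sampleRows = [new Array(12).fill(0)];

// Plain options object; values are illustrative, not recommendations.
const explicitOptions = {
  gamma: defaultGamma(sampleRows),
  cost: 1,
  epsilon: 0.001
};
console.log(explicitOptions.gamma); // 1/12, roughly 0.0833
```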

Also note that according to the libsvm website, the R package is based on version 3.17 whereas libsvm-js is based on 3.22, so output may slightly differ.

Hope that will fix your issue. I'm closing again, feel free to reopen if you still have issues.

rguntha commented 6 years ago

@stropitek I have taken your latest version 0.2.0 and retried the above test cases (Linear Kernel with epsilon 0.001 and 1.5), but unfortunately the results are the same and still very different from the R results mentioned above.

Please note that libsvm takes a very long time for these computations, maybe because of the very large number of iterations (10 million, as shown in the results in my previous comment).

It would be great if you can rerun my test files attached earlier in the thread (trainaims.csv and testaims.csv)

Thanks very much for your continued help.

stropitek commented 6 years ago

Looking in the R documentation, I read:

Per default, data are scaled internally (both x and y variables) to zero mean and unit variance

Indeed, SVM does not work well if the data is not scaled. In libsvm-js, data is not scaled by default; you have to do it yourself.
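
For the RBF kernel with gamma 0.01 used in the first post, the effect of unscaled features can be checked directly. The sketch below evaluates the standard RBF formula, exp(-gamma * ||a - b||^2), on two of the raw test rows printed in the first log; `rbfKernel` is a helper written for this illustration, not a libsvm-js function. If the training rows are on the same raw scale, every kernel value is effectively 0 and the prediction collapses to the constant -rho, which would be consistent with the 121.5 and 68 in that log.

```javascript
// RBF kernel: exp(-gamma * ||a - b||^2).
function rbfKernel(a, b, gamma) {
  let squaredDistance = 0;
  for (let i = 0; i < a.length; i++) {
    squaredDistance += (a[i] - b[i]) ** 2;
  }
  return Math.exp(-gamma * squaredDistance);
}

// Two unscaled test rows copied from the log in the first post.
const rowA = [2, 42, 84.8456, 0.5617, 0.1455, 0.7072, 3.97e4, 536.6846, -143.0407, 146.9154, 164.2148, 130.1103];
const rowB = [2, 29, 87.7407, 0.543, 0.1408, 0.6838, 8.10e4, 1.20e3, -297.5299, 297.771, 345.1783, 252.506];

// The seventh feature alone differs by about 41300, so the exponent is
// hugely negative and the kernel underflows to 0.
console.log(rbfKernel(rowA, rowB, 0.01)); // 0

// Hypothetical points on a unit scale, for comparison.
console.log(rbfKernel([0, 0], [1, 1], 0.01)); // about 0.98
```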

rguntha commented 6 years ago

@stropitek Thanks for the scaling tip. I have now implemented scaling using the formula scaledX = (x - mean(featureVector)) / std-dev(featureVector).

The results now match R exactly.
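
For anyone landing here later, the formula above can be sketched as follows. This is not the code actually used in this thread; `fitScaler` and `applyScaler` are hypothetical helper names, and the sample standard deviation (n - 1 denominator) is an assumption chosen to mirror R's `scale()`. The statistics are computed on the training rows only and then reused for the test rows.

```javascript
// Column-wise mean and sample standard deviation (n - 1 denominator).
function fitScaler(rows) {
  const n = rows.length;
  const width = rows[0].length;
  const sums = new Array(width).fill(0);
  for (const row of rows) row.forEach((v, j) => { sums[j] += v; });
  const mean = sums.map(s => s / n);
  const squares = new Array(width).fill(0);
  for (const row of rows) row.forEach((v, j) => { squares[j] += (v - mean[j]) ** 2; });
  const std = squares.map(s => Math.sqrt(s / (n - 1)));
  return { mean, std };
}

// Apply scaledX = (x - mean) / std to every column.
function applyScaler(rows, { mean, std }) {
  // A constant column has std 0; map it to 0 instead of dividing by zero.
  return rows.map(row => row.map((v, j) => (std[j] === 0 ? 0 : (v - mean[j]) / std[j])));
}

// Tiny made-up example: means are [2, 20], standard deviations are [1, 10].
const train = [[1, 10], [2, 20], [3, 30]];
const scaler = fitScaler(train);
console.log(applyScaler(train, scaler));     // rows become [-1, -1], [0, 0], [1, 1]
console.log(applyScaler([[4, 40]], scaler)); // the row becomes [2, 2]
```

According to the R documentation quoted above, R also scales the y variable, so matching it exactly presumably means scaling the labels the same way and converting predictions back with prediction * std + mean.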

Thanks very much for your help