Closed shahrin014 closed 2 years ago
I'm getting this error with a relatively large dataset. With about 10 records and 13-odd features it works; with anything larger, the program fails with the error above.
@saeedshahab ... errr ... so should we fix it?
I tried hacking into the codebase to identify what could be going wrong. Unfortunately I wasn't able to trace any issues. I'll try debugging again to identify what could be going wrong.
I ran into the same issue on a larger dataset
I tracked this down a little. I experience this error on my development machine, but strangely, not on my application server. I tried reducing my dimensionality as a test but that didn't seem to solve the problem.
It seems like the bestSplit function is not working as intended on some data sets, and I cannot figure out why. Some data makes a split put all of the data in either the greater or the lesser bucket. Changing the split function that bestSplit calls to the following seems to move the error up the stack:
split(x, y, splitValue) {
    var lesser = [];
    var greater = [];
    for (var i = 0; i < x.length; ++i) {
        if (x[i] < splitValue) {
            lesser.push(y[i]);
        } else if (x[i] > splitValue) {
            greater.push(y[i]);
        } else {
            throw new TypeError('cannot split!! equal!!!');
        }
    }
    return {
        greater: greater,
        lesser: lesser
    };
}
I'll try to find some time to post a code snippet that can demonstrate the issue.
@jondwillis I did some more digging. It seems like it doesn't depend on how large the dataset is, but on how many features you're modeling.
I found that with fewer than 20 features it works; once I hit 20 inputs, I get that error.
Thanks, Yaw
@yawetse Thanks for the tip! I may be able to reduce my features. It is still strange that it works in one environment but not in another.
Edit: Actually, I just hit this error with a 40-row, 8-feature matrix.
My training options are:
const options = {
    seed: 42,
    maxFeatures: 1.0,
    replacement: true,
    nEstimators: 20,
    selectionMethod: "median",
    useSampleBagging: true
};
Setting maxFeatures to less than 20 does not appear to help, nor does turning off replacement or sample bagging.
One dataset that causes the error is as follows (I had to screengrab this from a remote server because the clipboard isn't working):
@shahrin-14 @saeedshahab @yawetse
I forked a solution for the problem that I was experiencing. Not totally sure that it doesn't have unintended side-effects. Basically, if there are no elements in the greater/lesser buckets (due to all elements being either greater or lesser than a given split value), it treats that bucket as having zero error during training.
https://github.com/jondwillis/random-forest and https://github.com/jondwillis/decision-tree-cart
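For readers who want to see the idea without opening the forks, here is a minimal sketch of "an empty bucket counts as zero error". The helper names (`mean`, `squaredErrorOrZero`, `regressionError`) are hypothetical stand-ins chosen to echo the stack trace in this issue; they are not copied from the library or from the forks, and the real error function may be normalized differently.

```javascript
// Mean of a non-empty array. Throws on empty input, mirroring the
// "input must not be empty" error reported in this issue.
function mean(values) {
  if (values.length === 0) {
    throw new RangeError('input must not be empty');
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sum of squared deviations from the mean.
// Assumption: an empty bucket contributes zero error instead of throwing.
function squaredErrorOrZero(values) {
  if (values.length === 0) {
    return 0;
  }
  const m = mean(values);
  return values.reduce((sum, v) => sum + (v - m) ** 2, 0);
}

// Hypothetical stand-in for the regression error of a split result.
function regressionError(splitted) {
  return (
    squaredErrorOrZero(splitted.lesser) + squaredErrorOrZero(splitted.greater)
  );
}

// A degenerate split (everything in `greater`) no longer throws.
const degenerateError = regressionError({ lesser: [], greater: [7, 9] });
console.log(degenerateError);
```

With a sum-based error like this one, a degenerate split scores the same as not splitting at all, so it should rarely win against a real split. Whether that also holds for the library's own error function is not something this sketch checks.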
@targos perhaps you should have a look at this.
An easy fix would be
for (var j = 0; j < splitValues.length; ++j) {
    var currentSplitVal = splitValues[j];
    var min_currentFeature = ML.ArrayStat.min(currentFeature);
    var splitted = this.split(currentFeature, y, currentSplitVal);
    if (min_currentFeature === currentSplitVal) {
        var gain = Infinity;
    } else {
        var gain = gainFunctions[this.gainFunction](y, splitted);
    }
    if (check(gain, bestGain)) {
        maxColumn = i;
        maxValue = currentSplitVal;
        bestGain = gain;
    }
}
in the ml.js file.
Ran into this on 2 feature, 10k row problem :/
The reason lesser or greater has zero elements is that the feature contains duplicate values. (I'm guessing it happens when the largest or smallest value in a group of 4 values is repeated.)
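A small reproduction may make this concrete. The `split` below is a simplified assumption about how the buckets are filled (strictly smaller values go to `lesser`, ties and larger values go to `greater`); it is not copied from the library. However the candidate split values are generated, a candidate equal to the feature's minimum leaves `lesser` empty under this rule.

```javascript
// Simplified split (assumption): values strictly below the threshold go to
// `lesser`; everything else, including ties, goes to `greater`.
function split(x, y, splitValue) {
  const lesser = [];
  const greater = [];
  for (let i = 0; i < x.length; ++i) {
    if (x[i] < splitValue) {
      lesser.push(y[i]);
    } else {
      greater.push(y[i]);
    }
  }
  return { lesser, greater };
}

// Illustrative data: the minimum value (0) appears twice.
const feature = [0, 0, 0.125, 0.25];
const ratings = [7, 10, 8, 9];

// Splitting at the minimum: nothing is strictly less than 0,
// so every target lands in `greater`.
const atMinimum = split(feature, ratings, 0);
console.log(atMinimum.lesser.length, atMinimum.greater.length); // 0 4

// Splitting at a larger value leaves both buckets populated.
const aboveMinimum = split(feature, ratings, 0.125);
console.log(aboveMinimum.lesser.length, aboveMinimum.greater.length); // 2 2
```

Taking the mean of the empty `lesser` bucket is what would then raise "input must not be empty".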
Taking up this idea, this worked for me:
for (let j = 0; j < splitValues.length; ++j) {
    let currentSplitVal = splitValues[j];
    var min_currentFeature = Array$1.min(currentFeature);
    let splitted = this.split(currentFeature, y, currentSplitVal);
    var gain = Infinity;
    if (min_currentFeature !== currentSplitVal) {
        gain = gainFunctions[this.gainFunction](y, splitted);
    }
    if (check(gain, bestGain)) {
        maxColumn = i;
        maxValue = currentSplitVal;
        bestGain = gain;
    }
}
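Since the snippets above depend on library internals, here is a self-contained sketch of the same guard that can be run on its own. It is a variation, not the patch itself: instead of assigning `Infinity` when the split value equals the minimum, it skips any candidate that leaves a bucket empty. All names here (`split`, `squaredError`, `bestSplitForFeature`) are hypothetical, and it assumes a regression setting where a lower error is better.

```javascript
// Simplified split (assumption): strictly smaller values go to `lesser`.
function split(x, y, splitValue) {
  const lesser = [];
  const greater = [];
  for (let i = 0; i < x.length; ++i) {
    (x[i] < splitValue ? lesser : greater).push(y[i]);
  }
  return { lesser, greater };
}

// Sum of squared deviations from the mean; expects a non-empty array.
function squaredError(values) {
  const m = values.reduce((sum, v) => sum + v, 0) / values.length;
  return values.reduce((sum, v) => sum + (v - m) ** 2, 0);
}

// Search the candidate split values of one feature, ignoring any candidate
// that puts every sample into a single bucket.
function bestSplitForFeature(feature, y, splitValues) {
  let bestGain = Infinity; // regression assumption: lower error is better
  let bestValue;
  for (const candidate of splitValues) {
    const { lesser, greater } = split(feature, y, candidate);
    if (lesser.length === 0 || greater.length === 0) {
      continue; // degenerate split: skip it instead of computing a mean of []
    }
    const gain = squaredError(lesser) + squaredError(greater);
    if (gain < bestGain) {
      bestGain = gain;
      bestValue = candidate;
    }
  }
  return { bestValue, bestGain };
}

// The candidate 0 equals the feature minimum and is skipped without throwing.
const result = bestSplitForFeature(
  [0, 0, 0.125, 0.25],
  [7, 10, 8, 9],
  [0, 0.125, 0.25]
);
console.log(result.bestValue);
```

Skipping on an empty bucket does not depend on whether the score is minimized or maximized, whereas a sentinel such as `Infinity` only works as a "never pick this" value when lower scores win.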
Where do you put that code?
closed by 219087f3a5273d4bf6e8ed89d15d97681826c7fb
Before closing this issue, please tell us where you put that code.
Hi there.
My use case is to take a movie's genres and predict the rating that would be given. Since genres are discrete values, I considered using Naive Bayes. However, since I need to predict the movie rating, I read that Random Forest can get me the desired result.
I have the following training set, which consists of arrays of inverse document frequencies:
var genreList = ["Biography","Drama","History","Documentary","Action","Comedy","Thriller","Crime","Music","Family","Fantasy","Musical","Animation","Adventure","Sport","Horror","Mystery","Sci-Fi"]
var trainingset = [ [0.1111111111111111,0.05555555555555555,0.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0.1111111111111111,0,0,0,0.041666666666666664,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0],[0.1111111111111111,0.05555555555555555,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0.1,1,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0.3333333333333333,0.25,1,0,0,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0.05555555555555555,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0.041666666666666664,0.041666666666666664,0,0.1,0,0,0,0,0,0,0,0,0,0],[0,0.05555555555555555,0,0,0,0,0,0,0,0.3333333333333333,0.25,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0.1111111111111111,0.05555555555555555,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0],[0,0,0,0,0.041666666666666664,0.041666666666666664,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0],[0,0.05555555555555555,0,0,0,0.041666666666666664,0,0.1,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0.25,0,0,0.034482758620689655,0,0,0,0],[0,0.05555555555555555,0,0,0,0.041666666666666664,0,0.1,0,0,0,0,0,0,0,0,0,0],[0,0,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0.1111111111111111,0.05555555555555555,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0.5,0.3333333333333333,0],[0,0.05555555555555555,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0.1111111111111111,0.05555555555555555,0.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0
,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0,0,0.25,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0.041666666666666664,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0],[0,0,0,0,0.041666666666666664,0.041666666666666664,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0],[0,0.05555555555555555,0,0,0,0,0,0.1,0,0,0,0,0,0,0,0,0.3333333333333333,0],[0,0,0,0,0.041666666666666664,0.041666666666666664,0,0.1,0,0,0,0,0,0,0,0,0,0],[0,0.05555555555555555,0,0,0,0,0,0.1,0,0,0,0,0,0,0,0,0.3333333333333333,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0.05555555555555555,0.2,0,0,0,0,0.1,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0.25,0,0,0.034482758620689655,0,0,0,0],[0.1111111111111111,0.05555555555555555,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0.07692307692307693],[0,0,0,0,0,0.041666666666666664,0,0,0,0.3333333333333333,0,0,0.07692307692307693,
0,0,0,0,0],[0,0,0,0.125,0,0,0,0.1,0,0,0,0,0,0,0,0,0,0],[0,0.05555555555555555,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0.5,0,0],[0,0,0.2,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0.1,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0.05555555555555555,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0],[0.1111111111111111,0.05555555555555555,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0.1111111111111111,0,0,0.125,0,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0,0.07692307692307693],[0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0,0.07692307692307693],[0,0,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0.05555555555555555,0.2,0,0,0,0,0,0,0,0,0,0,0.034482758620689655,0,0,0,0],[0,0,0,0,0.041666666666666664,0.041666666666666664,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0],[0,0,0,0,0,0.041666666666666664,0,0,0,0,0,0,0.07692307692307693,0.034482758620689655,0,0,0,0] ]
var predictions = [7,10,8,9,7,3,7,7,10,7,5,6,7,9,8,7,7,7,9,8,7,6,8,8,10,8,7,5,5,8,6,5,6,8,8,2,6,8,7,6,6,5,9,6,6,10,7,7,6,6,10,8,9,7,8,6,8,9,9,7,6,9,7,6,7,7]
However, I get the following console error:
Error: input must not be empty
    at mean (index.js:12)
    at squaredError (utils.js:82)
    at Object.regressionError [as regression] (utils.js:106)
    at TreeNode.bestSplit (TreeNode.js:57)
    at TreeNode.train (TreeNode.js:157)
    at DecisionTreeRegression.train (DecisionTreeRegression.js:43)
    at RandomForestRegression.train (RandomForestBase.js:95)
    at Object. (VJrxxZeJeWDr:131)
    at Object.invoke (angular.js:5040)
    at $controllerInit (angular.js:11000)