Closed. hazemsoliman115 closed this issue 5 years ago.
Another example, where the features are chosen to lie in different value ranges:

Data:
[1.0, 20.0, 200.5, 0.002,
 2.3, 50.0, 300.75, 0.009,
 1.3, 20.4, 100.9, 0.0045,
 10.3, 200.4, 1000.9, 10.45]

The resulting trees are:
tree[0] featureIndex: 1 featureValue: 9.371498894961741
tree[1] featureIndex: 0 featureValue: 199.07323144154924
tree[2] featureIndex: 0 featureValue: 192.39674371396512
tree[3] featureIndex: 1 featureValue: 9.371498894961741
tree[4] featureIndex: 0 featureValue: 199.07323144154924
Thanks for reporting the problem. It was caused by the feature sampling. I have just fixed it. You can check the latest code in the master branch.
I am testing with the following data:

[1.0, 2.0, 2.5, 0.2,
 2.3, 5.0, 0.75, 0.9,
 1.3, 2.4, 1.9, 0.45,
 10.3, 20.4, 10.9, 10.45]
and the following parameters:

IForest iForest = new IForest()
    .setNumTrees(5)
    .setMaxSamples(3)
    .setContamination(0.3)
    .setBootstrap(false)
    .setMaxDepth(2)
    .setSeed(123456L);
The trees are as follows:
tree[0] featureIndex: 1 featureValue: 9.417743965315601
tree[1] featureIndex: 0 featureValue: 4.977936221311794
tree[2] featureIndex: 0 featureValue: 4.866908154888555
tree[3] featureIndex: 1 featureValue: 9.391937564448492
tree[4] featureIndex: 0 featureValue: 20.26467549071234
The final tree (tree[4]) has a featureValue of about 20.26, which lies outside the range of values for featureIndex=0, i.e. between 1.0 and 10.3.
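The range check described above can be reproduced with a small standalone sketch. It assumes the 16 reported values form 4 rows of 4 features each (which matches the stated 1.0 to 10.3 range for feature 0). The class and method names are hypothetical and are not part of the library.

```java
import java.util.Locale;

class SplitRangeCheck {
    // Assumption: the reported values are 4 samples with 4 features each.
    static final double[][] DATA = {
        {1.0, 2.0, 2.5, 0.2},
        {2.3, 5.0, 0.75, 0.9},
        {1.3, 2.4, 1.9, 0.45},
        {10.3, 20.4, 10.9, 10.45}
    };

    // Returns {min, max} of one feature (column) over all rows.
    static double[] featureRange(double[][] data, int featureIndex) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : data) {
            min = Math.min(min, row[featureIndex]);
            max = Math.max(max, row[featureIndex]);
        }
        return new double[] {min, max};
    }

    // A valid split value must lie within the observed range of its feature.
    static boolean splitInRange(double[][] data, int featureIndex, double featureValue) {
        double[] range = featureRange(data, featureIndex);
        return featureValue >= range[0] && featureValue <= range[1];
    }

    public static void main(String[] args) {
        // Root splits copied from the reported trees.
        int[] featureIndex = {1, 0, 0, 1, 0};
        double[] featureValue = {
            9.417743965315601, 4.977936221311794, 4.866908154888555,
            9.391937564448492, 20.26467549071234
        };
        for (int t = 0; t < featureIndex.length; t++) {
            double[] range = featureRange(DATA, featureIndex[t]);
            System.out.printf(Locale.ROOT,
                "tree[%d] feature %d range [%.2f, %.2f] split %.4f inRange=%b%n",
                t, featureIndex[t], range[0], range[1], featureValue[t],
                splitInRange(DATA, featureIndex[t], featureValue[t]));
        }
    }
}
```

Under that layout assumption, only tree[4] fails the check. Because each tree is built from a subsample (setMaxSamples(3)), the full-data range is only an outer bound, but a split value outside it cannot be valid for that feature.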
I tracked the issue to line 553 in iForest.scala, where the feature indices are shuffled. This reordering seems to be lost afterwards, which later results in a wrong attrIndex: the attrIndex refers to a position in the shuffled data, not in the original data.
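A minimal sketch of the kind of index bookkeeping involved, written in plain Java rather than taken from iForest.scala. All names here are hypothetical, and this is not necessarily how the fix in master works. The idea is that when features are subsampled by shuffling, a split index chosen on the projected data is a local position and has to be translated back through the shuffled index array before it is stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class FeatureSubsample {
    // Picks numSelected distinct feature indices by shuffling 0..numFeatures-1.
    static int[] sampleFeatureIndices(int numFeatures, int numSelected, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numFeatures; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int[] selected = new int[numSelected];
        for (int i = 0; i < numSelected; i++) {
            selected[i] = indices.get(i);
        }
        return selected;
    }

    // Projects a row onto the selected features, in shuffled order.
    static double[] project(double[] row, int[] selected) {
        double[] projected = new double[selected.length];
        for (int i = 0; i < selected.length; i++) {
            projected[i] = row[selected[i]];
        }
        return projected;
    }

    // Translates a position in the projected row back to the original column.
    static int toOriginalIndex(int[] selected, int localIndex) {
        return selected[localIndex];
    }

    public static void main(String[] args) {
        double[] row = {10.3, 20.4, 10.9, 10.45};
        int[] selected = sampleFeatureIndices(row.length, 2, 123456L);
        double[] projected = project(row, selected);
        System.out.println("selected original indices: " + Arrays.toString(selected));
        for (int local = 0; local < projected.length; local++) {
            // Storing 'local' as the feature index would pair the split value
            // with the wrong column whenever selected[local] != local.
            System.out.println("local " + local + " -> original "
                + toOriginalIndex(selected, local) + ", value " + projected[local]);
        }
    }
}
```

Storing the translated index keeps each split value paired with the column it was drawn from, which is the property the reported trees appear to violate.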