... incorrectly merges list of int[], creates duplicated int values during the merge.
At the least, it impacts performance. Variance computing is completely wrong in the CARTRegressionTrainer.
To Reproduce
Create a unit test with this code:
public void bug() {
IntArrayContainer first = new IntArrayContainer(10);
IntArrayContainer second = new IntArrayContainer(10);
List<int[]> list = List.of(
new int[]{1, 2, 4},
new int[]{3, 5, 6, 7, 8, 9, 10}
);
IntArrayContainer.merge(list, first, second); //<-- buggy method
Assertions.assertEquals(1, first.array[0]);
Assertions.assertEquals(2, first.array[1]); //fails !!! array is filed with [1,1,2,2,3,4,4,5,6,7,8,9,10,0,0]
//expected is [1,2,3,4,5,6,7,8,9,10]
}
Expected behavior
//expected is [1,2,3,4,5,6,7,8,9,10]
System information:
Tribuo Version: [4.3.1]
OS: [Windows]
Java Version: [21]
JDK Vendor: [temurin-21.0.3]
Additional context
The mistake may lie in how the firstBuffer is initialized and updated in each subsequent merge:
public static int[] merge(List<int[]> input, IntArrayContainer firstBuffer, IntArrayContainer secondBuffer) {
if (input.size() > 0) {
firstBuffer.fill(input.get(0));
for (int i = 1; i < input.size(); i++) { // Start from the second array
merge(firstBuffer, input.get(i), secondBuffer);
IntArrayContainer tmp = secondBuffer;
secondBuffer = firstBuffer;
firstBuffer = tmp;
}
return firstBuffer.copy();
} else {
return new int[0];
}
}
Modification made:
Changed the loop start from i = 0 to i = 1. The first array is already used to initialize firstBuffer, so the merging should start from the second array (i = 1).
Describe the bug
This method:
... incorrectly merges list of int[], creates duplicated int values during the merge.
At the least, it impacts performance. Variance computing is completely wrong in the CARTRegressionTrainer.
To Reproduce
Create a unit test with this code:
Expected behavior
System information:
Additional context
The mistake may lie in how the firstBuffer is initialized and updated in each subsequent merge:
Modification made: Changed the loop start from i = 0 to i = 1. The first array is already used to initialize firstBuffer, so the merging should start from the second array (i = 1).