Closed CharlieAprog closed 2 weeks ago
It is a default value, not hard-coded value. You can use java option -Dsmile.arm.items=XYZ
to pass your number of items. Note that the input is a stream so that we can scan the data twice. Otherwise, we can scan the data in the first pass to find out the number of unique items. On the other hand, the user often has a good idea of their itemset. So we rely this property to run the algorithm in single pass, which is also faster.
When running the FP Growth algorithm and defining the FP Tree thre is a maximum limit of unique items currently set. This means that no frequent pattern mining can be completed if the number of unique items is high enough I found this when mining a very sparse dataset with the following error
Exception has occurred: java.lang.ArrayIndexOutOfBoundsException "java.lang.ArrayIndexOutOfBoundsException: Index 230037 out of bounds for length 65536" in line 302, FPTree.class
Expected behavior I would expect there to be NO upper limit to the number of unique items possible
Actual behavior Currently the upper limit of unique items is 65536
Location FPTree.class
on line 298 there is the line
int n = Integer.parseInt(System.getProperty("smile.arm.items", "65536"));
this should be the total number of unique items in the dataset instead of a fixed value of 65536
Additional context Currently Version: Maven, smile-core: 3.1.1