Closed ajing closed 7 years ago
Hi ajing,
Looks like you're trying to predict what's in your next shopping cart ;-) But it may not be the right time to make a joke...
I assume the cause of the exception is:
Caused by: java.lang.NegativeArraySizeException
Somehow StackNet, and more specifically the fsmatrix initialization, fails at line 85:
this.data=new double [rows*columns];
So StackNet ends up with a negative value for either rows or columns, even though it successfully reads your files...
Just for the record, is there anything wrong in paramsv1.txt, like negative values? One last question: what version of StackNet do you use? The stack trace does not seem in line with the latest master branch. Goldentom.
Exactly, Goldentom. These are the last few hours of the Instacart competition. I just quickly threw a model together last night and wanted to see what would happen, so I was not very careful about selecting models. I just used the Quora example (because it is also binary classification...)
LogisticRegression Type:Liblinear C:0.8 threads:1 usescale:True maxim_Iteration:100 seed:1 verbose:false
RandomForestClassifier estimators:100 threads:1 rounding:3 cut_off_subsample:0.15 max_depth:7 max_features:0.7 min_leaf:3.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 verbose:false
LogisticRegression Type:SGD C:0.00001 threads:1 learn_rate:0.1 usescale:True maxim_Iteration:20 seed:1 verbose:false
LSVC Type:Liblinear threads:1 usescale:True C:3.0 maxim_Iteration:100 seed:1 verbose:false copy:false
LSVC Type:SGD C:0.00001 threads:1 learn_rate:0.1 usescale:True maxim_Iteration:20 seed:1 verbose:false copy:false
RandomForestClassifier estimators:100 threads:1 rounding:3 cut_off_subsample:1.0 max_depth:5 max_features:0.7 min_leaf:3.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 verbose:false
softmaxnnclassifier usescale:True seed:1 Type:SGD maxim_Iteration:30 C:0.00001 shuffle:false learn_rate:0.001 smooth:0.1 h1:20 h2:30 connection_nonlinearity:Relu init_values:0.01 verbose:false copy:false
LibFmClassifier maxim_Iteration:100 C:0.000001 C2:0.02 lfeatures:3 seed:1 usescale:True init_values:0.001 learn_rate:0.04 smooth:0.0001 threads:1 verbose:false
GradientBoostingForestClassifier rounding:3 estimators:1000 shrinkage:0.1 threads:1 offset:0.00001 max_depth:8 max_features:0.4 min_leaf:4.0 min_split:8.0 Objective:RMSE row_subsample:0.7 seed:1 verbose:false
LibFmRegressor maxim_Iteration:100 C:0.000001 C2:0.02 lfeatures:3 seed:1 usescale:True init_values:0.001 learn_rate:0.04 smooth:0.0001 threads:1 verbose:false
GradientBoostingForestRegressor rounding:3 estimators:100 shrinkage:0.2 threads:1 cut_off_subsample:0.8 offset:0.00001 max_depth:9 max_features:0.4 min_leaf:4.0 min_split:8.0 Objective:RMSE row_subsample:0.7 seed:1 verbose:false
RandomForestClassifier estimators:300 threads:3 rounding:3.0 max_depth:12 max_features:0.4 min_leaf:3.0 min_split:5.0 Objective:ENTROPY row_subsample:0.9 seed:1 verbose:false
I updated the package and here is the new error message:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NegativeArraySizeException
at matrix.fsmatrix.<init>(fsmatrix.java:85)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2871)
at stacknetrun.runstacknet.main(runstacknet.java:471)
... 5 more
Thanks ajing, I was not expecting you to update the package.
Looking a bit more at the code (I found you had the version of 26/06/2017), line 2749 (now 2871) of StackNetClassifier is:
int temp_class=estimate_classes(level_grid, this.n_classes, level==(parameters.length-1));
column_counts[level] = temp_class;
if (this.verbose){
System.out.println(" Level: " + (level+1) + " dimensionality: " + temp_class);
System.out.println(" Starting cross validation ");
}
if (level<parameters.length -1){
trainstacker=new fsmatrix(target.length, temp_class); // <- This is line 2871
The last line is the call to fsmatrix that throws the exception. From the logs I can see that rows = 8474661 and temp_class = 893, so rows * temp_class = 7 567 872 273, and this is big...
If the double array allocation expects an int (-2 147 483 648 to +2 147 483 647):
this.data=new double [rows*columns]
then we're out of bounds! The int product wraps around to a negative number, hence the NegativeArraySizeException.
I'm not a Java expert, so we may need to wait for @kaz-Anova to check this out.
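To make the overflow concrete, here is a minimal Java sketch using the row and column counts quoted above. The class and method names (OverflowDemo, naiveSize, exactSize, allocate) are hypothetical and are not part of StackNet; the guard in allocate is only one possible way to fail with a clearer message, not what fsmatrix actually does.

```java
// Sketch only: OverflowDemo and its helpers are hypothetical, not StackNet code.
final class OverflowDemo {

    // Same pattern as the line quoted in the thread: the product is
    // evaluated in 32-bit int arithmetic and silently wraps around.
    static int naiveSize(int rows, int columns) {
        return rows * columns;
    }

    // Widening one operand to long first gives the true element count.
    static long exactSize(int rows, int columns) {
        return (long) rows * columns;
    }

    // Assumed guard: reject sizes that cannot be used as a Java array length.
    static double[] allocate(int rows, int columns) {
        long size = exactSize(rows, columns);
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "matrix of " + rows + " x " + columns + " needs " + size
                + " elements, more than a single Java array can hold");
        }
        return new double[(int) size];
    }

    public static void main(String[] args) {
        int rows = 8474661;   // rows reported in the logs
        int columns = 893;    // temp_class reported in the thread
        System.out.println("exact element count: " + exactSize(rows, columns));
        System.out.println("int product:         " + naiveSize(rows, columns));
    }
}
```

With these inputs the int product is negative, which is exactly the condition under which `new double[rows*columns]` throws NegativeArraySizeException.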
@goldentom42 is right about the negative exception happening due to the size. However, my main problem is with temp_class = 893, where StackNet thinks paramsv1.txt contains 893 models in the first layer! @ajing Could you please send a few lines of the train file (nz_train_slim.csv) that replicate the problem, and the paramsv1.txt, please?
Please send to kazanovassoftware@gmail.com
@kaz-Anova, sure, I was surprised by the 893 as well but was focusing on the exception ;-) In the params there are 2 regressors and 9 classifiers, which means the program found 99 classes in the input file (9 * 99 + 2 = 893). @ajing, anything suspicious in the first column of the input file?
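The arithmetic can be written out as a small Java sketch. This is an assumption inferred from the two numbers in this thread (893 with 99 classes, and the "dimensionality: 11" reported in the later log once the target was binary), not a copy of StackNet's estimate_classes; the names LevelWidth and estimate are hypothetical.

```java
// Sketch only: a guess at how the first-level output width is counted,
// inferred from the numbers in this thread, not taken from StackNet's source.
final class LevelWidth {

    // Assumption: each regressor adds one column; each classifier adds one
    // column for a binary target and one column per class otherwise.
    static int estimate(int classifiers, int regressors, int nClasses) {
        int columnsPerClassifier = (nClasses <= 2) ? 1 : nClasses;
        return classifiers * columnsPerClassifier + regressors;
    }

    public static void main(String[] args) {
        // 9 classifiers and 2 regressors in the first layer of paramsv1.txt
        System.out.println("99 classes: " + estimate(9, 2, 99));
        System.out.println("binary:     " + estimate(9, 2, 2));
    }
}
```

Under that assumption, a target column with many distinct values inflates the stacked matrix by roughly the number of classes, which is why checking the target column comes first.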
@kaz-Anova @goldentom42 You guys are right. I was using the wrong column. Working on fixing it... Is there an easy way to estimate the training time?
@ajing ...realistically speaking... it won't finish today :( I am afraid (e.g. you won't have enough time before Instacart finishes...)
@kaz-Anova That's my guess also... Last time, I ran a smaller one on another data set, and it took about three days. But I still want to practice more on StackNet. Your current submission achieves a pretty high score. Is that solely based on StackNet?
@ajing . You can see my approach here: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38100
Stacking was not that important in this comp - but I would not have finished top 10 (not even top 20) without it.
@kaz-Anova Congratulations! I am really amazed you have tried so many ideas in such a short period of time. You must have something to make your work time efficient.
After fixing the number-of-classes problem, I now have an out-of-memory error. But I guess it can be solved by adding more memory...
Loaded File: /home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv
Total rows in the file: 8474661
Total columns in the file: 78
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 77
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 8474661
Loaded dense train data with 8474661 and columns 77
loaded data in : 127.731000
Level: 1 dimensionality: 11
Starting cross validation
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.OutOfMemoryError: Java heap space
at matrix.fsmatrix.makerowsubset(fsmatrix.java:103)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2900)
at stacknetrun.runstacknet.main(runstacknet.java:471)
... 5 more
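For a rough idea of how much memory is involved, here is a back-of-the-envelope Java sketch based on the row and column counts in the log above. It assumes the dense data is held as one 8-byte double per cell and ignores everything else StackNet allocates; the class and method names (HeapEstimate, denseBytes) are hypothetical. Since the failing call is makerowsubset, at least part of the data is presumably being copied on top of the original matrix, so the real requirement is likely higher than this single-copy figure.

```java
// Sketch only: estimates the size of ONE dense copy of the training data.
final class HeapEstimate {

    // rows * columns cells, 8 bytes per double, computed in long arithmetic.
    static long denseBytes(long rows, long columns) {
        return rows * columns * Double.BYTES;
    }

    public static void main(String[] args) {
        long needed = denseBytes(8474661L, 77L);       // sizes from the log
        long maxHeap = Runtime.getRuntime().maxMemory(); // current JVM limit
        System.out.printf("one dense copy: %.2f GiB%n", needed / (1024.0 * 1024 * 1024));
        System.out.printf("JVM max heap:   %.2f GiB%n", maxHeap / (1024.0 * 1024 * 1024));
        if (needed > maxHeap) {
            System.out.println("The data alone does not fit; the heap limit can be raised with the JVM's -Xmx option.");
        }
    }
}
```

The -Xmx option is a standard JVM flag and goes before -jar on the java command line; how much is actually enough for this run is something only a trial will show.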
Hi,
I encountered an error while running StackNet. Here is the command:
Here is the error message. What does the InvocationTargetException error imply here?
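On the InvocationTargetException itself: it is the wrapper that Java reflection throws when a method called through Method.invoke fails. In the traces in this thread, the jar loader (JarRsrcLoader) starts the real main method reflectively, so the interesting part is always the "Caused by:" line. A minimal Java sketch of that mechanism follows; UnwrapDemo and its methods are hypothetical stand-ins, not StackNet or Eclipse code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Sketch only: reproduces the wrapping seen in the stack traces.
final class UnwrapDemo {

    // Stand-in for the real entry point; fails the way fsmatrix did.
    public static void failingMain() {
        int size = -1;
        double[] data = new double[size]; // throws NegativeArraySizeException
    }

    // Calls failingMain through reflection and returns what was thrown.
    static Throwable invokeAndCatch() {
        try {
            Method m = UnwrapDemo.class.getDeclaredMethod("failingMain");
            m.invoke(null); // static method, so no receiver object
            return null;
        } catch (InvocationTargetException e) {
            return e; // the wrapper; the real error is e.getCause()
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Walks the cause chain down to the original exception.
    static Throwable rootCause(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) {
        Throwable wrapper = invokeAndCatch();
        System.out.println("thrown:     " + wrapper.getClass().getName());
        System.out.println("root cause: " + rootCause(wrapper).getClass().getName());
    }
}
```

So the InvocationTargetException says nothing about what went wrong; the exception named after "Caused by:" does.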