kaz-Anova / StackNet

StackNet is a computational, scalable and analytical Meta modelling framework
MIT License

InvocationTargetException error #22

Closed: ajing closed this issue 7 years ago

ajing commented 7 years ago

Hi,

I encountered an error when running StackNet. Here is the command:

java -Xmx12144m -jar StackNet.jar train \
    train_file='/home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv' \
    test_file='/home/jlu/Experiments/Examples/Instacart/imba/data/all_data_test_V1.csv' \
    has_head=true params='/home/jlu/Experiments/Examples/Instacart/imba/paramsv1.txt' sparse=false \
    pred_file='/home/jlu/Experiments/Examples/Instacart/imba/data/stacknet_pred_V1.csv' \
    test_target=false verbose=true Threads=10 folds=5 seed=1 metric=auc \
    output_name=restack_instacart folds=10 seed=1 task=classification

Here is the error message. What does the InvocationTargetException error imply here?

parameter name : train_file value :  /home/jlu/experiments/examples/instacart/imba/data/nz_train_slim.csv
parameter name : test_file value :  /home/jlu/experiments/examples/instacart/imba/data/all_data_test_v1.csv
parameter name : has_head value :  true
parameter name : params value :  /home/jlu/experiments/examples/instacart/imba/paramsv1.txt
parameter name : sparse value :  false
parameter name : pred_file value :  /home/jlu/experiments/examples/instacart/imba/data/stacknet_pred_v1.csv
parameter name : test_target value :  false
parameter name : verbose value :  true
parameter name : threads value :  10
parameter name : folds value :  5
parameter name : seed value :  1
parameter name : metric value :  auc
parameter name : output_name value :  restack_instacart
parameter name : folds value :  10
parameter name : seed value :  1
parameter name : task value :  classification
 Completed: 5.00 %
 Completed: 10.00 %
 Completed: 15.00 %
 Completed: 20.00 %
 Completed: 25.00 %
 Completed: 30.00 %
 Completed: 35.00 %
 Completed: 40.00 %
 Completed: 45.00 %
 Completed: 50.00 %
 Completed: 55.00 %
 Completed: 60.00 %
 Completed: 65.00 %
 Completed: 70.00 %
 Completed: 75.00 %
 Completed: 80.00 %
 Completed: 85.00 %
 Completed: 90.00 %
 Completed: 95.00 %
 Completed: 100.00 %
 Loaded File: /home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv
 Total rows in the file: 8474661
 Total columns in the file: 78
 Weighted variable : -1 counts: 0
 Int Id variable : -1 str id: -1 counts: 0
 Target Variables  : 1 values : [0]
 Actual columns number  : 77
 Number of Skipped rows   : 0
 Actual Rows (removing the skipped ones)  : 8474661
Loaded dense train data with 8474661 and columns 77
 loaded data in : 125.971000
 Level: 1 dimensionality: 893
 Starting cross validation
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NegativeArraySizeException
        at matrix.fsmatrix.<init>(fsmatrix.java:85)
        at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2749)
        at stacknetrun.runstacknet.main(runstacknet.java:471)
        ... 5 more
goldentom42 commented 7 years ago

Hi ajing,

Looks like you're trying to predict what's in your next shopping cart ;-) But it may not be the right time to make a joke...

I assume the cause of the exception is:

Caused by: java.lang.NegativeArraySizeException
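On the original question of what InvocationTargetException means: the frames above the "Caused by:" line show Eclipse's jar-in-jar launcher (org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader) starting the real main method through Method.invoke, and Method.invoke wraps any exception thrown by the invoked method in an InvocationTargetException. The wrapper itself carries no diagnosis; the "Caused by:" exception is the actual error. Below is a minimal sketch of that wrapping. The class and method names are hypothetical stand-ins, not StackNet code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class ReflectiveLauncherDemo {
    // Hypothetical "real" entry point standing in for runstacknet.main.
    public static void realMain(String[] args) {
        int size = Integer.parseInt(args[0]);
        double[] data = new double[size]; // NegativeArraySizeException if size < 0
        System.out.println("allocated " + data.length + " doubles");
    }

    // Calls realMain via reflection, as a jar-in-jar launcher does, and returns the
    // class name of the exception that really caused the failure (null if none).
    static String launchAndReportCause(String... args) {
        try {
            Method entry = ReflectiveLauncherDemo.class.getMethod("realMain", String[].class);
            entry.invoke(null, (Object) args); // cast: pass the array as ONE argument
            return null;
        } catch (InvocationTargetException wrapper) {
            // The wrapper only says "the invoked method threw"; getCause() is the real error.
            return wrapper.getCause().getClass().getName();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not launch realMain", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(launchAndReportCause("-1")); // java.lang.NegativeArraySizeException
        System.out.println(launchAndReportCause("10")); // "allocated 10 doubles", then "null"
    }
}
```

In practice this means the first "Caused by:" frame (here matrix.fsmatrix.<init>) is where to start debugging.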

Somehow StackNet, and more specifically the fsmatrix initialization, fails at line 85:

this.data=new double [rows*columns];

So StackNet ends up with a negative value for either rows or columns, even though it successfully reads your files...

Just for the record, is there anything wrong in paramsv1.txt, such as negative values? One last question: which version of StackNet are you using? The stack trace does not seem to match the latest master branch. Goldentom.

ajing commented 7 years ago

Exactly, Goldentom. These are the last few hours of the Instacart competition. I quickly threw a model together last night and wanted to see what would happen, so I was not very careful about selecting models. I just used the Quora example (because it is also binary classification...)

LogisticRegression Type:Liblinear C:0.8 threads:1 usescale:True maxim_Iteration:100 seed:1 verbose:false
RandomForestClassifier estimators:100 threads:1 rounding:3 cut_off_subsample:0.15 max_depth:7 max_features:0.7 min_leaf:3.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 verbose:false
LogisticRegression Type:SGD C:0.00001 threads:1 learn_rate:0.1 usescale:True maxim_Iteration:20 seed:1 verbose:false
LSVC Type:Liblinear threads:1 usescale:True C:3.0 maxim_Iteration:100 seed:1 verbose:false copy:false
LSVC Type:SGD C:0.00001 threads:1 learn_rate:0.1 usescale:True maxim_Iteration:20 seed:1 verbose:false copy:false
RandomForestClassifier estimators:100 threads:1 rounding:3 cut_off_subsample:1.0 max_depth:5 max_features:0.7 min_leaf:3.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 verbose:false
softmaxnnclassifier usescale:True seed:1 Type:SGD maxim_Iteration:30 C:0.00001 shuffle:false learn_rate:0.001 smooth:0.1 h1:20 h2:30 connection_nonlinearity:Relu init_values:0.01 verbose:false copy:false
LibFmClassifier maxim_Iteration:100 C:0.000001 C2:0.02 lfeatures:3 seed:1 usescale:True init_values:0.001 learn_rate:0.04 smooth:0.0001 threads:1 verbose:false
GradientBoostingForestClassifier rounding:3 estimators:1000 shrinkage:0.1 threads:1 offset:0.00001 max_depth:8 max_features:0.4 min_leaf:4.0 min_split:8.0 Objective:RMSE row_subsample:0.7 seed:1 verbose:false
LibFmRegressor maxim_Iteration:100 C:0.000001 C2:0.02 lfeatures:3 seed:1 usescale:True init_values:0.001 learn_rate:0.04 smooth:0.0001 threads:1 verbose:false
GradientBoostingForestRegressor rounding:3 estimators:100 shrinkage:0.2 threads:1 cut_off_subsample:0.8 offset:0.00001 max_depth:9 max_features:0.4 min_leaf:4.0 min_split:8.0 Objective:RMSE row_subsample:0.7 seed:1 verbose:false

RandomForestClassifier estimators:300 threads:3 rounding:3.0 max_depth:12 max_features:0.4 min_leaf:3.0 min_split:5.0 Objective:ENTROPY row_subsample:0.9 seed:1 verbose:false

I updated the package and here is the new error message:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NegativeArraySizeException
        at matrix.fsmatrix.<init>(fsmatrix.java:85)
        at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2871)
        at stacknetrun.runstacknet.main(runstacknet.java:471)
        ... 5 more
goldentom42 commented 7 years ago

Thanks ajing, I was not expecting you to update the package.

Looking a bit more at the code (I found you had the version from 26/06/2017), line 2749 (now 2871) of StackNetClassifier is:

int temp_class=estimate_classes(level_grid,  this.n_classes, level==(parameters.length-1));
column_counts[level] = temp_class;
if (this.verbose){
    System.out.println(" Level: " +  (level+1) + " dimensionality: " + temp_class);
    System.out.println(" Starting cross validation ");
}
if (level<parameters.length -1){
    trainstacker=new fsmatrix(target.length, temp_class); // <- this is line 2871

The last line is the call to fsmatrix that throws the exception. From the logs I can see that rows = 8474661 and temp_class = 893, so rows * temp_class = 7,567,872,273, which is big...
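As a worked check of where a negative size could come from, assuming both factors are 32-bit ints (Java int multiplication keeps only the low 32 bits and reads the result as signed):

```latex
\begin{aligned}
8\,474\,661 \times 893 &= 7\,567\,872\,273 \;>\; 2^{31}-1 = 2\,147\,483\,647 \\
7\,567\,872\,273 \bmod 2^{32} &= 7\,567\,872\,273 - 4\,294\,967\,296 = 3\,272\,904\,977 \\
3\,272\,904\,977 \ge 2^{31} \;&\Rightarrow\; \text{signed int} = 3\,272\,904\,977 - 2^{32} = -1\,022\,062\,319
\end{aligned}
```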

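If the double array allocation expects an int (from -2,147,483,648 to +2,147,483,647):

this.data=new double [rows*columns];

then we're out of bounds! If rows and columns are both ints, the product overflows and silently wraps around to a negative number, which would explain the NegativeArraySizeException.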
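A minimal sketch of how such an allocation could be guarded, assuming int dimensions as in the snippet above: compute the size in long and fail with an explicit message instead of letting the int product wrap. The class and method names are hypothetical, not StackNet's fsmatrix API.

```java
class SizeCheckDemo {
    // Plain int multiplication, as in `new double[rows*columns]`: overflows silently.
    static int uncheckedSize(int rows, int columns) {
        return rows * columns;
    }

    // Hypothetical guarded version: multiply in long, then reject anything a single
    // Java array cannot index (array lengths are ints; real JVMs may cap a bit lower).
    static int checkedSize(int rows, int columns) {
        long size = (long) rows * columns;
        if (rows < 0 || columns < 0 || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("matrix " + rows + " x " + columns
                    + " needs " + size + " elements, more than one Java array can hold");
        }
        return (int) size;
    }

    public static void main(String[] args) {
        int rows = 8474661; // training rows from the log
        int columns = 893;  // level-1 dimensionality from the log
        System.out.println(uncheckedSize(rows, columns)); // wrapped, negative value
        try {
            double[] data = new double[checkedSize(rows, columns)];
            System.out.println("allocated " + data.length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // explicit error instead of a negative size
        }
    }
}
```

Note that even a correct size check cannot make 7.5 billion values fit in one double[], since Java array indices are ints; a matrix that large would have to be split across several arrays (or, as it turned out here, the 893 columns were themselves a symptom of a different problem).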

I'm not a Java expert, so we may need to wait for @kaz-Anova to check this out.

kaz-Anova commented 7 years ago

@goldentom42 is right that the negative-size exception is caused by the size. However, my main concern is temp_class = 893, which means StackNet thinks paramsv1.txt contains 893 models in the first layer! @ajing, could you please send a few lines of the train file (nz_train_slim.csv) that replicate the problem, along with paramsv1.txt?

kaz-Anova commented 7 years ago

Please send to kazanovassoftware@gmail.com

goldentom42 commented 7 years ago

@kaz-Anova, sure, I was surprised by the 893 as well but was focusing on the exception ;-) In the first level of the params file there are 2 regressors and 9 classifiers, which means the program found 99 classes in the input file (9 * 99 + 2 = 893). @ajing, anything suspicious in the first column of the input file?
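The count above can be reproduced with a small sketch. It assumes a rule inferred from the two logs in this thread rather than from StackNet's source: each first-level classifier contributes one column for a binary target and one column per class otherwise, and each regressor contributes one column. It also counts distinct labels in the first (target) column, as a quick pre-flight check before a long run. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LevelWidthDemo {
    // Hypothetical helper: distinct values in column 0 of comma-separated rows
    // (header already removed). Quoted fields containing commas are not handled.
    static int distinctTargets(List<String> rows) {
        Set<String> labels = new HashSet<>();
        for (String row : rows) {
            labels.add(row.split(",", 2)[0].trim());
        }
        return labels.size();
    }

    // Assumed rule, consistent with this thread's logs (893 before the fix, 11 after):
    // binary classifiers emit 1 column, multiclass ones emit nClasses, regressors emit 1.
    static int levelWidth(int classifiers, int regressors, int nClasses) {
        int perClassifier = (nClasses == 2) ? 1 : nClasses;
        return classifiers * perClassifier + regressors;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("0,1.5,3", "1,0.2,7", "0,4.1,1");
        int nClasses = distinctTargets(sample);
        System.out.println(nClasses);                   // 2
        System.out.println(levelWidth(9, 2, nClasses)); // 11
        System.out.println(levelWidth(9, 2, 99));       // 893
    }
}
```

If the distinct count for a binary task comes out much larger than 2, the target column is probably the wrong one.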

ajing commented 7 years ago

@kaz-Anova @goldentom42 You are both right. I was using the wrong column. Working on fixing it... Is there an easy way to estimate the training time?

kaz-Anova commented 7 years ago

@ajing, realistically speaking, I am afraid it won't finish today :( (i.e. you won't have enough time before Instacart finishes...)

ajing commented 7 years ago

@kaz-Anova That's my guess too. Last time I ran a smaller one on another data set, and it took about three days. But I still want to practice more with StackNet. Your current submission achieves a pretty high score. Is that based solely on StackNet?

kaz-Anova commented 7 years ago

@ajing . You can see my approach here: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38100

Stacking was not that important in this competition, but I would not have finished in the top 10 (not even the top 20) without it.

ajing commented 7 years ago

@kaz-Anova Congratulations! I am really amazed you have tried so many ideas in such a short period of time. You must have some way of making your work time-efficient.

After fixing the number-of-classes problem, I now get an out-of-memory error. I guess it can be solved by adding more memory...
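A rough back-of-the-envelope sketch of how close this run is to the heap limit. It assumes dense storage as 8-byte doubles and that building a training fold copies the selected rows into a new matrix (the allocation in makerowsubset suggests a copy is made); it ignores the test file, per-model working copies and JVM overhead, so real usage will be higher. The fold count is also an assumption, since the command passes both folds=5 and folds=10. Names are hypothetical.

```java
import java.util.Locale;

class HeapEstimateDemo {
    static final long BYTES_PER_DOUBLE = 8L;

    // Bytes for a dense rows x columns matrix of doubles (array headers ignored).
    static long denseBytes(long rows, long columns) {
        return rows * columns * BYTES_PER_DOUBLE;
    }

    // Rows in one training fold when one of `folds` near-equal parts is held out.
    static long trainFoldRows(long rows, int folds) {
        return rows - rows / folds;
    }

    static String gib(long bytes) {
        return String.format(Locale.ROOT, "%.2f GiB", bytes / (1024.0 * 1024 * 1024));
    }

    public static void main(String[] args) {
        long rows = 8474661L;   // "Total rows in the file" from the log
        long features = 77L;    // "Actual columns number" from the log
        long level1Width = 11L; // "Level: 1 dimensionality" after the fix
        int folds = 10;         // assumption: the later folds=10 argument wins

        long train = denseBytes(rows, features);
        long foldCopy = denseBytes(trainFoldRows(rows, folds), features);
        long level1 = denseBytes(rows, level1Width);

        System.out.println("train matrix:        " + gib(train));
        System.out.println("one fold row subset: " + gib(foldCopy));
        System.out.println("level-1 outputs:     " + gib(level1));
        System.out.println("lower-bound total:   " + gib(train + foldCopy + level1));
        System.out.println("-Xmx12144m heap:     " + gib(12144L * 1024 * 1024));
    }
}
```

Under these assumptions the lower bound already lands within about 2 GiB of the 12144 MiB heap before the test data and the models' own copies are counted, which is consistent with a larger -Xmx being the likely fix.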

 Loaded File: /home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv
 Total rows in the file: 8474661
 Total columns in the file: 78
 Weighted variable : -1 counts: 0
 Int Id variable : -1 str id: -1 counts: 0
 Target Variables  : 1 values : [0]
 Actual columns number  : 77
 Number of Skipped rows   : 0
 Actual Rows (removing the skipped ones)  : 8474661
Loaded dense train data with 8474661 and columns 77
 loaded data in : 127.731000
 Level: 1 dimensionality: 11
 Starting cross validation
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at matrix.fsmatrix.makerowsubset(fsmatrix.java:103)
        at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2900)
        at stacknetrun.runstacknet.main(runstacknet.java:471)
        ... 5 more