kaz-Anova / StackNet

StackNet is a computational, scalable and analytical Meta modelling framework
MIT License

Unknown error #5

Closed. snassimr closed this issue 7 years ago.

snassimr commented 7 years ago

[screenshot]

Any clue where the error comes from and how to fix it?

kaz-Anova commented 7 years ago

Yes. I believe this is fixed in the newer version. In any case, the file (pred.csv) should have been printed fine. Can you confirm that this is the case? The error is in the metric calculation on the test data, so it does not affect the predictions.

snassimr commented 7 years ago

If pred.csv is meant to be created in the same directory, it hasn't been: I can't see it. Regarding the newer version, I downloaded the .jar file an hour ago.

kaz-Anova commented 7 years ago

I will have to look at the problem specifically, then. Can you send me the previous output? Did the models run fine? Is it possible to send me a subset of the file that generated the problem? What was the command you ran?

snassimr commented 7 years ago

Params file: params.txt

Command:

java -Xmx1048m -jar stacknet.jar train sparse=false has_head=true model=model pred_file=pred.csv train_file=sample_train.csv test_file=sample_test.csv test_target=false params=params.txt verbose=true threads=3 metric=logloss stackdata=false seed=1 folds=2

Sample input files:

StackNet.zip

kpei commented 7 years ago

I am having this issue as well, except that for me e.getMessage() returns 2 in the logs. I am running on the Quora question dataset and I believe the issue has to do with certain models. For example, I took your vanilla paramsv1.txt, added a few new models such as VanillahnnClassifier and DecisionTreeRegressor, and got the error.

silverstone1903 commented 7 years ago

Hi @kaz-Anova! I have a similar issue. Let me explain with a screenshot and the code.

[screenshot]

Here is params.txt:

RandomForestClassifier bootsrap:false max_tree_size:-1 cut_off_subsample:1.0 feature_subselection:1.0 rounding:6 estimators:100 offset:0.00001 max_depth:6 max_features:0.4 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 threads:1 bags:1 verbose:false

XgboostClassifier booster:gbtree num_round:1000 eta:0.005 max_leaves:0 gamma:1. max_depth:5 min_child_weight:1.0 subsample:0.9 colsample_bytree:0.7 colsample_bylevel:1.0 lambda:1.0 alpha:1.0 seed:1 threads:1 bags:1 verbose:false

NaiveBayesClassifier usescale:True Shrinkage:0.1 seed:1 threads:1 verbose:false

And here is the command:

java -jar StackNet.jar train task=classification sparse=false has_head=true model=model train_file=train_x.csv test_file=test_x.csv test_target=false params=params.txt verbose=true threads=8 metric=accuracy stackdata=false

By the way, the data is multiclass (3 classes) and imbalanced. If I add pred_file=preds.csv, it also creates predictions, but the results are meaningless: they should be 0, 1 or 2, but some predictions are 0 or 1 and others are 130, 220 and so on. What's wrong? Any idea?

kaz-Anova commented 7 years ago

Could you send me the predictions file and a subset of the training data that replicates the problem?

silverstone1903 commented 7 years ago

@kaz-Anova Unfortunately I can't share the data; it's private. I also got something wrong: preds.csv was a prediction file for another dataset (a regression task), so those values are actually correct. My fault, sorry! So in fact no prediction file is created after the error.

molecularswords commented 7 years ago

Not sure if this helps, but I get the same error with the most recent version when using any classifier with any dataset in dense format (I've tested a few of my own datasets as well as the iris dataset), with any combination of model parameters. I've tested the data with and without headers, and with and without a test target value in the first column. The error does not occur with regressors, but currently I can't get any classifier to work.

goldentom42 commented 7 years ago

Hi @molecularswords, I can't seem to reproduce your issue. I just cloned StackNet and used the iris dataset like so (in Python 3.5):

from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

import numpy as np

# Load iris dataset
dataset = datasets.load_iris()
target = dataset.target
data = dataset.data

# Shuffle dataset
z = np.arange(len(data))
np.random.shuffle(z)
data = data[z]
target = target[z]

n_splits = 3
folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=14846789)

trn_dataset = None
tst_dataset = None
for trn_idx, tst_idx in folds.split(data, target):
    trn_iris, trn_target = data[trn_idx], target[trn_idx]
    tst_iris, tst_target = data[tst_idx], target[tst_idx]

    trn_dataset = np.hstack((trn_target.reshape(-1, 1), trn_iris))
    tst_dataset = np.hstack((tst_target.reshape(-1, 1), tst_iris))

    break

# Save dataset in StackNet format
np.savetxt(fname="train_iris.csv", X=trn_dataset, delimiter=',')
np.savetxt(fname="test_iris.csv", X=tst_dataset, delimiter=',')
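Before launching StackNet, it can help to confirm the files have the layout that `has_head=false` and `test_target=true` assume for this dataset: no header row, the target in the first column, numeric features after it. A minimal sketch, assuming plain comma-separated numeric files like the ones `np.savetxt` writes here; `check_stacknet_dense` is a hypothetical helper, not part of StackNet.

```python
import numpy as np


def check_stacknet_dense(path, n_features, n_classes):
    """Hypothetical sanity check for a dense, header-less StackNet CSV.

    Assumes the target sits in the first column (as written above with
    np.hstack) and that class labels are the integers 0..n_classes-1.
    Returns the loaded array so callers can inspect it further.
    """
    arr = np.loadtxt(path, delimiter=",", ndmin=2)
    if arr.shape[1] != n_features + 1:
        raise ValueError(f"{path}: expected {n_features + 1} columns, got {arr.shape[1]}")
    target = arr[:, 0]
    # np.savetxt writes the target as a float (e.g. 2.000000000000000000e+00);
    # it should still be a whole number.
    if not np.all(target == np.round(target)):
        raise ValueError(f"{path}: non-integer values in the target column")
    labels = set(target.astype(int).tolist())
    if not labels <= set(range(n_classes)):
        raise ValueError(f"{path}: unexpected labels {sorted(labels)}")
    return arr
```

For the iris files this would be called as `check_stacknet_dense("train_iris.csv", n_features=4, n_classes=3)` and likewise for `test_iris.csv`.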

params.txt looks like this:

RandomForestClassifier bootstrap:false max_tree_size:-1 cut_off_subsample:1.0 feature_subselection:1.0 rounding:6 estimators:100 offset:0.00001 max_depth:6 max_features:0.4 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 threads:1 bags:1 verbose:false

XgboostClassifier booster:gbtree num_round:1000 eta:0.005 max_leaves:0 gamma:1. max_depth:5 min_child_weight:1.0 subsample:0.9 colsample_bytree:0.7 colsample_bylevel:1.0 lambda:1.0 alpha:1.0 seed:1 threads:1 bags:1 verbose:false

NaiveBayesClassifier usescale:True Shrinkage:0.1 seed:1 threads:1 verbose:false

This means 3 levels with 1 classifier at each level.
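StackNet's own usage message (printed when the jar is run without arguments) describes the params file format: each line is a model and empty lines start a new level. A minimal Python sketch of that rule, handy for checking how many levels and models a params.txt will produce before a long run. `parse_levels` is a hypothetical helper that only mirrors the documented blank-line rule; it is not StackNet's parser.

```python
def parse_levels(text):
    """Split StackNet-style params text into levels.

    Each non-empty line is one model; empty lines separate levels.
    Consecutive blank lines are treated as one separator (an assumption).
    Returns a list of levels, each a list of (model_name, {param: value}).
    """
    levels, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if current:
                levels.append(current)
                current = []
            continue
        name, *tokens = line.split()
        # Parameters are written as key:value pairs.
        params = dict(tok.partition(":")[::2] for tok in tokens)
        current.append((name, params))
    if current:
        levels.append(current)
    return levels


example = """RandomForestClassifier estimators:100 max_depth:6 seed:1

XgboostClassifier booster:gbtree num_round:1000 eta:0.005 seed:1

NaiveBayesClassifier usescale:True Shrinkage:0.1 seed:1
"""

for i, level in enumerate(parse_levels(example), start=1):
    print(f"Level {i}: {[name for name, _ in level]}")
# Level 1: ['RandomForestClassifier']
# Level 2: ['XgboostClassifier']
# Level 3: ['NaiveBayesClassifier']
```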

And the StackNet call is:

java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false

I'm on Linux Ubuntu Xenial 16.04 and Java 1.8.0_102.

Can you give this a try and see if it works for you? Thanks, goldentom
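The StackNet call above can also be launched straight from the Python script that prepares the data. A minimal sketch using `subprocess`, assuming `java` is on the PATH and `StackNet.jar` (with its lib/ folder) sits in the working directory. `build_stacknet_cmd` and `run_stacknet` are hypothetical helpers; they pass exactly the options from the command above.

```python
import subprocess


def build_stacknet_cmd(jar="StackNet.jar", mode="train", **options):
    """Build a StackNet command line as a list of arguments.

    Each keyword becomes a key=value argument, in the style of the command
    above; Python booleans are lowercased (False -> "false").
    Keyword order is preserved on Python 3.7+.
    """
    cmd = ["java", "-jar", jar, mode]
    for key, value in options.items():
        if isinstance(value, bool):
            value = str(value).lower()
        cmd.append(f"{key}={value}")
    return cmd


def run_stacknet(cmd):
    """Run StackNet, raising CalledProcessError on a non-zero exit code.

    Output is not captured, so StackNet's progress stays visible.
    """
    subprocess.run(cmd, check=True)


cmd = build_stacknet_cmd(
    task="classification", sparse=False, has_head=False, model="model",
    train_file="train_iris.csv", test_file="test_iris.csv", test_target=True,
    params="params.txt", verbose=True, threads=4, metric="logloss",
    stackdata=False,
)
print(" ".join(cmd))
# run_stacknet(cmd)  # uncomment once StackNet.jar, lib/ and the CSVs are in place
```

Passing a list rather than a single shell string avoids quoting problems with Windows paths.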

molecularswords commented 7 years ago

@goldentom42 Your code works, and the reason it wasn't working for me before was entirely my fault. While troubleshooting, I found that I had unwittingly been editing an old params file in a different directory. Its path is nearly identical to the correct one, and I had overlooked it in my troubleshooting before my original post. I apologize for my oversight and thank you for your efforts in helping to resolve my problem.

goldentom42 commented 7 years ago

@molecularswords, no need to apologize ;-) we're all in the same boat! Happy you found the issue and that things work for you now. Cheers, goldentom42

ahbon123 commented 7 years ago

@goldentom42 Thank you for sharing the iris example. I tried it with Anaconda under Windows, but it doesn't work. Can you help me, please? Thanks.

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: File params.txt failed to import at bufferreader params.txt (The system cannot find the file specified.)
    at io.input.StackNet_Configuration(input.java:1650)
    at stacknetrun.runstacknet.main(runstacknet.java:441)

goldentom42 commented 7 years ago

Hi ahbon123, from the stack trace I would assume StackNet is unable to locate the params.txt file. Is the file in the same directory you launch StackNet from?
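A quick way to rule out this kind of problem (and the stale params file mix-up earlier in this thread) is to print where each relative file name actually resolves from the launch directory. A minimal sketch, assuming StackNet opens `params.txt` and the data files relative to the directory `java` is started from, which is what the stack trace suggests; `preflight` is a hypothetical helper.

```python
from pathlib import Path


def preflight(*names, base=None):
    """Report how relative file names resolve and whether they exist.

    `base` defaults to the current working directory, i.e. the directory
    java would be launched from. Returns the list of missing paths.
    """
    base = Path.cwd() if base is None else Path(base)
    missing = []
    for name in names:
        path = (base / name).resolve()
        found = path.is_file()
        # Printing the absolute path makes near-identical directories easy to spot.
        print(f"{'ok' if found else 'MISSING':8} {path}")
        if not found:
            missing.append(path)
    return missing


if preflight("params.txt", "train_iris.csv", "test_iris.csv"):
    print("Fix the paths above before launching StackNet.")
```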

ahbon123 commented 7 years ago

Brilliant! It works when I remove xgboost from params.txt. Thanks to both of you, @kaz-Anova @goldentom42; here are the results. By the way, why does this happen when xgboost is added? My Java version:

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

Without xgboost:

C:\Users\zhaod\StackNet\example\iris>java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false
parameter name : task value : classification
parameter name : sparse value : false
parameter name : has_head value : false
parameter name : model value : model
parameter name : train_file value : train_iris.csv
parameter name : test_file value : test_iris.csv
parameter name : test_target value : true
parameter name : params value : params.txt
parameter name : verbose value : true
parameter name : threads value : 4
parameter name : metric value : logloss
parameter name : stackdata value : false
Completed: 4.04 % Completed: 8.08 % Completed: 12.12 % Completed: 16.16 % Completed: 20.20 % Completed: 24.24 % Completed: 28.28 % Completed: 32.32 % Completed: 36.36 % Completed: 40.40 % Completed: 44.44 % Completed: 48.48 % Completed: 52.53 % Completed: 56.57 % Completed: 60.61 % Completed: 64.65 % Completed: 68.69 % Completed: 72.73 % Completed: 76.77 % Completed: 80.81 % Completed: 84.85 % Completed: 88.89 % Completed: 92.93 % Completed: 96.97 %
Loaded File: train_iris.csv
Total rows in the file: 99
Total columns in the file: 5
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 4
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 99
Loaded dense train data with 99 and columns 4
loaded data in : 0.077000
Level: 1 dimensionality: 3
Starting cross validation
Fitting model: 1
logloss : 0.8122057659057356
Done with fold: 1/5
Fitting model: 1
logloss : 0.6068836011804708
Done with fold: 2/5
Fitting model: 1
logloss : 1.0090491625257505
Done with fold: 3/5
Fitting model: 1
logloss : 0.7164653252249316
Done with fold: 4/5
Fitting model: 1
logloss : 0.7112362916009889
Done with fold: 5/5
Average of all folds model 0 : 0.7711680292875756
Level: 1 start output modelling
Fitting model : 1
Completed level: 1 out of 2
Level: 2 dimensionality: 3
Starting cross validation
Average of all folds model 0 : 0.0
Level: 2 start output modelling
Fitting model : 1
Completed level: 2 out of 2
modelling lasted : 6.512000
Completed: 3.92 % Completed: 7.84 % Completed: 11.76 % Completed: 15.69 % Completed: 19.61 % Completed: 23.53 % Completed: 27.45 % Completed: 31.37 % Completed: 35.29 % Completed: 39.22 % Completed: 43.14 % Completed: 47.06 % Completed: 50.98 % Completed: 54.90 % Completed: 58.82 % Completed: 62.75 % Completed: 66.67 % Completed: 70.59 % Completed: 74.51 % Completed: 78.43 % Completed: 82.35 % Completed: 86.27 % Completed: 90.20 % Completed: 94.12 % Completed: 98.04 %
Loaded File: test_iris.csv
Total rows in the file: 51
Total columns in the file: 5
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 4
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 51
Loaded dense test data with 51 and columns 4
loading test data lasted : 0.059000
Completed: 3.92 % Completed: 7.84 % Completed: 11.76 % Completed: 15.69 % Completed: 19.61 % Completed: 23.53 % Completed: 27.45 % Completed: 31.37 % Completed: 35.29 % Completed: 39.22 % Completed: 43.14 % Completed: 47.06 % Completed: 50.98 % Completed: 54.90 % Completed: 58.82 % Completed: 62.75 % Completed: 66.67 % Completed: 70.59 % Completed: 74.51 % Completed: 78.43 % Completed: 82.35 % Completed: 86.27 % Completed: 90.20 % Completed: 94.12 % Completed: 98.04 %
predicting on test data lasted : 0.060000
Test logloss : 0.6459862152836634
The whole StackNet procedure lasted: 6.795000

With xgboost:

C:\Users\zhaod\StackNet\example\iris>java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false
parameter name : task value : classification
parameter name : sparse value : false
parameter name : has_head value : false
parameter name : model value : model
parameter name : train_file value : train_iris.csv
parameter name : test_file value : test_iris.csv
parameter name : test_target value : true
parameter name : params value : params.txt
parameter name : verbose value : true
parameter name : threads value : 4
parameter name : metric value : logloss
parameter name : stackdata value : false
Completed: 4.04 % Completed: 8.08 % Completed: 12.12 % Completed: 16.16 % Completed: 20.20 % Completed: 24.24 % Completed: 28.28 % Completed: 32.32 % Completed: 36.36 % Completed: 40.40 % Completed: 44.44 % Completed: 48.48 % Completed: 52.53 % Completed: 56.57 % Completed: 60.61 % Completed: 64.65 % Completed: 68.69 % Completed: 72.73 % Completed: 76.77 % Completed: 80.81 % Completed: 84.85 % Completed: 88.89 % Completed: 92.93 % Completed: 96.97 %
Loaded File: train_iris.csv
Total rows in the file: 99
Total columns in the file: 5
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 4
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 99
Loaded dense train data with 99 and columns 4
loaded data in : 0.077000
Level: 1 dimensionality: 3
Starting cross validation
Fitting model: 1
logloss : 0.8122057659057356
Done with fold: 1/5
Fitting model: 1
logloss : 0.6068836011804708
Done with fold: 2/5
Fitting model: 1
logloss : 1.0090491625257505
Done with fold: 3/5
Fitting model: 1
logloss : 0.7164653252249316
Done with fold: 4/5
Fitting model: 1
logloss : 0.7112362916009889
Done with fold: 5/5
Average of all folds model 0 : 0.7711680292875756
Level: 1 start output modelling
Fitting model : 1
Completed level: 1 out of 3
Level: 2 dimensionality: 3
Starting cross validation
Fitting model: 1
Exception in thread "Thread-8976" java.lang.IllegalStateException: failed to create Xgboost subprocess with config name C:\Users\zhaod\StackNet\example\iris\models\4gianleib1ibcd3cfqevou6vja.conf
    at ml.xgboost.XgboostClassifier.create_xg_suprocess(XgboostClassifier.java:359)
    at ml.xgboost.XgboostClassifier.fit(XgboostClassifier.java:1488)
    at ml.xgboost.XgboostClassifier.run(XgboostClassifier.java:480)
    at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
    at ml.Bagging.scoringhelpercatbagv2.<init>(scoringhelpercatbagv2.java:89)
    at ml.Bagging.BaggingClassifier.predict_proba(BaggingClassifier.java:398)
    at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2950)
    at stacknetrun.runstacknet.main(runstacknet.java:471)
    ... 5 more
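In both logs, the level-1 "Average of all folds" figure appears to be the plain mean of the five per-fold logloss values printed before it: recomputing it from the logged numbers reproduces the reported value. A short check in Python (the values are copied from the log; nothing else is assumed):

```python
# Per-fold logloss values for level 1, copied from the log.
fold_logloss = [
    0.8122057659057356,
    0.6068836011804708,
    1.0090491625257505,
    0.7164653252249316,
    0.7112362916009889,
]

average = sum(fold_logloss) / len(fold_logloss)
print(f"{average:.10f}")  # 0.7711680293, matching the logged 0.7711680292875756
```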

kaz-Anova commented 7 years ago

@ahbon123

Please see instructions on how to install xgboost properly inside StackNet here:

And for LightGBM, it should be similar:

The discussion is here:

ahbon123 commented 7 years ago

I installed xgboost into lib/win using git clone, but it still doesn't work. I think maybe I should follow the steps here

kaz-Anova commented 7 years ago

What does the executable print if you run it from the command line? That is, if you cd to lib/win/xg and just type xgboost.exe there, what does it say?
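The same check can be scripted. A minimal sketch, assuming Python 3 is available on the machine: it starts the executable with no arguments and returns whatever it prints (a working xgboost command-line binary typically answers with a usage line). The path `lib/win/xg/xgboost.exe` comes from this thread; `probe_executable` is a hypothetical helper, not part of StackNet. If it returns None, StackNet will plausibly hit the same wall when it tries to spawn xgboost, though that is an assumption about StackNet's launch logic.

```python
import subprocess
from pathlib import Path


def probe_executable(path, timeout=30):
    """Run an executable with no arguments and return what it prints.

    Returns None when the file is missing or cannot be started at all.
    stdin is set to DEVNULL so an interactive program cannot hang the check.
    """
    exe = Path(path)
    if not exe.is_file():
        return None
    try:
        result = subprocess.run(
            [str(exe)],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.decode(errors="replace").strip()


# Path taken from this thread (the Windows build in the repository's lib/ folder).
xgb = Path("lib") / "win" / "xg" / "xgboost.exe"
output = probe_executable(xgb)
print(output if output is not None else f"{xgb} is missing or could not be started")
```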

kaz-Anova commented 7 years ago

Also, please confirm you are using the lib/ folder from the repository; this is the first time I have seen it not working on Windows.

https://github.com/kaz-Anova/StackNet/tree/master/lib/win/xg

ahbon123 commented 7 years ago

C:\Users\zhaod\StackNet\lib\win\xg>xgboost.exe Usage: <config>

kaz-Anova commented 7 years ago

Then xgboost should run fine. Please do:

  1. Run java -jar StackNet.jar in the same folder where the lib/ folder is. The .jar and the lib/ folder need to be in the same place.
  2. Add verbose:true to the xgboost model inside the params file, and let's see what it prints when you run it.
ahbon123 commented 7 years ago

1. I ran java -jar StackNet.jar in lib; the result:

`C:\Users\User\StackNet\lib>java -jar StackNet.jar
'train' or 'predict' : to train or predict
'sparse' : true if the data to be imported are in sparse format (libsvm) or dense (false)
'task' : could be either 'regression' or 'classification'.
'has_head' : true if train_file and test_file have headers else false
'model' : name of the output model file.
'output_name' : prefix of the models to be printed per iteration. this is to allows the meta features of each iterations to be printed. defaults to nothing.
'indices_name' : suffix for the names of kfold indices to be printed as .csvs . It will print as many files as the selected kfold with names [indices_name][fold_number].csv . It will have the format of 'index,[0 if training else 1]'
'pred_file' : name of the output prediction file.
'data_prefix' : prefix to be used when the user supplies own pairs of [X_train,Xcv] datasets for each fold as well as an X file for the whole training data. Each train/valid pair is identified by prefix'train'[fold_index_starting_fromzero]'.txt'/prefix'cv'[fold_index_starting_fromzero]'.txt' and prefix'train.txt' for the final set. For example if prefix='mystack' and folds=2 then stacknet is expecting 2 pairs of train/cv files. e.g [[mystack_train0.txt,mystack_cv0.txt],[mystack_train1.txt,mystack_cv1.txt]]. It also expects a [mystack_train.txt] for the final train set
'train_file' : name of the training file.
'test_file' : name of the test file.
'test_target' : true if the test file has a target variable in the beginning (left) else false (only predictors in the file).
'params' : parameter file where each line is a model. empty lines correspond to the creation of new levels
'verbose' : true if we need StackNet to output its progress else false
'threads' : number of models to run in parallel. This is independent of any extra threads allocated from the selected algorithms. e.g. it is possible to run 4 models in parallel where one is a randomforest that runs on 10 threads (it selected).
'metric' : Metric to output in cross validation for each model-neuron. can be logloss, accuracy or auc (for binary only) for classification and rmse ,rsquared or mae for regerssion .defaults to 'logloss' for classification and 'rmse' for regression
'stackdata' :true for restacking else false
'seed' : integer for randomised procedures
'folds' : number of folds for re-usable kfold
'bins' : A parameter that allows classifiers to be used in regression problems. It first bins (digitises) the target variable and then runs classifiers on the transformed variable. Defaults to 2

example of parameter file :

LogisticRegression C:1 Type:Liblinear maxim_Iteration:100 scale:true verbose:false
RandomForestClassifier bootsrap:false estimators:100 threads:5 logit.offset:0.00001 verbose:false cut_off_subsample:1.0 feature_subselection:1.0 gamma:0.00001 max_depth:8 max_features:0.25 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1
GradientBoostingForestClassifier estimators:100 threads: offset:0.00001 verbose:false trees:1 rounding:2 shrinkage:0.05 cut_off_subsample:1.0 feature_subselection:0.8 gamma:0.00001 max_depth:8 max_features:1.0 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:RMSE row_subsample:0.9 seed:1
Vanilla2hnnclassifier UseConstant:true usescale:true seed:1 Type:SGD maxim_Iteration:50 C:0.000001 learn_rate:0.009 smooth:0.02 h1:30 h2:20 connection_nonlinearity:Relu init_values:0.02
LSVC Type:Liblinear threads:1 C:1.0 maxim_Iteration:100 seed:1
LibFmClassifier lfeatures:3 init_values:0.035 smooth:0.05 learn_rate:0.1 threads:1 C:0.00001 maxim_Iteration:15 seed:1
NaiveBayesClassifier usescale:true threads:1 Shrinkage:0.1 seed:1 verbose:false

RandomForestClassifier estimators=1000 rounding:3 threads:4 max_depth:6 max_features:0.6 min_leaf:2.0 Objective:ENTROPY gamma:0.000001 row_subsample:1.0 verbose:false copy=false`

2. I changed verbose:false to verbose:true; same result:

Exception in thread "Thread-8976" java.lang.IllegalStateException: failed to create Xgboost subprocess with config name C:\Users\User\StackNet\example\iris\models\eeo8v2jt3mrmbh03egndo0b6km.conf
    at ml.xgboost.XgboostClassifier.create_xg_suprocess(XgboostClassifier.java:359)
    at ml.xgboost.XgboostClassifier.fit(XgboostClassifier.java:1488)
    at ml.xgboost.XgboostClassifier.run(XgboostClassifier.java:480)
    at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
    at ml.Bagging.scoringhelpercatbagv2.<init>(scoringhelpercatbagv2.java:89)
    at ml.Bagging.BaggingClassifier.predict_proba(BaggingClassifier.java:398)
    at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2950)
    at stacknetrun.runstacknet.main(runstacknet.java:471)
    ... 5 more

ahbon123 commented 7 years ago

I think the bug is caused by my xgboost installation.

kaz-Anova commented 7 years ago

No, I was not clear: StackNet.jar needs to be in the same folder as lib/, not inside lib/.

(And the lib/ folder needs to be exactly as you downloaded it from the Git repository, with no other changes or additions.)

Please see this.

iris_files

This is all you need to run it. Put all the files in the same directory and make sure the .jar file and the lib/ folder are present.
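The layout described here (StackNet.jar and lib/ side by side, not the jar inside lib/, with params.txt and the data files in the same directory) can be checked with a few lines of Python before a run. A minimal sketch; `check_layout` is a hypothetical helper that only checks that the files exist, not that lib/ matches the repository, and the data file names default to the iris example.

```python
from pathlib import Path


def check_layout(folder=".", data_files=("train_iris.csv", "test_iris.csv")):
    """Return a list of problems with a StackNet working directory.

    Assumes StackNet.jar and lib/ must be siblings in `folder` (not the
    jar inside lib/), next to params.txt and the data files.
    An empty list means nothing obvious is missing.
    """
    root = Path(folder)
    problems = []
    if not (root / "StackNet.jar").is_file():
        if (root / "lib" / "StackNet.jar").is_file():
            problems.append("StackNet.jar is inside lib/; move it next to lib/")
        else:
            problems.append("StackNet.jar not found")
    if not (root / "lib").is_dir():
        problems.append("lib/ folder not found")
    for name in ("params.txt", *data_files):
        if not (root / name).is_file():
            problems.append(f"{name} not found")
    return problems


for problem in check_layout():
    print("-", problem)
```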

ahbon123 commented 7 years ago

It works now, thanks a lot, @kaz-Anova!

snassimr commented 7 years ago

I've also got it working on the iris data. In the next few days I am going to test it in a real Kaggle competition, so I think we can close this issue.