Closed snassimr closed 7 years ago
Yes. I believe this is fixed in the newer version. In any case, the file (pred.csv) should have been written fine. Can you confirm that this is the case? The error is in the metrics calculation on the test data, so it does not affect the predictions.
If pred.csv was supposed to be created in the same directory, I can't see it; it hasn't been created. Regarding the newer version: I downloaded the .jar file an hour ago.
I will have to look at the problem specifically then. Can you send me the previous output? Did the models run fine? Is it possible to send me a subset of the file that generated the problem? What was the command you ran?
Params file: params.txt
Command:
Java -Xmx1048m -jar stacknet.jar train sparse=false has_head=true model=model pred_file=pred.csv train_file=sample_train.csv test_file=sample_test.csv test_target=false params=params.txt verbose=true threads=3 metric=logloss stackdata=false seed=1 folds=2
Sample input files :
I am having this issue as well except for me the e.getMessage() returns 2 in the logs. I am running with the quora question dataset and I believe the issue has to do with certain models. For example, I took your vanilla paramsv1.txt and added a few new models such as VanillahnnClassifier, DecisionTreeRegressor and got the error.
Hi @kaz-Anova! I have a similar issue. Let me explain with a screenshot and code.
Here is the params.txt
RandomForestClassifier bootsrap:false max_tree_size:-1 cut_off_subsample:1.0 feature_subselection:1.0 rounding:6 estimators:100 offset:0.00001 max_depth:6 max_features:0.4 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 threads:1 bags:1 verbose:false
XgboostClassifier booster:gbtree num_round:1000 eta:0.005 max_leaves:0 gamma:1. max_depth:5 min_child_weight:1.0 subsample:0.9 colsample_bytree:0.7 colsample_bylevel:1.0 lambda:1.0 alpha:1.0 seed:1 threads:1 bags:1 verbose:false
NaiveBayesClassifier usescale:True Shrinkage:0.1 seed:1 threads:1 verbose:false
And here is the command:
java -jar StackNet.jar train task=classification sparse=false has_head=true model=model train_file=train_x.csv test_file=test_x.csv test_target=false params=params.txt verbose=true threads=8 metric=accuracy stackdata=false
Btw, the data is multiclass (3 classes) and imbalanced. If I add pred_file=preds.csv, it also creates predictions, but the results are meaningless. They should be 0, 1 or 2, but some of the predictions are 0 or 1 and others are 130, 220, etc. So what's wrong? Any idea?
Could you send me the predictions file and a subset of the training data that replicates the problem?
@kaz-Anova unfortunately I can't share the data; it's private. Also, I said something wrong: preds.csv is a prediction file for another dataset (a regression), so its values are actually correct. My fault, sorry! So no prediction file is created after the error.
Not sure if this helps, but I get the same error with the most recent version when using any classifier with any dataset in dense format (I've tested a few of my own data sets as well as the iris set) with any combination of model parameters. I've tested the data with and without headers and with and without a test target value in the first column. The error does not occur with regressors, but currently I can't get any classifiers to work.
Hi @molecularswords, I do not seem to be able to reproduce your issue. I just cloned StackNet and used the iris dataset like so (in Python 3.5)
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold
import numpy as np
# Load iris dataset
dataset = datasets.load_iris()
target = dataset.target
data = dataset.data
# Shuffle dataset
z = np.arange(len(data))
np.random.shuffle(z)
data = data[z]
target = target[z]
n_splits = 3
folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=14846789)
trn_dataset = None
tst_dataset = None
# Use only the first stratified split as the train/test partition
for trn_idx, tst_idx in folds.split(data, target):
    trn_iris, trn_target = data[trn_idx], target[trn_idx]
    tst_iris, tst_target = data[tst_idx], target[tst_idx]
    # Put the target in the first column, followed by the features
    trn_dataset = np.hstack((trn_target.reshape(-1, 1), trn_iris))
    tst_dataset = np.hstack((tst_target.reshape(-1, 1), tst_iris))
    break
# Save dataset in StackNet format
np.savetxt(fname="train_iris.csv", X=trn_dataset, delimiter=',')
np.savetxt(fname="test_iris.csv", X=tst_dataset, delimiter=',')
params.txt looks like :
RandomForestClassifier bootstrap:false max_tree_size:-1 cut_off_subsample:1.0 feature_subselection:1.0 rounding:6 estimators:100 offset:0.00001 max_depth:6 max_features:0.4 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 threads:1 bags:1 verbose:false
XgboostClassifier booster:gbtree num_round:1000 eta:0.005 max_leaves:0 gamma:1. max_depth:5 min_child_weight:1.0 subsample:0.9 colsample_bytree:0.7 colsample_bylevel:1.0 lambda:1.0 alpha:1.0 seed:1 threads:1 bags:1 verbose:false
NaiveBayesClassifier usescale:True Shrinkage:0.1 seed:1 threads:1 verbose:false
which means 3 levels with 1 classifier at each level (the levels are separated by empty lines in the file)
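To illustrate how the layout of params.txt maps to levels, here is a minimal Python sketch. It relies on the rule stated in StackNet's own help text quoted further down in this thread (each line is a model, and empty lines create new levels). `split_levels` is a hypothetical helper written for this illustration, not part of StackNet, and the parameter lines are shortened.

```python
def split_levels(params_text):
    """Group model lines into levels; an empty line starts a new level."""
    levels, current = [], []
    for line in params_text.splitlines():
        if line.strip():
            current.append(line.split()[0])  # keep only the model name
        elif current:
            levels.append(current)
            current = []
    if current:
        levels.append(current)
    return levels


# Shortened version of the params file above, with the empty lines restored
example = """RandomForestClassifier estimators:100 max_depth:6

XgboostClassifier booster:gbtree num_round:1000

NaiveBayesClassifier usescale:True Shrinkage:0.1
"""

print(split_levels(example))
# [['RandomForestClassifier'], ['XgboostClassifier'], ['NaiveBayesClassifier']]
```

If the empty lines are missing, the same three models would be grouped into a single level under this rule.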
And the StackNet call is
java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false
I'm on Linux Ubuntu Xenial 16.04 and java 1.8.0_102.
Can you give this a try and see if it works for you? Thanks, goldentom
@goldentom42 Your code works, and the reason it wasn't working for me before was completely my fault. After troubleshooting the problem I found that I was unwittingly updating an old params file in a different directory, whose path is nearly identical to the correct one and which I had overlooked in my troubleshooting before my original post. I apologize for my oversight and thank you for your efforts in helping to resolve my problem.
@molecularswords, no need to apologize ;-) we're all in the same boat! Happy you found the issue and that things work for you now. Cheers, goldentom42
@goldentom42 thank you for sharing the iris example. I tried it with Anaconda under Windows, but it doesn't work. Can you help me please? Thanks.
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: File params.txt failed to import at bufferreader params.txt (The system cannot find the file specified.)
    at io.input.StackNet_Configuration(input.java:1650)
    at stacknetrun.runstacknet.main(runstacknet.java:441)
Hi ahbon123, from the stack trace I would assume StackNet is unable to locate the params.txt file. Is the file in the same directory you launch StackNet from?
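One way to check this before launching is to resolve the file arguments against the current working directory and see which ones exist. The Python sketch below is only an illustration: `resolve_file_args` and `FILE_KEYS` are hypothetical names, and it assumes that relative paths on the command line are resolved against the directory the command is run from.

```python
import os

# Command-line keys that name input files (taken from the commands in this thread)
FILE_KEYS = ("train_file", "test_file", "params")


def resolve_file_args(args, cwd=None):
    """Map each file-valued key=value argument to (absolute path, exists?)."""
    cwd = cwd or os.getcwd()
    resolved = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and key in FILE_KEYS:
            path = os.path.abspath(os.path.join(cwd, value))
            resolved[key] = (path, os.path.isfile(path))
    return resolved


command = ("train task=classification train_file=train_iris.csv "
           "test_file=test_iris.csv params=params.txt")
for key, (path, found) in resolve_file_args(command.split()).items():
    print(f"{key}: {path} ({'found' if found else 'MISSING'})")
```

Printing the absolute path also helps catch the case reported earlier in this thread, where a params file with a nearly identical path in a different directory was being edited.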
Brilliant! It works when I remove xgboost from params.txt. Thanks to both of you, @kaz-Anova @goldentom42. The results follow. By the way, why does this happen when xgboost is added? My Java version:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
without xgboost:
C:\Users\zhaod\StackNet\example\iris>java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false parameter name : task value : classification parameter name : sparse value : false parameter name : has_head value : false parameter name : model value : model parameter name : train_file value : train_iris.csv parameter name : test_file value : test_iris.csv parameter name : test_target value : true parameter name : params value : params.txt parameter name : verbose value : true parameter name : threads value : 4 parameter name : metric value : logloss parameter name : stackdata value : false Completed: 4.04 % Completed: 8.08 % Completed: 12.12 % Completed: 16.16 % Completed: 20.20 % Completed: 24.24 % Completed: 28.28 % Completed: 32.32 % Completed: 36.36 % Completed: 40.40 % Completed: 44.44 % Completed: 48.48 % Completed: 52.53 % Completed: 56.57 % Completed: 60.61 % Completed: 64.65 % Completed: 68.69 % Completed: 72.73 % Completed: 76.77 % Completed: 80.81 % Completed: 84.85 % Completed: 88.89 % Completed: 92.93 % Completed: 96.97 % Loaded File: train_iris.csv Total rows in the file: 99 Total columns in the file: 5 Weighted variable : -1 counts: 0 Int Id variable : -1 str id: -1 counts: 0 Target Variables : 1 values : [0] Actual columns number : 4 Number of Skipped rows : 0 Actual Rows (removing the skipped ones) : 99 Loaded dense train data with 99 and columns 4 loaded data in : 0.077000 Level: 1 dimensionality: 3 Starting cross validation Fitting model: 1 logloss : 0.8122057659057356 Done with fold: 1/5 Fitting model: 1 logloss : 0.6068836011804708 Done with fold: 2/5 Fitting model: 1 logloss : 1.0090491625257505 Done with fold: 3/5 Fitting model: 1 logloss : 0.7164653252249316 Done with fold: 4/5 Fitting model: 1 logloss : 0.7112362916009889 Done with fold: 5/5 Average of all folds model 0 : 
0.7711680292875756 Level: 1 start output modelling Fitting model : 1 Completed level: 1 out of 2 Level: 2 dimensionality: 3 Starting cross validation Average of all folds model 0 : 0.0 Level: 2 start output modelling Fitting model : 1 Completed level: 2 out of 2 modelling lasted : 6.512000 Completed: 3.92 % Completed: 7.84 % Completed: 11.76 % Completed: 15.69 % Completed: 19.61 % Completed: 23.53 % Completed: 27.45 % Completed: 31.37 % Completed: 35.29 % Completed: 39.22 % Completed: 43.14 % Completed: 47.06 % Completed: 50.98 % Completed: 54.90 % Completed: 58.82 % Completed: 62.75 % Completed: 66.67 % Completed: 70.59 % Completed: 74.51 % Completed: 78.43 % Completed: 82.35 % Completed: 86.27 % Completed: 90.20 % Completed: 94.12 % Completed: 98.04 % Loaded File: test_iris.csv Total rows in the file: 51 Total columns in the file: 5 Weighted variable : -1 counts: 0 Int Id variable : -1 str id: -1 counts: 0 Target Variables : 1 values : [0] Actual columns number : 4 Number of Skipped rows : 0 Actual Rows (removing the skipped ones) : 51 Loaded dense test data with 51 and columns 4 loading test data lasted : 0.059000 Completed: 3.92 % Completed: 7.84 % Completed: 11.76 % Completed: 15.69 % Completed: 19.61 % Completed: 23.53 % Completed: 27.45 % Completed: 31.37 % Completed: 35.29 % Completed: 39.22 % Completed: 43.14 % Completed: 47.06 % Completed: 50.98 % Completed: 54.90 % Completed: 58.82 % Completed: 62.75 % Completed: 66.67 % Completed: 70.59 % Completed: 74.51 % Completed: 78.43 % Completed: 82.35 % Completed: 86.27 % Completed: 90.20 % Completed: 94.12 % Completed: 98.04 % predicting on test data lasted : 0.060000 Test logloss : 0.6459862152836634 The whole StackNet procedure lasted: 6.795000
with xgboost:
C:\Users\zhaod\StackNet\example\iris>java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false parameter name : task value : classification parameter name : sparse value : false parameter name : has_head value : false parameter name : model value : model parameter name : train_file value : train_iris.csv parameter name : test_file value : test_iris.csv parameter name : test_target value : true parameter name : params value : params.txt parameter name : verbose value : true parameter name : threads value : 4 parameter name : metric value : logloss parameter name : stackdata value : false Completed: 4.04 % Completed: 8.08 % Completed: 12.12 % Completed: 16.16 % Completed: 20.20 % Completed: 24.24 % Completed: 28.28 % Completed: 32.32 % Completed: 36.36 % Completed: 40.40 % Completed: 44.44 % Completed: 48.48 % Completed: 52.53 % Completed: 56.57 % Completed: 60.61 % Completed: 64.65 % Completed: 68.69 % Completed: 72.73 % Completed: 76.77 % Completed: 80.81 % Completed: 84.85 % Completed: 88.89 % Completed: 92.93 % Completed: 96.97 % Loaded File: train_iris.csv Total rows in the file: 99 Total columns in the file: 5 Weighted variable : -1 counts: 0 Int Id variable : -1 str id: -1 counts: 0 Target Variables : 1 values : [0] Actual columns number : 4 Number of Skipped rows : 0 Actual Rows (removing the skipped ones) : 99 Loaded dense train data with 99 and columns 4 loaded data in : 0.077000 Level: 1 dimensionality: 3 Starting cross validation Fitting model: 1 logloss : 0.8122057659057356 Done with fold: 1/5 Fitting model: 1 logloss : 0.6068836011804708 Done with fold: 2/5 Fitting model: 1 logloss : 1.0090491625257505 Done with fold: 3/5 Fitting model: 1 logloss : 0.7164653252249316 Done with fold: 4/5 Fitting model: 1 logloss : 0.7112362916009889 Done with fold: 5/5 Average of all folds model 0 : 
0.7711680292875756 Level: 1 start output modelling Fitting model : 1 Completed level: 1 out of 3 Level: 2 dimensionality: 3 Starting cross validation Fitting model: 1
Exception in thread "Thread-8976" java.lang.IllegalStateException: failed to create Xgboost subprocess with config name C:\Users\zhaod\StackNet\example\iris\models\4gianleib1ibcd3cfqevou6vja.conf
    at ml.xgboost.XgboostClassifier.create_xg_suprocess(XgboostClassifier.java:359)
    at ml.xgboost.XgboostClassifier.fit(XgboostClassifier.java:1488)
    at ml.xgboost.XgboostClassifier.run(XgboostClassifier.java:480)
    at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
    at ml.Bagging.scoringhelpercatbagv2.<init>(scoringhelpercatbagv2.java:89)
    at ml.Bagging.BaggingClassifier.predict_proba(BaggingClassifier.java:398)
    at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2950)
    at stacknetrun.runstacknet.main(runstacknet.java:471)
    ... 5 more
@ahbon123
Please see instructions on how to install xgboost properly inside StackNet here:
and for lightgbm, it should be similar:
I installed xgboost to lib/win using git clone xgboost, and it still doesn't work. I think maybe I should follow the steps here
What does the executable print if you try to execute it from the command line? If you just cd to lib/win/xg and type xgboost.exe inside, what does it say?
Also, please confirm you are using the lib/ folder from the repository; this is the first time I have seen it not working on Windows.
https://github.com/kaz-Anova/StackNet/tree/master/lib/win/xg
C:\Users\zhaod\StackNet\lib\win\xg>xgboost.exe
Usage: <config>
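The same manual check can be scripted. The Python sketch below simply tries to start a command and captures whatever it prints; `probe_executable` is a hypothetical helper, and the xgboost path in the comment is the one from this thread. Note that a successful standalone launch only shows the binary itself starts; it does not show that StackNet can find it.

```python
import subprocess
import sys


def probe_executable(command):
    """Try to run a command; return (started, combined stdout/stderr text)."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=30,
        )
    except (FileNotFoundError, PermissionError) as exc:
        # The binary is missing or cannot be executed
        return False, str(exc)
    return True, result.stdout.decode(errors="replace").strip()


# Demonstration with the Python interpreter itself; on Windows one would
# instead pass the path from this thread, e.g. [r"lib\win\xg\xgboost.exe"]
started, output = probe_executable([sys.executable, "--version"])
print(started, output)
```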
Then xgboost should run fine. Please run java -jar StackNet.jar in the same folder where the lib/ folder is. Both the .jar and the lib/ folder need to be in the same place. Also, add verbose:true
to the xgboost model inside the params file and lets see what it prints when you run it. 1. run java -jar StackNet.jar in lib, result: `C:\Users\User\StackNet\lib>java -jar StackNet.jar 'train' or 'predict' : to train or predict 'sparse' : true if the data to be imported are in sparse format (libsvm) or dense (false) 'task' : could be either 'regression' or 'classification'. 'has_head' : true if train_file and test_file have headers else false 'model' : name of the output model file. 'output_name' : prefix of the models to be printed per iteration. this is to allows the meta features of each iterations to be printed. defaults to nothing. 'indices_name' : suffix for the names of kfold indices to be printed as .csvs . It will print as many files as the selected kfold with names [indices_name][fold_number].csv . It will have the format of 'index,[0 if training else 1]' 'pred_file' : name of the output prediction file. 'data_prefix' : prefix to be used when the user supplies own pairs of [X_train,Xcv] datasets for each fold as well as an X file for the whole training data. Each train/valid pair is identified by prefix'train'[fold_index_starting_fromzero]'.txt'/prefix'cv'[fold_index_starting_fromzero]'.txt' and prefix'train.txt' for the final set. For example if prefix='mystack' and folds=2 then stacknet is expecting 2 pairs of train/cv files. e.g [[mystack_train0.txt,mystack_cv0.txt],[mystack_train1.txt,mystack_cv1.txt]]. It also expects a [mystack_train.txt] for the final train set 'train_file' : name of the training file. 'test_file' : name of the test file. 'test_target' : true if the test file has a target variable in the beginning (left) else false (only predictors in the file). 'params' : parameter file where each line is a model. empty lines correspond to the creation of new levels 'verbose' : true if we need StackNet to output its progress else false 'threads' : number of models to run in parallel. 
This is independent of any extra threads allocated from the selected algorithms. e.g. it is possible to run 4 models in parallel where one is a randomforest that runs on 10 threads (it selected). 'metric' : Metric to output in cross validation for each model-neuron. can be logloss, accuracy or auc (for binary only) for classification and rmse ,rsquared or mae for regerssion .defaults to 'logloss' for classification and 'rmse' for regression 'stackdata' :true for restacking else false 'seed' : integer for randomised procedures 'folds' : number of folds for re-usable kfold 'bins' : A parameter that allows classifiers to be used in regression problems. It first bins (digitises) the target variable and then runs classifiers on the transformed variable. Defaults to 2
example of parameter file :
LogisticRegression C:1 Type:Liblinear maxim_Iteration:100 scale:true verbose:false RandomForestClassifier bootsrap:false estimators:100 threads:5 logit.offset:0.00001 verbose:false cut_off_subsample:1.0 feature_subselection:1.0 gamma:0.00001 max_depth:8 max_features:0.25 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1 GradientBoostingForestClassifier estimators:100 threads: offset:0.00001 verbose:false trees:1 rounding:2 shrinkage:0.05 cut_off_subsample:1.0 feature_subselection:0.8 gamma:0.00001 max_depth:8 max_features:1.0 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:RMSE row_subsample:0.9 seed:1 Vanilla2hnnclassifier UseConstant:true usescale:true seed:1 Type:SGD maxim_Iteration:50 C:0.000001 learn_rate:0.009 smooth:0.02 h1:30 h2:20 connection_nonlinearity:Relu init_values:0.02 LSVC Type:Liblinear threads:1 C:1.0 maxim_Iteration:100 seed:1 LibFmClassifier lfeatures:3 init_values:0.035 smooth:0.05 learn_rate:0.1 threads:1 C:0.00001 maxim_Iteration:15 seed:1 NaiveBayesClassifier usescale:true threads:1 Shrinkage:0.1 seed:1 verbose:false
RandomForestClassifier estimators=1000 rounding:3 threads:4 max_depth:6 max_features:0.6 min_leaf:2.0 Objective:ENTROPY gamma:0.000001 row_subsample:1.0 verbose:false copy=false`
2. I changed verbose:false to verbose:true; same result:
Exception in thread "Thread-8976" java.lang.IllegalStateException: failed to create Xgboost subprocess with config name C:\Users\User\StackNet\example\iris\models\eeo8v2jt3mrmbh03egndo0b6km.conf
    at ml.xgboost.XgboostClassifier.create_xg_suprocess(XgboostClassifier.java:359)
    at ml.xgboost.XgboostClassifier.fit(XgboostClassifier.java:1488)
    at ml.xgboost.XgboostClassifier.run(XgboostClassifier.java:480)
    at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
    at ml.Bagging.scoringhelpercatbagv2.<init>(scoringhelpercatbagv2.java:89)
    at ml.Bagging.BaggingClassifier.predict_proba(BaggingClassifier.java:398)
    at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2950)
    at stacknetrun.runstacknet.main(runstacknet.java:471)
    ... 5 more
I think the error occurs because of how xgboost was installed.
No, I was not clear. StackNet.jar needs to be in the same folder as lib/, not inside lib/ (and the lib folder needs to be exactly as you downloaded it from the git repository, with no other changes or additions).
Please see this.
This is all you need to run it. Put all files in the same directory, and ensure the .jar and the lib/ folder are both present.
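As a quick sanity check of that layout, the following Python sketch reports which of the two items is missing from a given directory. `missing_items` is a hypothetical helper; the names StackNet.jar and lib are the ones used in this thread, and the check says nothing about whether the contents of lib/ are intact.

```python
import os


def missing_items(directory, jar_name="StackNet.jar", lib_name="lib"):
    """Return the expected items that are absent from the launch directory."""
    missing = []
    if not os.path.isfile(os.path.join(directory, jar_name)):
        missing.append(jar_name)
    if not os.path.isdir(os.path.join(directory, lib_name)):
        missing.append(lib_name + "/")
    return missing


# Check the directory the command would be launched from
problems = missing_items(os.getcwd())
print(problems if problems else "StackNet.jar and lib/ are side by side")
```

Running it from inside lib/ (the mistake made above) would report both items as missing, since neither the .jar nor a nested lib/ folder is expected there.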
It works now, thanks a lot! @kaz-Anova
I've got it working on the iris data as well. In the next several days I am going to test it in a real Kaggle competition, so I think we can close this issue.
Any other clue as to where the error is and how to fix it?