kaz-Anova / StackNet

StackNet is a computational, scalable and analytical Meta modelling framework
MIT License
1.32k stars, 344 forks

Error while replicating example #19

Status: closed by arisbw 7 years ago

arisbw commented 7 years ago

Hi, I tried to replicate one of the examples provided in this repo, the Amazon one. I ran the command with param_amazon_linear.txt as documented in that example, but all I got was this:

> java -Xmx3048m -jar StackNet.jar train train_file=train.sparse test_file=test.sparse params=param_amazon_linear.txt pred_file=amazon_linear_pred.csv test_target=false verbose=true Threads=1 sparse=true folds=5 seed=1 metric=auc
parameter name : train_file value :  train.sparse
parameter name : test_file value :  test.sparse
parameter name : params value :  param_amazon_linear.txt
parameter name : pred_file value :  amazon_linear_pred.csv
parameter name : test_target value :  false
parameter name : verbose value :  true
parameter name : threads value :  1
parameter name : sparse value :  true
parameter name : folds value :  5
parameter name : seed value :  1
parameter name : metric value :  auc
a train method needs to have a task which may be regression or classification

After checking for a while, I found that it had not produced any output file. Did I do something wrong?

Additional note: I had already produced train.sparse and test.sparse by running prepare_data.py.

goldentom42 commented 7 years ago

Hello arisbw,

Could you try the following command, which includes the task parameter:
java -Xmx3048m -jar StackNet.jar train task=classification train_file=train.sparse test_file=test.sparse params=param_amazon_linear.txt pred_file=amazon_linear_pred.csv test_target=false verbose=true Threads=1 sparse=true folds=5 seed=1 metric=auc

I think that in previous versions the task parameter was not mandatory. Let me know if this solves the issue.

arisbw commented 7 years ago

Ah, thank you! OK, noted: I now need to declare task each time I run the model. But I am curious why it is "train task" instead of "train_task". Could you explain why "train" comes before "task", given that the later parameters already declare the train and test files?

goldentom42 commented 7 years ago

Hi arisbw, happy the example now works for you ;-)

train is a StackNet argument telling the program to train the models listed in param_amazon_linear.txt and then make predictions (if the pred_file argument is set). Once the models are trained and saved on disk, you can use them by calling StackNet with the predict argument.

task is just another argument, which can take two values: classification or regression. It makes sure the last StackNet model and the metric are in line with the task you want to achieve. Hope this clarifies things.

You can take a look at this section of the readme file for more information.
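To illustrate the distinction above, here is a minimal Python sketch that assembles the command line from this thread. The helper name build_stacknet_command and its check are hypothetical and not part of StackNet. The sketch treats train as the mode that follows the jar name, emits every other argument as name=value, and mirrors the "a train method needs to have a task" error by refusing to build a train command without a valid task.

```python
import shlex

# The two task values named in this thread.
VALID_TASKS = {"classification", "regression"}


def build_stacknet_command(mode, params, jar="StackNet.jar", heap="3048m"):
    """Return the StackNet invocation as a list of arguments.

    mode   -- the word that follows the jar name (train in this thread)
    params -- dict of name=value arguments, e.g. {"task": "classification"}
    This helper is hypothetical; it only reproduces the command shown above.
    """
    if mode == "train" and params.get("task") not in VALID_TASKS:
        raise ValueError("train needs task=classification or task=regression")
    args = ["java", "-Xmx" + heap, "-jar", jar, mode]
    args += ["{}={}".format(name, value) for name, value in params.items()]
    return args


# Same arguments as the working command posted earlier in this thread.
params = {
    "task": "classification",
    "train_file": "train.sparse",
    "test_file": "test.sparse",
    "params": "param_amazon_linear.txt",
    "pred_file": "amazon_linear_pred.csv",
    "test_target": "false",
    "verbose": "true",
    "Threads": "1",
    "sparse": "true",
    "folds": "5",
    "seed": "1",
    "metric": "auc",
}
command = build_stacknet_command("train", params)
print(" ".join(shlex.quote(arg) for arg in command))
```

The returned list can be passed to subprocess.run as is. Dropping "task" from params raises ValueError before Java is started.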

arisbw commented 7 years ago

Thank you again for your reply. I have also learned that StackNet now supports csv files. From that documentation, I just need to put the target variable in the first column, followed by the other features. Is that right?

kaz-Anova commented 7 years ago

Thank you @goldentom42 , these are great answers.

goldentom42 commented 7 years ago

@Kaz-Anova, you're welcome ;-) StackNet is a very powerful lib and reading your code is always a great source of inspiration. @arisbw, that's right: StackNet supports any comma-separated text file as long as the first column of the train file is your target. If the test file does not contain a target, the test_target argument should be set to false.
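The layout described above can be sketched in Python with the standard csv module. This is only an illustration under assumptions: the function write_target_first, the column names and the sample rows are made up, and the files are written without a header row because this thread does not say how headers are handled.

```python
import csv
import os
import tempfile


def write_target_first(rows, feature_names, path, target_name=None):
    """Write rows (a list of dicts) as comma-separated text.

    If target_name is given, that column is written first, followed by the
    features in the order of feature_names. With target_name=None only the
    features are written, which matches a test file used with
    test_target=false.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for row in rows:
            values = [row[name] for name in feature_names]
            if target_name is not None:
                values = [row[target_name]] + values
            writer.writerow(values)


# Hypothetical sample data: "label" is the target, f1 and f2 are features.
rows = [
    {"label": 1, "f1": 0.5, "f2": 3},
    {"label": 0, "f1": 1.5, "f2": 7},
]
out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.csv")
test_path = os.path.join(out_dir, "test.csv")
write_target_first(rows, ["f1", "f2"], train_path, target_name="label")
write_target_first(rows, ["f1", "f2"], test_path)  # no target column
```

The same feature order is used for both files so that the columns of the test file line up with the columns that follow the target in the train file.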

kaz-Anova commented 7 years ago

I have added task=classification to the example. I have not updated StackNet for a while but I am working in the background to allow support for sklearn and keras models.

goldentom42 commented 7 years ago

@kaz-Anova, while reproducing @arisbw's problem I found 4 print statements in prepare_data.py that are missing parentheses. If you want to make the fix, they are on lines 410, 411, 421 and 514.

arisbw commented 7 years ago

@goldentom42 Thank you again for your reply. 👍 By the way, I think the code is missing parentheses because it's supposed to run with Python 2.7. @kaz-Anova Cool! Can't wait for the update.
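The Python 2 versus Python 3 difference behind the missing parentheses can be checked with a short sketch. The two source strings below are made-up examples and not the actual lines from prepare_data.py. The helper compiles a snippet without running it and reports whether the running interpreter accepts the syntax.

```python
def compiles(source):
    """Return True if the running interpreter accepts this source text."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Hypothetical lines, not copied from prepare_data.py.
statement_form = 'print "rows written"'   # Python 2 print statement
function_form = 'print("rows written")'   # valid on Python 2.7 and Python 3

for label, source in [("statement", statement_form), ("function", function_form)]:
    print(label, "form accepted:", compiles(source))
```

On Python 3 the statement form is rejected with a SyntaxError, while the function form is accepted by both Python 2.7 and Python 3, so adding the parentheses should keep the script usable under either interpreter.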