kaz-Anova / StackNet

StackNet is a computational, scalable and analytical Meta modelling framework
MIT License

Questions: Why is StackNet's performance bad in the Zillow competition? #47

Closed iFe1er closed 7 years ago

iFe1er commented 7 years ago

Hi dear StackNet developers and the author @kaz-Anova,

I am new to Kaggle, and I found StackNet a COOL tool to use. However, it is not as GOOD as I expected with the default parameters. I tried tuning and adding features, but none of that has given me any improvement yet. That is why I am posting this, hoping someone can guide me on how to tune a BETTER StackNet (parameters, folds, regressors, etc.) for the Zillow competition.

As the documentation says: https://github.com/kaz-Anova/StackNet/tree/master/example/zillow_regression_sparse

the performance of StackNet alone ONLY reaches around 0.0647 on the LB, which is not good compared with other kernels. Some kernels achieve a high score with a single model (LightGBM alone reaches 0.0644), and there are also kernels that use a two-layer traditional stacking and achieve 0.0645 (https://www.kaggle.com/wangsg/ensemble-stacking-lb-0-644/comments), which is much better than StackNet as far as the leaderboard is concerned.

So my question is: why is StackNet not working very well on the LB in the Zillow competition?

1. Is it a problem of the default parameters or regressors?

If so, could @kaz-Anova please help update the parameters and regressors in the example so that the performance gets better? (I tried a lot, but did not improve it.) If so, more people (especially newcomers like me) will be happier to use StackNet.

2. Is it a problem of the reusable K-fold scheme?

(As far as I have tried, my 5-fold LightGBM average works much worse than my single LightGBM run.) In my opinion, the Zillow leaderboard test data is evaluated on 2016.10 ~ 2016.12, but there is very little data between 2016.10 and 2016.12 in the training set. So a K-fold split may be a bad way to validate in this competition.
If so, would it be possible for StackNet to support, in the future, a DIFFERENT out-of-fold scheme, not just K-fold? That would allow more flexible blending (i.e. divide the data into two parts, train only on historical data, predict on the future data, and use the future data to do the stacking) or a sliding-window scheme. Especially for time-related problems, it can be bad to leak the future into the past with K-fold or reusable K-fold.

3. Some other questions about the Zillow competition using StackNet.

To @kaz-Anova: sincere congratulations on the high score that you and your team have achieved. However, considering the poor performance of StackNet right now, I am very curious whether you are still using StackNet in the competition as a strong predictor (instead of as a weak model for averaging).

Apologies for my rudeness (if that is the case), and of course I know that one can achieve a better LB score just by combining with other kernels. But my question is: the baseline of StackNet is now so far away from other kernels. Are there any practical methods (or tricks) that you would like to share in order to make StackNet work better?

P.S. I am now a fan of StackNet, and I want to express my gratitude to @kaz-Anova for the convenience that the powerful StackNet brings us. I wish it could be even BETTER in the future.

Sincerely

kaz-Anova commented 7 years ago

@iFe1er

Thank you for the detailed feedback.

  1. I did not tune the parameters for long. I only gave this example as a starting point. It is not in the spirit of the competition to publish very good parameters - people should play with them themselves and try to get a better score. I thought I put in enough to make people want to try it. However, you are right that the default parameters of StackNet are in principle not very good - I have not done much work in this area, as I personally like to tune the models myself. At the same time, better parameters do not always mean better stacking. You are indeed looking for good models, BUT you are also looking for diversity, and on many occasions even bad models add value. Nothing stops you from creating the same data and running the same LightGBM through StackNet. If you feel more comfortable with Python, you could use the pythongeneric module to add your own models. I will not change the parameters at this point. Also, there is an element of over-fitting in the public scripts. I believe you underestimate how heavily the public scripts have been tuned on the public leaderboard, and therefore how likely they are not to generalise well on new data - you need to take that into account.

  2. You can add your own folds. Look here for data_prefix. You basically create your own sets of [train, validation] data that you supply to StackNet. There is an example using it here. To answer your question though: yes, in the future there will be more options out of the box.
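
For the time-based split raised in question 2, a minimal sketch of how one [train, validation] pair could be built by date and written out for data_prefix. The input file name, the 'transactiondate'/'logerror' columns, the libsvm export and the output naming pattern are illustrative assumptions here, not something prescribed by StackNet - check the data_prefix example in the repo for the exact convention:

```python
import pandas as pd
from sklearn.datasets import dump_svmlight_file

# Hypothetical prepared feature file; 'transactiondate' and 'logerror' are the
# Zillow column names, everything else is assumed to be numeric features.
df = pd.read_csv("zillow_features.csv", parse_dates=["transactiondate"])
feature_cols = [c for c in df.columns if c not in ("logerror", "transactiondate")]

# Train only on history, validate on the last months, so the future never
# leaks into the past the way a random K-fold split would allow.
cutoff = pd.Timestamp("2016-10-01")
train = df[df["transactiondate"] < cutoff]
valid = df[df["transactiondate"] >= cutoff]

# One [train, validation] pair in sparse libsvm format, index-suffixed so it
# could be picked up via a data_prefix such as zillow_time (naming assumed).
dump_svmlight_file(train[feature_cols].values, train["logerror"].values, "zillow_time_train0.txt")
dump_svmlight_file(valid[feature_cols].values, valid["logerror"].values, "zillow_time_cv0.txt")
```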

  3. The performance of StackNet, even in the example I shared, is not bad, and as you can see it blends well with other approaches. I cannot say what my team uses right now because I am not alone in this. For sure it does not use only one approach.

You could try adding all these kernels into StackNet (as in, create the data files used in the kernels and make new stacks) and run multiple of them.

As general tips, you need to create modelling diversity with StackNet. There are two main ways to generate it (a rough sketch follows the two points below):

Diversity based on algorithms.

Diversity based on input data, via making different transformations.
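
A rough sketch of what those two kinds of diversity can look like in practice. The specific models and transformations below are only illustrative choices, not the ones used in the zillow example:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor

# Diversity based on algorithms: models from different families on the same data.
models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=300, max_depth=8, n_jobs=-1),
    "extra_trees": ExtraTreesRegressor(n_estimators=300, max_depth=10, n_jobs=-1),
}

# Diversity based on input data: the same raw features under different
# transformations, each of which could become its own StackNet input file.
def feature_views(X):
    return {
        "raw": X,
        "log1p": np.log1p(np.clip(X, 0, None)),   # compress heavy-tailed counts
        "binary": (X > 0).astype(np.float32),     # keep only presence/absence
    }
```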

The tuning may take time. I normally create a smaller training set and try to find good parameters, changing one at a time. I do this for multiple algorithms, one by one: once I finish with one, I move on to the next. I do this until I have many, and then I put them all together and run again with the full data.
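
As an illustration of that one-parameter-at-a-time workflow on a subsample, here is a small sketch. The placeholder data, subsample sizes and parameter grids are only an example, and LightGBM is used only because it appears in the kernels discussed above, not because it is what the zillow scripts use:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = np.random.rand(5000, 20), np.random.rand(5000)  # placeholder for the real data

# Work on a smaller training set first so each trial is cheap.
X_small, _, y_small, _ = train_test_split(X, y, train_size=0.3, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X_small, y_small, test_size=0.25, random_state=1)

best = {"num_leaves": 31, "learning_rate": 0.05, "min_data_in_leaf": 20}
grids = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.01, 0.03, 0.05, 0.1],
    "min_data_in_leaf": [10, 20, 50, 100],
}

# Change one parameter at a time, keep the best value, then move to the next.
for name, candidates in grids.items():
    scores = {}
    for value in candidates:
        params = {**best, name: value, "objective": "regression_l1", "verbosity": -1}
        model = lgb.train(params, lgb.Dataset(X_tr, label=y_tr), num_boost_round=300)
        scores[value] = mean_absolute_error(y_va, model.predict(X_va))
    best[name] = min(scores, key=scores.get)

print(best)  # tuned values to carry over to the run on the full data
```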

Hope this helps a bit.

iFe1er commented 7 years ago

@kaz-Anova Thank you very much for the very detailed, inspiring and super helpful answers. I will give StackNet a shot, try some of the different methods that you generously shared, and work harder in the competition.

And you are right: as a newcomer I might indeed have under-estimated the performance and flexibility of StackNet. I also agree with what you said about the spirit of StackNet: always think outside the box, try new things, and dig more interesting facts out of the data.

Thank you again for your generous help and enlightening answers, and good luck with the competition! (And to myself too, haha!)

cks1001652 commented 7 years ago

@kaz-Anova Following up on the data_prefix part: in the example for the Amazon competition, you wrote

The second part will use the data per fold 5. Execute from the command line : java -Xmx3048m -jar StackNet.jar train task=classification data_prefix=amazon_counts test_file=amazon_counts_test.txt pred_file=amazon_count_pred.csv verbose=true Threads=1 folds=5 seed=1 metric=auc

I see you set up the prefix, but you did not specify the pairs of train/cv data sets. How does this work exactly? Should I set train_file=[[train_0.txt,cv0.txt],[train_1.txt,cv1.txt],[train.txt]] to specify the pairs of datasets, or is there another way to approach this?

Thanks a lot for this great tool!

kaz-Anova commented 7 years ago

Hi @cks1001652. It is exactly as you pointed out. You need to add the indices at the end of the file names yourself (e.g. 0 or 1, or more if folds > 2).
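
To make that concrete, a small sketch of writing index-suffixed (train, cv) pairs for the amazon_counts prefix. The exact "&lt;prefix&gt;_train&lt;k&gt;.txt" / "&lt;prefix&gt;_cv&lt;k&gt;.txt" pattern and the extra full-train file are assumptions taken from the question above; the files shipped with the amazon example are the authoritative reference:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.datasets import dump_svmlight_file

X, y = np.random.rand(1000, 10), np.random.randint(0, 2, 1000)  # placeholder data
prefix = "amazon_counts"  # must match data_prefix on the command line

# One (train, cv) pair per fold, with the fold index appended to the file name.
for k, (tr_idx, cv_idx) in enumerate(KFold(n_splits=5, shuffle=True, random_state=1).split(X)):
    dump_svmlight_file(X[tr_idx], y[tr_idx], f"{prefix}_train{k}.txt")
    dump_svmlight_file(X[cv_idx], y[cv_idx], f"{prefix}_cv{k}.txt")

# Full training file from the example above (purpose assumed: fitting on all the data).
dump_svmlight_file(X, y, f"{prefix}_train.txt")
```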

ajing commented 7 years ago

Could I ask two more detailed questions?

  1. If you have multiple important parameters for one model, do you do a grid search or search based on experience? How do you know when the hyper-parameters are good enough so that you can move on to another model?
  2. For a multi-layer StackNet, how do you tune the models on the second or even higher layers?