Tonywhitemin closed this issue 2 years ago.
An ensemble is a set of models (not stacked). A model is stacked if it was trained with predictions from the previous (not stacked) models. A stacked ensemble is a set of both stacked and not-stacked models (all available).
The `repeat` is the weight of the model in the ensemble.
Did you get good results with AutoML?
Hi @pplonski, thanks for your quick reply!
For stacked models, I still want to check how the stacking works. For example, the following picture shows that it uses five models for stacking: original_LightGBM, original_Xgboost, original_Neural Network, original_Random Forest, and original_Extra Trees.
Do they stack sequentially, as in the picture below?
Another question is about the weights: the graph below shows that the sum of the weights is 23+21+1+7+1+10+40=103. Is this normal (more than 100%)? Or am I misunderstanding the rules?
So far I have just used the sample code to practice with this AutoML tool, and the results look good to me. I will use it on a medical dataset once I understand the tool better. Thanks again for your kindness!
Hi @Tonywhitemin!
In the first picture you selected models that were trained with the Optuna framework.
Here is a definition of stacking https://en.wikipedia.org/wiki/Ensemble_learning#Stacking - in MLJAR AutoML the best models are selected and their predictions are added to the original data. Such data is used to train stacked models.
In the ensemble there are weights. After summing all weighted predictions in the ensemble, they are normalized by the total weight sum, so the weights can add up to any total value.
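As a rough sketch of that normalization (the model names and weights are illustrative, loosely echoing the numbers above; this is not MLJAR's actual code):

```python
import numpy as np

# Toy per-model predictions (e.g., probabilities for 4 samples).
predictions = {
    "model_a": np.array([0.2, 0.8, 0.5, 0.1]),
    "model_b": np.array([0.3, 0.7, 0.6, 0.2]),
    "model_c": np.array([0.1, 0.9, 0.4, 0.3]),
}
# Ensemble weights ("repeat" counts) -- they do not have to sum to 100.
weights = {"model_a": 23, "model_b": 21, "model_c": 7}

# Weighted sum of predictions, normalized by the total weight.
total_weight = sum(weights.values())
ensemble = sum(weights[name] * pred for name, pred in predictions.items()) / total_weight
print(ensemble)
```

Because of the division by `total_weight`, the final prediction is a proper weighted average even though the raw weights sum to 51 here, not 100.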
I'm happy to help. Good luck with your data.
Hi @pplonski, thanks for your help! As you said, "in MLJAR AutoML the best models are selected and their predictions are added to the original data. Such data is used to train stacked models." So the five models listed in the picture below were "the best models" you mentioned, is that correct? If so, what algorithms will the final "stacked models" use?
Can you post the full `framework.json` file? There might be more models stacked, for example with golden features.
There should be information in the `framework.json` file about which algorithms are used for stacking.
Hi @pplonski, framework.txt.txt Please refer to the attachment, taken from the path "Optuna_extratrees_stacked/framework.json". If there is some information in this file that I missed, please tell me. Thanks for your time!
There should be a `params.json` or `framework.json` file in the main directory. There should also be a separate file with info about the golden features. Could you send them? Thanks!
Hi @pplonski, the following attachment is the params.json file, shown in the picture below for your reference. (BTW, in Optuna mode there is no golden_feature.json file.) params.txt.txt
Part of the file's information is shown below, but I still can't understand the structure of the final stacked model... Could you help with that? Thanks!
Here is what the data for stacked models looks like (based on your params):
The original input data and the predictions from the previous models are concatenated to form a new input vector.
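A minimal sketch of that concatenation (the shapes and values are made up for illustration; MLJAR's internals may differ):

```python
import numpy as np

# Original input data: 4 samples, 3 features.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.5, 2.5, 3.5]])

# Out-of-fold predictions from two previously trained models.
pred_lightgbm = np.array([[0.1], [0.9], [0.8], [0.2]])
pred_xgboost  = np.array([[0.2], [0.8], [0.7], [0.3]])

# Stacked input: original features concatenated with model predictions.
X_stacked = np.hstack([X, pred_lightgbm, pred_xgboost])
print(X_stacked.shape)  # (4, 5)
```

The stacked models are then trained on `X_stacked` instead of `X`, so they can learn from both the raw features and the earlier models' outputs.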
Thanks for your picture, @pplonski! I would like to check: is the stacked model explainable? If yes, which algorithm will the final stacked model select?
The stacked model can be explainable, but this is not implemented in MLJAR AutoML.
I don't understand the second question.
Sorry for the confusion... The second question is, for example: the following picture is from a reference journal article: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205872 They also used a Level-1 stacked model, and they chose a support vector machine as the Level-1 model algorithm. Can we know the Level-1 stacked model's algorithm in MLJAR AutoML?
There can be several different algorithms at Level-1 (from the image above). For example:

- `1_Optuna_LightGBM_Stacked` means that the LightGBM algorithm was trained with stacked data,
- `5_Optuna_ExtraTrees_Stacked` means that the ExtraTrees algorithm was trained with stacked data,

so you have several models at Level-1. Then you have the next level, Level-2 (not in the image) - it is `Ensemble_Stacked`, which ensembles all available models from Level-0 and Level-1.
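As a toy sketch of that level structure (the "models" here are stand-in functions with made-up formulas, not MLJAR's real training loop; only the data flow between levels follows the description):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Level-0: models trained on the original data (stand-in predictors).
level0 = {
    "Optuna_LightGBM":   lambda X: X.mean(axis=1) / 10.0,
    "Optuna_ExtraTrees": lambda X: X.sum(axis=1) / 20.0,
}
level0_preds = {name: model(X) for name, model in level0.items()}

# Level-1: "stacked" models see the original features plus Level-0 predictions.
X_stacked = np.hstack([X] + [p.reshape(-1, 1) for p in level0_preds.values()])
level1 = {
    "1_Optuna_LightGBM_Stacked":   lambda Xs: Xs.mean(axis=1) / 5.0,
    "5_Optuna_ExtraTrees_Stacked": lambda Xs: Xs[:, -1],
}
level1_preds = {name: model(X_stacked) for name, model in level1.items()}

# Level-2: Ensemble_Stacked combines all available Level-0 and Level-1 models.
all_preds = list(level0_preds.values()) + list(level1_preds.values())
ensemble_stacked = np.mean(all_preds, axis=0)
print(ensemble_stacked.shape)  # (3,)
```

The key point is that `Ensemble_Stacked` at Level-2 draws on every available model, stacked and not stacked, rather than only the Level-1 ones.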
I really appreciate your help, @pplonski ! Now I understand! Thanks for your time!
@Tonywhitemin would you like to help improve MLJAR AutoML docs? The docs are here https://github.com/mljar/docs and are written in Markdown. There might be a separate page about stacking, what do you think? :)
If you don't mind my English expression, I would like to give it a try! Could you tell me how can I help? :)
Let's create a page in the docs: "How does ensemble stacking work?". You can describe there how the models at all levels are trained. You can start by creating a fork of the docs and working on your local copy. When you are ready, open a PR (pull request) and I will review your work (and maybe add something). If all is good, I will deploy the new version of the docs to the server.
Got it! I will try, thanks!
Hi @pplonski, I uploaded the file to https://github.com/Tonywhitemin/docs Please help check whether there are any mistakes, thanks!
Thank you @Tonywhitemin! The description is good! I have two comments on the files: please name the page `ensemble-stacking.md`, and please put the images in `docs/images/` and reference them as `![image description](/docs/images/clip_image002.gif)`.
Are you the author of the images?
Thanks @pplonski! I modified the image paths and re-uploaded the markdown file as you mentioned. Please check whether it has been fixed, thanks! By the way, the images in the image folder are from the results I ran, and the process-flow image was made by myself.
Thank you @Tonywhitemin!
I've made small fixes in your docs (you can check them here: https://github.com/mljar/docs/commit/2ca0391fde8af5ce39ad21ac19a85c8eb9f7ec15 and https://github.com/mljar/docs/commit/e19d68ae5157f88ca9af5435fe51669a18a0e1f6).
Your docs are already on the server: https://supervised.mljar.com/features/stacking-ensemble/
I'm glad I can put some effort into this nice tool, thank you so much @pplonski!
Hi @pplonski, I read the Modes section at this link: https://supervised.mljar.com/features/modes/ The "total models tuned for each algorithm" is shown in the image below.
The numbers here seem reasonable, but the "Custom modes" section gives the number of unstacked models as 10+3x3x2=28... May I ask why it needs to be multiplied by 2? Thank you!
For hill climbing, it tries to train 2 new models from the previous models in each hill-climbing step.
Hill climbing algorithm: select the top `n` algorithms according to the metric value, and train 2 new models from each of them in every hill-climbing step.
Got it! But based on that calculation method, should the numbers shown below be modified?
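As a numeric sanity check of the 10+3x3x2 formula (the variable names below are my own labels, assuming 10 initial models, 3 hill-climbing steps, the top 3 models improved per step, and 2 new models trained per selected model):

```python
initial_models = 10   # models from the initial search
top_n = 3             # top models selected in each hill-climbing step
steps = 3             # number of hill-climbing steps
new_per_parent = 2    # new models trained from each selected model per step

# Hill climbing adds steps * top_n * new_per_parent models: 3 * 3 * 2 = 18.
hill_climbing_models = steps * top_n * new_per_parent
total_unstacked = initial_models + hill_climbing_models
print(total_unstacked)  # 28
```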
You are right! Would you fix this in the docs?
Hi @pplonski, I edited the doc and opened a pull request as shown below, could you check whether it is OK? Thanks!
it's ok @Tonywhitemin - thank you!
Good day! I am reading your manual now, but I can't tell the differences in model structure between ensemble / stacked / ensemble_stacked...
The following pictures are JSON files from the example code, and my questions are listed below. Could you please help answer them?
ensemble.json
Optuna_extratrees_stacked/framework.json
Ensemble_stacked/ensemble.json
Best regards