RFC: Added final confusion matrix and plot with seaborn

kfern commented 4 years ago

Hi!

I am using only 50% of the data for the training phase. I have also added a confusion matrix at the end, computed on the whole dataset and shown both as values and as a plot.
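A minimal sketch of showing a confusion matrix both as values and as a seaborn heatmap. It assumes seaborn, matplotlib and NumPy are installed. The function name `plot_confusion_matrix`, the class labels and the example matrix are illustrative and are not taken from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_confusion_matrix(cm, labels):
    """Print the matrix values and draw them as an annotated heatmap.

    Rows are true classes and columns are predicted classes
    (the convention used by scikit-learn's confusion_matrix).
    """
    print(cm)  # the "values" view
    fig, ax = plt.subplots(figsize=(5, 4))
    sns.heatmap(
        cm,
        annot=True,  # write the count inside each cell
        fmt="d",  # integer formatting for counts
        cmap="Blues",
        xticklabels=labels,
        yticklabels=labels,
        ax=ax,
    )
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    fig.tight_layout()
    return fig, ax


# Hypothetical 3-class matrix, used only to demonstrate the plot.
example_cm = np.array([[15, 0, 0], [0, 14, 1], [0, 2, 13]])
fig, ax = plot_confusion_matrix(example_cm, ["setosa", "versicolor", "virginica"])
```

Calling `fig.savefig("confusion_matrix.png")` writes the plot to disk. The same function works unchanged whether the matrix comes from the whole dataset or from the test samples only.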

pplonski commented 4 years ago

Thank you @kfern, nice work! A few comments:

the iris dataset is very small, so I'd recommend increasing the training split to 75%
please compute confusion matrix on test samples only
maybe add some exploratory analysis before building AutoML models?

If you are interested, it would be nice to add more challenging examples, maybe with data from openML.org or kaggle.com. This one could be good: https://www.kaggle.com/c/GiveMeSomeCredit

kfern commented 4 years ago

@pplonski Thank you for your comments.

the iris dataset is very small, so I'd recommend increasing the training split to 75%:

With 75%, I get an exception warning (shown below), so I have changed it to 70%.

3_Linear final logloss 0.1322738103767312 time 11.58 seconds Exception while producing SHAP explanations. Additivity check failed in TreeExplainer! Please ensure the data matrix you passed to the explainer is the same shape that the model was trained on. If your data shape is correct then please report this on GitHub. Consider retrying with the feature_perturbation='interventional' option. This check failed because for one of the samples the sum of the SHAP values was 0.059797, while the model output was -2414693454883847634655657146075107653118219264834073905539784541407674295729150557871573451714877692228141513779075819215729150300238764889197953397579737715437124210346299012179540720973927733770283959469960821427142656.000000. If this difference is acceptable you can set check_additivity=False to disable this check. Continuing ...

please compute confusion matrix on test samples only:

Done
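A minimal sketch of computing the confusion matrix on the held-out test samples only, using the 70/30 split mentioned above. It assumes scikit-learn. `LogisticRegression` is a hypothetical stand-in for the AutoML model so the example runs quickly, and `random_state=42` is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Keep 70% for training; stratify so each class keeps its share in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=42
)

# Stand-in model; in the notebook this would be the fitted AutoML object.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate only on samples the model never saw during training.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Stratifying matters on a dataset this small: with 150 samples and three balanced classes, the 45 test samples split into 15 per class, so each row of the matrix is based on the same number of examples.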

maybe add some exploratory analysis before building AutoML models?

I thought I would do it as a separate step, once I have the automated results. With that information, it is easy to know where to continue.
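For reference, a minimal sketch of the kind of exploratory analysis suggested above. It assumes pandas and scikit-learn's bundled copy of the iris data; the specific summaries chosen here are illustrative.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects (requires pandas).
iris = load_iris(as_frame=True)
df = iris.frame.copy()  # four feature columns plus a "target" column
df["species"] = iris.target_names[iris.target.to_numpy()]

# Basic shape and missing values.
print(df.shape)
print(df.isna().sum())

# Summary statistics per feature.
print(df.describe())

# Class balance: iris is small, so check that every class is represented.
class_counts = df["species"].value_counts()
print(class_counts)

# Per-class feature means hint at which features separate the classes.
class_means = df.groupby("species")[iris.feature_names].mean()
print(class_means)
```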

If you are interested, it would be nice to add more challenging examples, maybe with data from openML.org or kaggle.com. This one could be good: https://www.kaggle.com/c/GiveMeSomeCredit

It looks interesting. I will take a look at it.

pplonski commented 4 years ago

@kfern we posted at almost the same time :)

In the example, the AutoML_1 directory has results from the Jupyter notebook. Please update the AutoML_1 directory. To do this:

Remove the `AutoML_1` directory.
Run the notebook.
Commit the changes.

kfern commented 4 years ago

@pplonski :-)

This PR only changes the Jupyter notebook.

pplonski commented 4 years ago

Please update the AutoML_1 directory if possible. If not, I will update it next week. Thanks for the help!

kfern commented 4 years ago

Sorry, my bad. I have now also uploaded AutoML_1.

pplonski commented 4 years ago

Thank you @kfern :tada:

mljar / mljar-examples

RFC: Added final confusion matrix and plot with seaborn #2