tvdboom / ATOM

Automated Tool for Optimized Modelling
https://tvdboom.github.io/ATOM/
MIT License

Multilabel classification methods give the same results #47

Open DariuszMajerek opened 8 months ago

DariuszMajerek commented 8 months ago

Description

I tried to compare three approaches to multilabel classification with a random forest: MultiOutputClassifier, ClassifierChain, and the native multilabel RandomForestClassifier. I wanted to see which one performs best. To my surprise, all three gave identical results. When I run the same calculations directly in sklearn, the results differ, so something seems to be wrong. Could you help me?

test.pdf
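
For reference, the plain scikit-learn comparison described above can be sketched roughly as follows. This is not the code from the attached PDF: the synthetic dataset, the 80/20 split, the 50 trees, and the macro-averaged average precision metric are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic multilabel data (assumed; stands in for the real dataset).
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)


def make_rf():
    """Return a fresh random forest so every strategy starts from the same setup."""
    return RandomForestClassifier(n_estimators=50, random_state=1)


strategies = {
    "native": make_rf(),  # RandomForestClassifier accepts a 2D y directly
    "multioutput": MultiOutputClassifier(make_rf()),  # one independent forest per label
    "chain": ClassifierChain(make_rf(), random_state=1),  # later labels see earlier predictions
}

scores = {}
for name, model in strategies.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    # The native forest and MultiOutputClassifier return one (n_samples, 2) array
    # per label; ClassifierChain returns a single (n_samples, n_labels) array.
    # Taking column 1 assumes every label has both classes in the training data.
    if isinstance(proba, list):
        proba = np.column_stack([p[:, 1] for p in proba])
    scores[name] = average_precision_score(y_test, proba, average="macro")

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Each entry fits a structurally different model, so a run like this is one way to check whether the three strategies really produce different scores outside of atom.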

tvdboom commented 8 months ago

What version of atom are you using? I believe that functionality was deprecated in 5.1.0. I see that the documentation was not updated accordingly, sorry for that. In the latest version, the multioutput meta-estimator is assigned by default, and setting atom.multioutput = ... has no effect. The identical results therefore make sense: you are using the same estimator each time (you can check this by printing atom.rf.estimator). You can either downgrade to the previous version or pass the three estimators directly to the run method, which also keeps all three models in the same atom instance.

atom.run(["RF", MultiOutputClassifier(RandomForestClassifier()), ClassifierChain(RandomForestClassifier())])
dax44 commented 8 months ago

My version is 5.2.0. Unfortunately, your example doesn't work for me. When I run your command, I get:

Training ========================= >>
Models: RF, MOC, CC
Metric: average_precision

Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> average_precision: 1.0
Test evaluation --> average_precision: 0.6468
Time elapsed: 0.155s
-------------------------------------------------
Total time: 0.155s

Results for MultiOutputClassifier:
Fit ---------------------------------------------

Exception encountered while running the MOC model.
TypeError: MultiOutputClassifier.__init__() got an unexpected keyword argument 'estimator__bootstrap'

Results for ClassifierChain:
Fit ---------------------------------------------

Exception encountered while running the CC model.
TypeError: _BaseChain.__init__() got an unexpected keyword argument 'base_estimator__bootstrap'

Final results ==================== >>
Total time: 0.160s
-------------------------------------
RandomForest --> average_precision: 0.6468 ~
Consecutive runs of model RF. The former model has been overwritten.
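
The two TypeErrors in this log look like a nested parameter name (the double-underscore form) being passed to a wrapper's constructor. That this is what atom does internally is an assumption; the sketch below only shows the scikit-learn behaviour that would produce such a message. `set_params` understands names like `estimator__bootstrap`, while `__init__` does not.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

wrapper = MultiOutputClassifier(RandomForestClassifier())

# set_params routes "estimator__bootstrap" to the wrapped forest's "bootstrap".
wrapper.set_params(estimator__bootstrap=False)
routed_ok = wrapper.estimator.bootstrap is False

# The constructor only knows its own arguments, so the same name is rejected.
try:
    MultiOutputClassifier(RandomForestClassifier(), estimator__bootstrap=False)
    constructor_error = None
except TypeError as exc:
    constructor_error = str(exc)

print(routed_ok)
print(constructor_error)
```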
tvdboom commented 8 months ago

I made a mistake. You have to specify in the custom model that the class doesn't need a multilabel wrapper.

from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from atom import ATOMClassifier, ATOMModel

# Synthetic multilabel dataset with three labels
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)

atom = ATOMClassifier(X, y=y, verbose=2, random_state=1)

# native_multilabel=True tells atom that these estimators already handle
# multilabel targets, so no extra multilabel wrapper is needed
chain = ATOMModel(ClassifierChain(RandomForestClassifier()), native_multilabel=True)
multi = ATOMModel(MultiOutputClassifier(RandomForestClassifier()), native_multilabel=True)

# Train the predefined random forest and both custom models in the same atom instance
atom.run(["rf", chain, multi])
dax44 commented 8 months ago

Thanks for the quick reply. Unfortunately, this still doesn't work. ATOMModel has no native_multilabel parameter. I get the following error:

TypeError: ATOMModel() got an unexpected keyword argument 'native_multilabel'
tvdboom commented 8 months ago

You are right. That functionality is in the development branch and has not been released yet. The dev branch also contains a fix for the error you showed before (TypeError: _BaseChain.__init__() got an unexpected keyword argument 'base_estimator__bootstrap').

You can install atom from that branch using pip install git+https://github.com/tvdboom/ATOM.git@development. Then it should work.
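
After installing, it can help to confirm which build Python actually picks up. A minimal sketch using only the standard library follows; the distribution name "atom-ml" is an assumption, and a development build may still report the same version number as the latest release.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "atom-ml" is assumed to be the distribution name; adjust it if yours differs.
print(installed_version("atom-ml"))
```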

dax44 commented 8 months ago

Yes, it works :) Thank you for your help.