yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License
8.42k stars 1.36k forks

difference between XGBOD and XGBClassifier #209

Open abhijeetmote opened 4 years ago

abhijeetmote commented 4 years ago

Hi @yzhao062 ,

I have looked at the code of XGBOD. It uses XGBClassifier internally (it is a wrapper written around it) with the same default parameters. So what is the difference if I use XGBOD instead of XGBClassifier?

I have also noticed that when I use XGBOD, the model size is larger compared to XGBClassifier. What is happening internally in XGBOD?

Your input is appreciated. Thanks, Abhijeet

yzhao062 commented 4 years ago

The paper is here: https://arxiv.org/abs/1912.00290

In short, XGBOD is an extension of xgboost that uses unsupervised models to enrich the feature space in order to achieve better results. The paper also compares it against xgboost's own performance.
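A minimal sketch of the feature-enrichment idea, not PyOD's actual implementation: here the unsupervised score is simply the distance to the k-th nearest neighbor, and the helper names `knn_outlier_scores` and `enrich_features` are hypothetical. The enriched matrix is what a supervised classifier such as XGBClassifier would then be trained on.

```python
import numpy as np


def knn_outlier_scores(X_ref, X, k):
    """Distance from each row of X to its k-th nearest neighbor in X_ref."""
    # Pairwise Euclidean distances, shape (len(X), len(X_ref)).
    dists = np.linalg.norm(X[:, None, :] - X_ref[None, :, :], axis=2)
    # Sort each row and take the k-th smallest distance (index k - 1).
    return np.sort(dists, axis=1)[:, k - 1]


def enrich_features(X_ref, X, ks=(1, 3, 5)):
    """Append one unsupervised outlier score per k as an extra column."""
    scores = [knn_outlier_scores(X_ref, X, k) for k in ks]
    return np.column_stack([X] + scores)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))           # reference (training) data
X_new = np.vstack([rng.normal(size=(5, 4)),   # 5 ordinary points
                   np.full((1, 4), 8.0)])     # 1 obvious outlier

X_new_enriched = enrich_features(X_train, X_new)
print(X_new.shape, "->", X_new_enriched.shape)  # (6, 4) -> (6, 7)
```

Because new data can only be scored if the fitted unsupervised models are kept around, a wrapper built this way has to store them alongside the classifier, which is a plausible reason for a larger saved model.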

abhijeetmote commented 4 years ago

Thanks @yzhao062 for your input. When I compare the XGBClassifier and XGBOD objects, they show the following parameter values. So is it by tuning the hyperparameters that it is considered an unsupervised model? Correct me if I am wrong.

| Hyperparameter | XGBClassifier | XGBOD |
| --- | --- | --- |
| base_score | None | 0.5 |
| booster | None | gbtree |
| colsample_bylevel | None | 1 |
| colsample_bynode | None | None |
| colsample_bytree | None | None |
| gamma | None | 0 |
| gpu_id | None | None |
| importance_type | 'gain' | None |
| interaction_constraints | None | None |
| learning_rate | None | 0.1 |
| max_delta_step | None | 0 |
| max_depth | None | 3 |
| min_child_weight | None | 1 |
| missing | nan | None |
| monotone_constraints | None | None |
| n_estimators | 100 | 100 |
| n_jobs | None | 1 |
| num_parallel_tree | None | None |
| objective | 'binary:logistic' | 'binary:logistic' |
| random_state | None | 0 |
| reg_alpha | None | 0 |
| reg_lambda | None | 1 |
| scale_pos_weight | None | 1 |
| subsample | None | 1 |
| tree_method | None | None |
| validate_parameters | None | None |
| verbosity | None | None |

Is anything else changed internally, or if I just use XGBClassifier with these tuned parameters, should it work the same as XGBOD?

Thanks, Abhijeet
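For reference, a comparison like the one in the table can be produced programmatically. This is a small sketch using plain dictionaries holding a few rows copied from the table; the helper `diff_params` is hypothetical and not part of xgboost or PyOD.

```python
def diff_params(a, b):
    """Return {name: (a_value, b_value)} for every parameter that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# A few rows copied from the table above (not the full parameter set).
xgbclassifier_params = {
    "base_score": None,
    "learning_rate": None,
    "max_depth": None,
    "n_estimators": 100,
    "objective": "binary:logistic",
}
xgbod_params = {
    "base_score": 0.5,
    "learning_rate": 0.1,
    "max_depth": 3,
    "n_estimators": 100,
    "objective": "binary:logistic",
}

changed = diff_params(xgbclassifier_params, xgbod_params)
for name, (clf_value, xgbod_value) in changed.items():
    print(f"{name}: XGBClassifier={clf_value!r}, XGBOD={xgbod_value!r}")
```

Assuming both objects expose a scikit-learn-style `get_params()`, the two dictionaries could be replaced with the output of that call on each fitted or unfitted model.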

abhijeetmote commented 4 years ago

@yzhao062, actually the problem is that when I use XGBOD, the accuracy is very good, but if I try to train it using Spark multiprocessing, it fails with a memory heap error.

On the other hand, XGBClassifier works with Spark multiprocessing, but its accuracy is sometimes not that good.

What do you suggest to overcome this issue? Thanks, Abhijeet