yzhao062 / SUOD

(MLSys'21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)
https://www.andrew.cmu.edu/user/yuezhao2/papers/20-preprint-suod.pdf
BSD 2-Clause "Simplified" License
373 stars, 49 forks

Incompatibility with Sklearn (and PyOD) #14

Open jonnyhof opened 5 months ago

jonnyhof commented 5 months ago

SUOD does not work with PyOD: an error raised inside sklearn/base.py prevents SUOD().fit() from completing.

I have created several conda environments to try to resolve the sklearn compatibility issue, with no luck. An environment with this issue can easily be created from scratch with only PyOD and SUOD as dependencies. Here conda chooses the versions of Sklearn and the other dependencies, but I have also manually specified the versions listed in the SUOD and PyOD docs, to no avail. The .yml file I used for this example is below (my only hard requirement for this project is Python 3.11):

name: pyod_suod_env
channels:
  - conda-forge
dependencies:
  - python>=3.11
  - pyod
  - pip
  - pip:
    - suod
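Since conda picks the scikit-learn version in this environment, a report like this is easier to act on if it records what was actually resolved. A minimal sketch using only the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper, not part of SUOD or PyOD:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in dists:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as published on PyPI/conda-forge (scikit-learn, not sklearn)
for name, ver in installed_versions(["suod", "pyod", "scikit-learn", "numpy"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting this output next to the .yml file shows exactly which scikit-learn release conda resolved.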

To reproduce the error, simply call SUOD's fit method. Code to reproduce:

# Import packages
from pyod.models.suod import SUOD
# from suod.models.base import SUOD
from pyod.utils.data import generate_data

# Generate data
contamination = 0.1 
n_train = 200 
n_test = 100 

X_train, X_test, y_train, y_test = generate_data(
    n_train=n_train, n_test=n_test, contamination=contamination)

# Fit SUOD
od = SUOD(
    n_jobs=2,
    combination='average',
    verbose=True,
)
od.fit(X_train)

I have tried the code above with both pyod.models.suod.SUOD and suod.models.base.SUOD, with the same result.

The full resulting traceback is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 od.fit(X_train)
      2 train_pred = od.labels_
      3 train_scores = od.decision_scores_

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/site-packages/pyod/models/suod.py:210, in SUOD.fit(self, X, y)
    207 self._set_n_classes(y)
    209 # fit the model and then approximate it
--> 210 self.model_.fit(X)
    211 self.model_.approximate(X)
    213 # get the decision scores from each base estimators

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/site-packages/suod/models/base.py:308, in SUOD.fit(self, X)
    304 if self.bps_flag:
    305     # load the pre-trained cost predictor to forecast the train cost
    306     cost_predictor = load_predictor_train(self.cost_forecast_loc_fit)
--> 308     print(cost_predictor)
    309     time_cost_pred = cost_forecast_meta(cost_predictor, X,
    310                                         self.base_estimator_names)
    312     # use BPS

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/site-packages/sklearn/base.py:315, in BaseEstimator.__repr__(self, N_CHAR_MAX)
    307 # use ellipsis for sequences with a lot of elements
    308 pp = _EstimatorPrettyPrinter(
    309     compact=True,
    310     indent=1,
    311     indent_at_name=True,
    312     n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW,
    313 )
--> 315 repr_ = pp.pformat(self)
    317 # Use bruteforce ellipsis when there are a lot of non-blank characters
    318 n_nonblank = len("".join(repr_.split()))

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/pprint.py:158, in PrettyPrinter.pformat(self, object)
    156 def pformat(self, object):
    157     sio = _StringIO()
--> 158     self._format(object, sio, 0, 0, {}, 0)
    159     return sio.getvalue()

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/pprint.py:175, in PrettyPrinter._format(self, object, stream, indent, allowance, context, level)
    173     self._readable = False
    174     return
--> 175 rep = self._repr(object, context, level)
    176 max_width = self._width - indent - allowance
    177 if len(rep) > max_width:

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/pprint.py:455, in PrettyPrinter._repr(self, object, context, level)
    454 def _repr(self, object, context, level):
--> 455     repr, readable, recursive = self.format(object, context.copy(),
    456                                             self._depth, level)
    457     if not readable:
    458         self._readable = False

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/site-packages/sklearn/utils/_pprint.py:189, in _EstimatorPrettyPrinter.format(self, object, context, maxlevels, level)
    188 def format(self, object, context, maxlevels, level):
--> 189     return _safe_repr(
    190         object, context, maxlevels, level, changed_only=self._changed_only
    191     )

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/site-packages/sklearn/utils/_pprint.py:440, in _safe_repr(object, context, maxlevels, level, changed_only)
    438 recursive = False
    439 if changed_only:
--> 440     params = _changed_params(object)
    441 else:
    442     params = object.get_params(deep=False)

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/site-packages/sklearn/utils/_pprint.py:93, in _changed_params(estimator)
     89 def _changed_params(estimator):
     90     """Return dict (param_name: value) of parameters that were given to
     91     estimator with non-default values."""
---> 93     params = estimator.get_params(deep=False)
     94     init_func = getattr(estimator.__init__, "deprecated_original", estimator.__init__)
     95     init_params = inspect.signature(init_func).parameters

File ~/miniforge3/envs/pyod_suod_env/lib/python3.11/site-packages/sklearn/base.py:244, in BaseEstimator.get_params(self, deep)
    242 out = dict()
    243 for key in self._get_param_names():
--> 244     value = getattr(self, key)
    245     if deep and hasattr(value, "get_params") and not isinstance(value, type):
    246         deep_items = value.get_params().items()

AttributeError: 'RandomForestRegressor' object has no attribute 'monotonic_cst'
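The traceback points to a likely cause, inferred here rather than confirmed in the thread: `print(cost_predictor)` at suod/models/base.py:308 calls `repr()`, which calls `get_params()`, and `get_params()` reads one attribute per constructor argument of the *installed* `RandomForestRegressor`. If the cost predictor loaded at line 306 was fitted and saved under an older scikit-learn, the restored object never had a `monotonic_cst` attribute (assumed here to be a constructor argument added in a newer scikit-learn release), so `getattr` fails. The class below is a hypothetical, pure-Python re-creation of that mechanism, not scikit-learn's code:

```python
import inspect


class Forest:
    """Hypothetical toy estimator; NOT scikit-learn's RandomForestRegressor."""

    # "New" release: the constructor has gained a monotonic_cst argument
    def __init__(self, n_estimators=100, monotonic_cst=None):
        self.n_estimators = n_estimators
        self.monotonic_cst = monotonic_cst

    def get_params(self):
        # Same idea as sklearn's BaseEstimator.get_params: one getattr per
        # argument named in the *current* __init__ signature
        names = [p for p in inspect.signature(type(self).__init__).parameters
                 if p != "self"]
        return {name: getattr(self, name) for name in names}


def restore(state):
    """Rebuild an instance the way default unpickling does: __new__ plus a
    __dict__ update, without ever calling __init__."""
    obj = Forest.__new__(Forest)
    obj.__dict__.update(state)
    return obj


fresh = Forest(n_estimators=10)          # created under the new release
stale = restore({"n_estimators": 10})    # state saved by an older release

print(fresh.get_params())  # {'n_estimators': 10, 'monotonic_cst': None}
try:
    stale.get_params()
except AttributeError as exc:
    print("AttributeError:", exc)  # 'Forest' object has no attribute 'monotonic_cst'
```

Under this reading, pinning an older scikit-learn works because the installed constructor then matches the saved object again.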
a-kole commented 5 months ago

Hi, I had the same problem and solved it by rolling back to scikit-learn 1.3.1, which should be compatible (1.3.2 should also work, but I haven't tried it yet). That should work around your issue in the meantime.
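Before relying on this workaround, it helps to confirm which scikit-learn is actually active. A minimal sketch, assuming (consistent with this thread, though the maintainers do not state it) that 1.4 is the first release that breaks older SUOD; `major_minor` and `predates_break` are hypothetical helpers. In the conda file above, the pin would be a dependency line such as `- scikit-learn=1.3.1`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: 1.4 is the first scikit-learn release that breaks older SUOD,
# consistent with the report that 1.3.1 works.
FIRST_BREAKING = (1, 4)


def major_minor(ver):
    """Parse '1.3.2' -> (1, 3) and '1.4.0rc1' -> (1, 4)."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def predates_break(ver):
    """True if this scikit-learn version is older than FIRST_BREAKING."""
    return major_minor(ver) < FIRST_BREAKING


try:
    installed = version("scikit-learn")
except PackageNotFoundError:
    print("scikit-learn is not installed")
else:
    verdict = "pre-1.4" if predates_break(installed) else "1.4 or newer; consider pinning 1.3.x"
    print(f"scikit-learn {installed}: {verdict}")
```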

jonnyhof commented 5 months ago

Hi @a-kole , thanks for the suggestion!

yzhao062 commented 5 months ago

It's caused by a sklearn update; the latest version (0.1.2) should fix it.
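Code that must run in environments where neither fix is guaranteed could catch this specific failure and re-raise it with actionable advice. A hypothetical helper (not part of SUOD or PyOD) that matches on the `monotonic_cst` attribute name from the traceback; the advice text simply restates the two fixes reported in this thread:

```python
def fit_with_hint(detector, X):
    """Call detector.fit(X), re-raising the known monotonic_cst failure with advice.

    Hypothetical helper, not part of SUOD or PyOD.
    """
    try:
        return detector.fit(X)
    except AttributeError as exc:
        if "monotonic_cst" not in str(exc):
            raise  # unrelated AttributeError: propagate it unchanged
        raise RuntimeError(
            "SUOD's pre-trained cost predictor appears incompatible with the "
            "installed scikit-learn. Upgrade SUOD (>= 0.1.2) or pin "
            "scikit-learn 1.3.x."
        ) from exc
```

With the reproduction script, `fit_with_hint(od, X_train)` would replace the bare `od.fit(X_train)` call; the original `AttributeError` stays attached as `__cause__` for debugging.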