yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License

Pickle error when using deep learning based models with SUOD #336

Status: Open. pbosch opened this issue 3 years ago.

pbosch commented 3 years ago

Environment: WSL2 with Conda (Python 3.8.10)

Error:

joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 356, in _sendback_result
    result_queue.put(_ResultItem(work_id, result=result,
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/backend/queues.py", line 241, in put
    obj = dumps(obj, reducers=self._reducers)
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.RLock' object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "suod_test.py", line 51, in <module>
    clf.fit(X_train)
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/pyod/models/suod.py", line 210, in fit
    self.model_.fit(X)
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/suod/models/base.py", line 290, in fit
    all_results = Parallel(n_jobs=n_jobs, max_nbytes=None, verbose=True)(
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
TypeError: cannot pickle '_thread.RLock' object

How to reproduce: add AutoEncoder or DeepSVDD to the list of detectors in the base SUOD example (e.g. DeepSVDD(hidden_neurons=[2, 1])).

For completeness' sake: using AutoEncoder or DeepSVDD on its own works perfectly fine.
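The final TypeError is raised when joblib's loky backend pickles a worker's result (with cloudpickle, per the traceback) to send it back to the parent process. Below is a minimal stdlib sketch of that failure mode, assuming (this is not confirmed in the thread) that the fitted deep learning detectors hold a thread lock somewhere inside their framework objects. `DetectorWithLock`, `PicklableDetector`, and `try_pickle` are hypothetical names, plain `pickle` stands in for cloudpickle, and the `__getstate__`/`__setstate__` pattern only illustrates why the lock matters; it is not a fix for pyod or SUOD.

```python
import pickle
import threading


class DetectorWithLock:
    """Hypothetical stand-in for a fitted detector whose internals hold a lock
    (assumption: similar to what happens inside the deep learning models)."""

    def __init__(self):
        self.scores = [0.1, 0.9]
        self._lock = threading.RLock()  # RLock objects cannot be pickled


class PicklableDetector(DetectorWithLock):
    """Same object, but it drops the lock when pickled and recreates it on load."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # exclude the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.RLock()  # rebuild a fresh lock after unpickling


def try_pickle(obj):
    """Return 'ok' if obj pickles, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_pickle(DetectorWithLock()))
print(try_pickle(PicklableDetector()))
```

The first call reports the same "cannot pickle '_thread.RLock' object" error seen in the traceback, while the second succeeds because the lock never reaches the pickler.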

yzhao062 commented 3 years ago

Thanks for the note. I forgot to mention in the documentation that deep learning models will not benefit from SUOD acceleration. The main bottleneck in deep learning training is access to GPUs. Sorry for the confusion.

pbosch commented 3 years ago

In that case, what is the best approach to get the acceleration for the non-deep-learning models while still being able to combine them with deep learning models? SUOD has that mechanism built in, which is handy. It would also be possible to train each model separately and then combine the results. But is there a way to have both?

yzhao062 commented 3 years ago

This is a great point! I think the combination can be approached from two perspectives. First, you may consider using deep learning models as feature extractors and then applying the classical OD models to the extracted latent representations.
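A minimal numpy sketch of the first option, the "encode, then detect" pipeline. To keep it self-contained, a linear encoder fitted by SVD stands in for the encoder half of a trained AutoEncoder (an assumption: it learns the same subspace an optimal linear autoencoder would, not what a nonlinear AutoEncoder learns), and a hand-written kNN distance score stands in for a classical detector such as pyod's `KNN`. `fit_linear_encoder` and `knn_outlier_scores` are hypothetical helpers, and the data is synthetic.

```python
import numpy as np


def fit_linear_encoder(X, n_components):
    """Stand-in for a trained encoder: project onto the top principal
    components (the subspace an optimal *linear* autoencoder would learn)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]

    def encode(Z):
        return (Z - mean) @ components.T

    return encode


def knn_outlier_scores(latent, k=5):
    """Classical kNN detector: score = distance to the k-th nearest neighbour."""
    diff = latent[:, None, :] - latent[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)  # column 0 is each point's zero distance to itself
    return dist[:, k]


# Synthetic 10-D data with hidden 2-D structure, plus one planted outlier.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 10))
X = rng.normal(size=(100, 2)) @ W + 0.01 * rng.normal(size=(100, 10))
X = np.vstack([X, np.array([[15.0, 15.0]]) @ W])  # outlier at index 100

encode = fit_linear_encoder(X, n_components=2)
scores = knn_outlier_scores(encode(X), k=5)
print(int(scores.argmax()))
```

With a real AutoEncoder, `encode` would be replaced by the bottleneck-layer output of the trained network, and the latent codes could then be passed to any classical detector, including ones accelerated by SUOD.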

Second, you could construct a matrix holding each model's outlier scores and then combine them as shown in https://github.com/yzhao062/pyod/blob/master/examples/comb_example.py.
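A minimal numpy sketch of the second option, assuming each detector's scores are already available as one column (for fitted pyod detectors, e.g. from `decision_scores_` on the training data or `decision_function(X)` on new data). Because detectors score on very different scales, each column is z-scored before combining; pyod's own `pyod.utils.utility.standardizer` and `pyod.models.combination` (`average`, `maximization`, and others) cover the same steps. `standardize_columns`, `combine_scores`, and the score values are hypothetical.

```python
import numpy as np


def standardize_columns(scores):
    """Z-score each detector's column so scores on different scales are comparable."""
    scores = np.asarray(scores, dtype=float)
    std = scores.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for a constant detector
    return (scores - scores.mean(axis=0)) / std


def combine_scores(score_matrix, method="average"):
    """Combine an (n_samples, n_detectors) score matrix into one score per sample."""
    z = standardize_columns(score_matrix)
    if method == "average":
        return z.mean(axis=1)
    if method == "maximization":
        return z.max(axis=1)
    raise ValueError(f"unknown method: {method}")


# Hypothetical scores from three detectors (columns) on five samples (rows),
# e.g. two classical models run through SUOD and one AutoEncoder trained on its own.
score_matrix = np.column_stack([
    [0.10, 0.20, 0.15, 0.12, 0.90],
    [1.0, 1.2, 1.1, 0.9, 5.0],
    [10.0, 11.0, 9.0, 10.0, 30.0],
])

print(combine_scores(score_matrix, "average"))
print(combine_scores(score_matrix, "maximization"))
```

This keeps SUOD's parallel speed-up for the classical models while the deep models are trained separately; only their score columns meet at the combination step.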

Actually, a few years ago I created another package called combo for model combination (see https://github.com/yzhao062/combo/blob/master/examples/detector_comb_example.py), although I am not sure whether deep learning models are compatible with it.

pbosch commented 3 years ago

Sorry for the late answer, I didn't get any notification for some reason.

The second option you outlined is what I had in mind. But I think the first option might work better for my current problem. The data is fairly noisy and the prediction probabilities are all over the place.

Taking an AutoEncoder as an example, what would be the easiest way to extract the latent representations? Would I need to go through the Keras object, or is there another way?