yzhao062 / pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
http://pyod.readthedocs.io
BSD 2-Clause "Simplified" License

error in LSCP(KNN) model, while running the model with each KNN works #132

Open flycloudking opened 5 years ago

flycloudking commented 5 years ago

I ran the following code on a data frame country_df with 25 samples and 18 columns:

knn_list = [KNN(n_neighbors=2), KNN(n_neighbors=4)]
print(knn_list)
clf = LSCP(knn_list)
clf.fit(country_df)

and got this error:

ValueError                                Traceback (most recent call last)

      9 print(country_df.shape)
     10
---> 11 clf.fit(country_df)
     12 #mm_score = (clf.decision_scores_ - clf.decision_scores_.min()) / (clf.decision_scores_.max() - clf.decision_scores_.min())
     13

/mnt/share/TAD_DS_share/flycloud/lib/python3.6/site-packages/pyod/models/lscp.py in fit(self, X, y)
    169
    170         # set decision scores and threshold
--> 171         self.decision_scores_ = self._get_decision_scores(X)
    172         self._process_decision_scores()
    173

/mnt/share/TAD_DS_share/flycloud/lib/python3.6/site-packages/pyod/models/lscp.py in _get_decision_scores(self, X)
    233         # standardize test data and get local region for each test instance
    234         X_test_norm = X
--> 235         test_local_regions = self._get_local_region(X_test_norm)
    236
    237         # calculate test scores

/mnt/share/TAD_DS_share/flycloud/lib/python3.6/site-packages/pyod/models/lscp.py in _get_local_region(self, X_test_norm)
    315             # Find neighbors of each test instance
    316             _, ind_arr = tree.query(X_test_norm[:, features],
--> 317                                     k=self.local_region_size)
    318
    319             # add neighbors to local region list

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.BinaryTree.query()

ValueError: k must be less than or equal to the number of training points

However, fitting a KNN detector on its own, as below, works without issues for different numbers of neighbors:

clf = KNN(algorithm='auto', contamination=0.1, leaf_size=30, method='largest', metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)
clf.fit(country_df)

The input data is attached here as a tab-delimited file. Thanks.

[dummy.txt](https://github.com/yzhao062/pyod/files/3576606/dummy.txt)

ljluestc commented 2 months ago

from pyod.models.knn import KNN  # PyOD's KNN detector; sklearn's KNeighborsClassifier is not a PyOD detector
from pyod.models.lscp import LSCP
import pandas as pd

# Load your data
# Example: country_df = pd.read_csv('dummy.txt', delimiter='\t')

# Base detectors; their n_neighbors values (2 and 4) are already well below the 25 samples
knn_list = [KNN(n_neighbors=2), KNN(n_neighbors=4)]

# The traceback fails at tree.query(..., k=self.local_region_size).
# LSCP's local_region_size defaults to 30 in the PyOD docs, which is more than
# the 25 training samples, so cap it at the number of rows.
clf = LSCP(knn_list, local_region_size=min(30, len(country_df)))

# Fit LSCP model
clf.fit(country_df)
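
To illustrate why the query in the traceback fails, here is a minimal sketch of the same constraint using a toy brute-force neighbor search. The `k_nearest` helper and the 25 synthetic points are hypothetical stand-ins for illustration only; this is not scikit-learn's KD-tree or PyOD's code.

```python
import math


def k_nearest(train, query, k):
    """Toy brute-force k-NN query; returns indices of the k closest training points."""
    if k > len(train):
        # Same constraint reported by the KD-tree query in the issue's traceback
        raise ValueError("k must be less than or equal to the number of training points")
    # Rank training indices by Euclidean distance to the query point
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    return order[:k]


# 25 hypothetical training points on a line, standing in for the 25-row data frame
train = [(float(i), 0.0) for i in range(25)]

# A small k is fine, just like KNN(n_neighbors=2) or KNN(n_neighbors=4) on their own
print(k_nearest(train, (0.0, 0.0), 3))

try:
    # Asking for 30 neighbors from 25 points cannot be satisfied
    k_nearest(train, (0.0, 0.0), 30)
except ValueError as exc:
    print("ValueError:", exc)
```

In the issue's setup the value of k in that query comes from LSCP's local_region_size rather than from each KNN's n_neighbors, which would explain why the individual KNN models fit while LSCP does not.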