iamDecode / sklearn-pmml-model

A library to parse and convert PMML models into Scikit-learn estimators.
BSD 2-Clause "Simplified" License

Introduce k-nearest neighbors estimators #38

Closed iamDecode closed 2 years ago

iamDecode commented 2 years ago

This PR introduces support for k-nearest neighbors classification and regression, along with a number of distance metrics. Metrics supported by only one side, either scikit-learn (e.g., mahalanobis) or PMML (e.g., squaredEuclidean), are left out for now. Because scikit-learn's implementation accepts callable metric functions, support for the PMML-only metrics could be added in the future.
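As a rough illustration of the callable-metric route, the sketch below passes a hand-written squared Euclidean function to scikit-learn's `KNeighborsClassifier`. The function name `squared_euclidean` and the toy data are assumptions made for this example and are not part of this PR. The brute-force algorithm is selected explicitly, since squared Euclidean distance does not satisfy the triangle inequality that tree-based neighbor searches rely on.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def squared_euclidean(a, b):
    """Hypothetical helper: squared Euclidean distance between two 1-D vectors."""
    diff = a - b
    return float(np.dot(diff, diff))


# Toy data, made up for illustration: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0],
              [0.9, 1.1], [0.2, 0.1], [1.1, 0.9]])
y = np.array([0, 0, 1, 1, 0, 1])

# algorithm="brute" avoids tree structures that assume a true metric.
knn_sq = KNeighborsClassifier(
    n_neighbors=3, metric=squared_euclidean, algorithm="brute"
).fit(X, y)
knn_eu = KNeighborsClassifier(
    n_neighbors=3, metric="euclidean", algorithm="brute"
).fit(X, y)

queries = np.array([[0.05, 0.05], [0.95, 1.0]])
pred_sq = knn_sq.predict(queries)
pred_eu = knn_eu.predict(queries)
```

Squaring preserves the ordering of distances, so with uniform weights both models select the same neighbors and give the same predictions here. With distance-based weighting the results could differ, because the weights depend on the distance values themselves.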

Since distance metrics generally work on either numerical or categorical columns, but not on a mix of both, "categorical support" proved a bit challenging. Some work is being done to address this, but until then I chose to leave out categorical support. In addition, I think one-hot encoding is a poor workaround, since it gives categorical columns more weight in the distance computation.
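One way to see the weighting concern is with a small worked example. The rows below are made up for illustration: one numeric feature scaled to [0, 1] and one three-level categorical feature that has been one-hot encoded. This is a sketch of one reading of the problem, not a description of how the library handles categorical data.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Layout (assumed for this example): [numeric, is_red, is_green, is_blue]
row_a = [0.0, 1, 0, 0]  # numeric = 0.0, color = red
row_b = [1.0, 1, 0, 0]  # numeric = 1.0 (largest possible gap), same color
row_c = [0.0, 0, 0, 1]  # same numeric value, color = blue

d_ab = euclidean(row_a, row_b)  # differs only in the numeric column
d_ac = euclidean(row_a, row_c)  # differs only in the category

print(f"numeric-only difference:  {d_ab:.3f}")
print(f"category-only difference: {d_ac:.3f}")
```

A category mismatch flips two one-hot columns at once, so it contributes a distance of sqrt(2), which is larger than the maximum possible difference of 1 in the scaled numeric column.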