transferwise / hisel

Feature selection tool based on Hilbert-Schmidt Independence Criterion
Apache License 2.0

Delta kernel for categorical variables - gram matrix and api #12

Closed claudio-tw closed 1 year ago

claudio-tw commented 1 year ago

Context

The first implementation of HSIC-based feature selection assumed continuous variables and therefore used RBF kernels only. We want to extend feature selection to cases where the features, the target, or both are discrete. This requires implementing Gram matrices with delta kernels.

This PR introduces the enum KernelType = {RBF, DELTA} and generalises the computation of Gram matrices to support both kernel types. The feature selection API is also extended so that the user can specify the type of the variables on which selection is performed. This extension is based on the enum FeatureType = {CONT, DISCR}.
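
For illustration, here is a minimal sketch of how a Gram matrix could be computed for the two kernel types. Only the enum names KernelType and FeatureType and their members come from this PR description. The member values, the helper names `kernel_for` and `gram_matrix`, and the `lengthscale` parameter are hypothetical and are not taken from the hisel code. The delta kernel is assumed to be k(a, b) = 1 if a equals b and 0 otherwise.

```python
from enum import Enum

import numpy as np


class KernelType(Enum):
    # Member names follow the PR description; the integer values are assumed.
    RBF = 0
    DELTA = 1


class FeatureType(Enum):
    # Member names follow the PR description; the integer values are assumed.
    CONT = 0
    DISCR = 1


def kernel_for(feature_type):
    """Hypothetical helper: pick the kernel that matches a variable type."""
    if feature_type is FeatureType.DISCR:
        return KernelType.DELTA
    return KernelType.RBF


def gram_matrix(x, kernel_type, lengthscale=1.0):
    """Hypothetical helper: n x n Gram matrix of n samples under one kernel.

    x has shape (n,) or (n, d); each row is one sample.
    """
    x = np.asarray(x)
    if x.ndim == 1:
        x = x[:, None]
    if kernel_type is KernelType.DELTA:
        # Delta kernel: 1.0 where two samples agree in every coordinate.
        equal = np.all(x[:, None, :] == x[None, :, :], axis=-1)
        return equal.astype(float)
    # RBF kernel: exp(-||a - b||^2 / (2 * lengthscale^2)).
    diff = x[:, None, :].astype(float) - x[None, :, :].astype(float)
    sqdist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sqdist / (2.0 * lengthscale ** 2))


# Usage: a discrete target with three categories and four samples.
labels = np.array([0, 1, 0, 2])
delta_gram = gram_matrix(labels, kernel_for(FeatureType.DISCR))
```

In this sketch, samples 0 and 2 share a category, so `delta_gram[0, 2]` is 1.0 while `delta_gram[0, 1]` is 0.0. Both kernels return a symmetric matrix with ones on the diagonal, so downstream HSIC code could treat the two cases uniformly.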

Checklist