riken-aip / pyHSICLasso

Versatile Nonlinear Feature Selection Algorithm for High-dimensional Data
MIT License

Clarification on the difference between an input vs. output kernel #36

Closed by teyden 4 years ago

teyden commented 4 years ago

https://github.com/riken-aip/pyHSICLasso/blob/0617219eb3963e1b26221ddbe49ab1b9fb403f3e/pyHSICLasso/hsic_lasso.py#L23

Hi, I'm wondering if some clarification could be provided on this difference.

In addition, is it necessary that the y_kernel and x_kernel are the same? My intuition is that they should be, but from what I can see in the code, that is not enforced. What is the rationale for allowing y and X to be projected into different spaces?

myamada0321 commented 4 years ago

Hi,

In addition, is it necessary that the y_kernel and x_kernel are the same? My intuition is that they should be, but from what I can see in the code, that is not enforced. What is the rationale for allowing y and X to be projected into different spaces?

We can use different kernels for the input and the output. For classification, the output should generally use the delta kernel; otherwise, performance can be poor.

For instance, suppose the label y takes the values 1, 2, 3 (i.e., 3-class classification). The Gaussian kernel is computed as exp(-||y_i - y_j||^2 / (2s^2)), while the delta kernel is 1 if y_i and y_j are the same and 0 otherwise. With the Gaussian kernel, the similarity between class 1 and class 3 is lower than the similarity between class 1 and class 2. This is not a good property for classification, because the numeric label values are usually arbitrary. (If we know that class 1 and class 2 should be closer than class 1 and class 3, it may be a good idea to use the Gaussian kernel.)
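
To make the comparison concrete, here is a minimal NumPy sketch of the two kernels applied to the labels 1, 2, 3. The function names `gaussian_kernel` and `delta_kernel` and the bandwidth `s=1.0` are assumptions made for illustration; this follows the formulas as written in this comment and is not necessarily how pyHSICLasso computes or normalizes its kernels internally.

```python
import numpy as np


def gaussian_kernel(y, s=1.0):
    """Gaussian kernel exp(-||y_i - y_j||^2 / (2 s^2)) on scalar labels.

    Hypothetical helper; s is an assumed bandwidth.
    """
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    sq_dist = (y - y.T) ** 2  # pairwise squared distances
    return np.exp(-sq_dist / (2.0 * s ** 2))


def delta_kernel(y):
    """Delta kernel: 1 if y_i == y_j, 0 otherwise (unnormalized).

    Hypothetical helper matching the description in this thread.
    """
    y = np.asarray(y).reshape(-1, 1)
    return (y == y.T).astype(float)


labels = [1, 2, 3]  # three classes, one sample each
K_gauss = gaussian_kernel(labels, s=1.0)
K_delta = delta_kernel(labels)

# Gaussian kernel: class 1 looks more similar to class 2 than to class 3,
# purely because of how the classes happen to be numbered.
print(K_gauss[0, 1], K_gauss[0, 2])

# Delta kernel: all distinct classes are equally dissimilar.
print(K_delta[0, 1], K_delta[0, 2])
```

With the Gaussian kernel, relabeling the classes (say, swapping 2 and 3) changes the kernel matrix, whereas the delta kernel depends only on whether two labels match.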