zh2395 / KPC

Kernel partial correlation coefficient

Question about KFOCI and KFOCI (RKHS) #1

Closed MehdiZouitine closed 9 months ago

MehdiZouitine commented 1 year ago

Hello, thank you for this great paper and the remarkable code! I have several questions (as I'm more a computer scientist than a true mathematician).

What are the advantages of KFOCI (RKHS) compared to KFOCI? Is KFOCI (RKHS) more sample-efficient?

Have a nice day.

zh2395 commented 1 year ago

Thank you for your interest in our work.

There are two empirical KPC estimators described in our paper (graph-based and RKHS-based), which lead to the two variable selection methods in the package (KFOCI and RKHS).

From a theoretical perspective, KFOCI assumes the X space is a metric space, while the RKHS approach assumes the X space is endowed with a kernel. This is the main difference between the two approaches.

In R^d, where both a natural metric (the Euclidean distance) and common kernels (e.g., the Gaussian kernel) are available, I would recommend using KFOCI, for the following reasons:

  1. KFOCI is computationally more efficient. The Euclidean k-NN graph can be computed in O(kn log n) time, where n is the number of observations (a rough sketch of this step follows the list).
  2. KFOCI has an automatic stopping criterion when performing variable selection (due to the fact that the empirical KPC based on geometric graphs can be negative).
  3. The empirical KPC based on geometric graphs requires only very mild conditions for many of its nice properties (consistency, rate of convergence, etc.) to hold. The RKHS-based estimator, on the other hand, requires stronger conditions which may be difficult to verify in practice.
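A minimal sketch of the k-NN graph construction referred to in point 1, written in Python with SciPy purely for illustration (the KPC package itself is in R); building a k-d tree is what gives the roughly O(kn log n) construction time.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_graph(X, k):
    """Return, for each row of X, the indices of its k nearest neighbours
    under the Euclidean distance (self-matches excluded)."""
    tree = cKDTree(X)                      # k-d tree built in O(n log n)
    # query k + 1 neighbours because each point is its own nearest neighbour
    _, idx = tree.query(X, k=k + 1)
    return idx[:, 1:]

X = np.random.default_rng(1).normal(size=(500, 3))
print(knn_graph(X, k=3)[:5])               # neighbour indices of the first 5 points
```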

Best, Zhen

MehdiZouitine commented 1 year ago

Thank you for your prompt feedback! It's clear now. I have another question, which may be a bit basic. As I read in your paper, KFOCI offers better feature selection than FOCI, which is based on the CODEC measure. I was wondering if you know which approach converges faster, especially for a given number of samples (n samples).

From my understanding, kernels are designed to change the dimension, and in higher dimensions, dependencies are easier to measure, right?

Thank you once again!

zh2395 commented 1 year ago

As mentioned in our paper, CODEC is a special case of KPC, obtained by using a specific kernel. The convergence rates should be similar. Suppose the (X, Z) space is of dimension d; the estimation errors are both of order $n^{-1/d}$, the order of the distance between a data point and its k-th nearest neighbor (assuming k does not diverge with n), and can therefore suffer from the curse of dimensionality.
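A small simulation (Python, purely illustrative and not from the paper or the package) makes the $n^{-1/d}$ behaviour concrete: the mean 1-nearest-neighbour distance among n uniform points in [0, 1]^d shrinks quickly with n when d is small but barely moves when d is large.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
for d in (1, 2, 5, 10):
    for n in (1_000, 10_000):
        X = rng.uniform(size=(n, d))
        dist, _ = cKDTree(X).query(X, k=2)   # nearest neighbour other than the point itself
        print(f"d={d:2d}  n={n:6d}  mean 1-NN distance = {dist[:, 1].mean():.4f}")
```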

Yes, you are correct. The benefits of our approach are mainly as follows:

  1. It allows user-specified kernels, such as the popular Gaussian kernels in machine learning. In our simulation, the Gaussian kernel indeed achieves better performance (a sketch of such a kernel follows this list).
  2. Our method can deal with multi-dimensional Y space.
  3. Our method allows k in the k-NN graph to increase beyond 1, which can have superior performance compared with k = 1. Such a phenomenon has also been witnessed in other k-NN based procedures, such as two-sample or multi-sample testing.
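To make point 1 concrete, here is a hedged sketch (Python, for illustration only; the median-heuristic bandwidth is my assumption, not necessarily the package default) of a Gaussian kernel matrix on a multi-dimensional Y, the kind of user-specified kernel the estimator accepts.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def gaussian_kernel_matrix(Y):
    """RBF kernel matrix with a median-heuristic bandwidth (an assumption)."""
    D = squareform(pdist(Y, metric="sqeuclidean"))
    sigma2 = np.median(D[D > 0])              # median of positive squared distances
    return np.exp(-D / (2.0 * sigma2))

Y = np.random.default_rng(2).normal(size=(200, 4))   # multi-dimensional Y
K = gaussian_kernel_matrix(Y)
print(K.shape, np.allclose(K, K.T))           # (200, 200), symmetric
```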

Best, Zhen

MehdiZouitine commented 1 year ago

Hello! Thank you again!

Last question: did you try to compare $\hat{\rho}^2(Y, X \mid \emptyset)$ with the classic HSIC dependence criterion estimator? :)

Have a nice day!

zh2395 commented 1 year ago

HSIC is 0 if and only if X and Y are independent, but it does not have the corresponding property on the other end: it is not necessarily 1 when Y is a function of X. Although the definition of HSIC also involves kernel mean embeddings (of the product distribution and the joint distribution), it is a different quantity from KPC.

One benefit of HSIC is that its empirical version converges to the population version at a faster rate.
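For reference, a minimal sketch of the textbook (biased) empirical HSIC with Gaussian kernels, trace(KHLH)/n^2; this is the quantity being contrasted with KPC above, written in Python for illustration and not taken from the KPC package.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def rbf(A):
    D = squareform(pdist(A, metric="sqeuclidean"))
    return np.exp(-D / (2.0 * np.median(D[D > 0])))   # median-heuristic bandwidth

def hsic(X, Y):
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    return np.trace(rbf(X) @ H @ rbf(Y) @ H) / n**2   # biased empirical HSIC

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 1))
print("dependent:  ", hsic(X, np.sin(X) + 0.1 * rng.normal(size=(300, 1))))
print("independent:", hsic(X, rng.normal(size=(300, 1))))
```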
