Hello, thank you for this great paper and the remarkable code! I have several questions (as I'm more of a computer scientist than a true mathematician).
What are the advantages of the RKHS-based method compared to KFOCI? Is the RKHS-based method more sample efficient?
Have a nice day.
Thank you for your interest in our work.
There are two empirical KPC estimators (graph-based and RKHS-based) described in our paper, which lead to two variable selection methods in the package (KFOCI and the RKHS-based one).
From a theoretical perspective, KFOCI assumes the X space is a metric space, while the RKHS approach assumes the X space is endowed with a kernel. This is the main difference between the two approaches.
In R^d, where both a natural metric (the Euclidean distance) and common kernels (e.g., the Gaussian kernel) are available, I would recommend using KFOCI, for several reasons.
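For readers of this thread, here is a minimal usage sketch of KFOCI (my own illustration, not from the paper; the toy data and parameter choices are assumptions, and argument defaults may differ across KPC versions):

```r
# Toy data (hypothetical): only columns 1 and 2 of X actually drive Y.
library(KPC)
library(kernlab)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5)
Y <- matrix(X[, 1] + sin(X[, 2]) + 0.1 * rnorm(n), ncol = 1)

# Forward stepwise selection with the graph-based KPC estimator,
# a Gaussian kernel on Y, and 10-nearest-neighbour graphs on X.
KFOCI(Y, X, k = rbfdot(1), Knn = 10)  # should return the indices 1 and 2
```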
Best, Zhen
Thank you for your prompt feedback! It's clear now. I have another question, which may be a bit basic.
As I read in your paper, KFOCI offers better feature selection than FOCI, which is based on the CODEC measure. I was wondering if you know which approach converges faster for a given number of samples n?
From my understanding, kernels are designed to change the dimension, and in higher dimensions, dependencies are easier to measure, right?
Moreover, when I run experiments with Azadkia and Chatterjee's conditional estimator, I see considerable variance (even with a thousand samples). Does your KPC method have less variance?
Is KPC suitable when the predictors are categorical?
Thank you once again!
As mentioned in our paper, CODEC is a special case of KPC, obtained by using a specific kernel, so the convergence rates should be similar. If the (X, Z) space is of dimension d, both estimation errors are of order n^{-1/d} (the typical distance between a data point and its k-th nearest neighbor, assuming k does not diverge with n), so both can suffer from the curse of dimensionality.
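To make the n^{-1/d} behaviour concrete, here is a small simulation of my own (not from the paper): the average nearest-neighbour distance of a uniform sample shrinks quickly in d = 1 but barely moves in d = 10.

```r
# My own sketch: mean 1-nearest-neighbour distance for n = 1000 uniform
# points in [0,1]^d, which scales roughly like n^{-1/d}.
set.seed(1)
nn_dist <- function(n, d) {
  X <- matrix(runif(n * d), n, d)
  D <- as.matrix(dist(X))   # pairwise Euclidean distances
  diag(D) <- Inf            # exclude self-distances
  mean(apply(D, 1, min))    # average distance to the nearest neighbour
}
for (d in c(1, 2, 5, 10)) {
  cat(sprintf("d = %2d: mean 1-NN distance = %.3f\n", d, nn_dist(1000, d)))
}
```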
Yes, you are correct; this is one of the main benefits of our approach.
Best, Zhen
Hello! Thank you again!
Last question: did you try comparing $\hat{\rho}^2(Y, X \mid \emptyset)$ with the classic HSIC dependence criterion estimator? :)
Have a nice day!
HSIC is 0 if and only if X and Y are independent, but it does not have the corresponding property at the other extreme: it does not equal 1 when Y is a function of X. Although the definition of HSIC also involves kernel mean embeddings (of the joint distribution and the product of the marginals), it is a different quantity from KPC.
One benefit of HSIC is that its empirical version converges to the population version at a faster rate (the parametric n^{-1/2} rate, free of the dimension).
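A quick numerical illustration of the contrast (my own sketch, not the authors'; I am assuming KMAc() in the package estimates the unconditional coefficient $\hat{\rho}^2(Y, X \mid \emptyset)$, and I compute a plain biased HSIC estimate by hand):

```r
library(KPC)
library(kernlab)
set.seed(42)
n <- 500
x <- matrix(rnorm(n), ncol = 1)
y <- matrix(sin(3 * x), ncol = 1)     # noiseless functional relationship

# KPC with an empty conditioning set, via the nearest-neighbour estimator:
KMAc(y, x, k = rbfdot(1))             # expected to be close to 1

# Biased HSIC estimate tr(KHLH)/n^2 with Gaussian kernels on X and Y:
H <- diag(n) - matrix(1 / n, n, n)    # centering matrix
K <- kernelMatrix(rbfdot(1), x)
L <- kernelMatrix(rbfdot(1), y)
sum(diag(K %*% H %*% L %*% H)) / n^2  # positive, but nowhere near 1
```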