stefftaelman / Xicor.jl

An implementation of the Xi Correlation coefficient and hypothesis test as originally described by Chatterjee (2021).
MIT License
11 stars 2 forks source link

Xicor.jl

An implementation of the Xi Correlation coefficient and hypothesis test as originally described by Chatterjee (2021). The implementation is based on the XICOR CRAN package by Chatterjee and Holmes.

Installation

Xicor.jl is available from the Julia registry, you can install it locally from your Julia REPL by typing:

] add Xicor

Running

Xicor.jl allows to eaily calculate the Xi correlation coefficient between two vectors $X$ and $Y$.

Calculating Xi correlation

xvec = rand(100)
yvec = rand(100)
xi_coeficient = ξ(xvec, yvec) # or xicor(xvec, yvec)

Dependence testing

Aside retrieving a correlation coefficient, Xicor.jl also allows to test for dependence between two vectors $X$ and $Y$. This is possible by using the test_dependence fucntion. The function will then return a tuple of the estimated p-value and standard deviation.

test_dependence(xi_coefficient, xvec, yvec)

Depending on the available number of observations either a permutation test (for <= 20 observations) or an asymptotic test will be performed. The asymptotic approximation test is based on the asymptotic normal distribution of the Xi correlation coefficient. The Monte Carlo permutation test is based on the permutation distribution of the Xi correlation coefficient. The Monte Carlo permutation test also uses a keyword argument nperm to specify the number of permutations to be used. The default value is 1000.

test_dependence(xi_coefficient, xvec, yvec; nperm=1000)

Application

Due to the asymmetry of the Xi correlation coefficient, it finds an interesting application in the inference of directionality of cause-effect relationships. To test its strengths in this endeavor, we apply the coefficient on the Tuebingen cause-effect pairs database described in Mooij et al. (2017). The data can be downloaded from https://webdav.tuebingen.mpg.de/cause-effect/pairs.zip, and consists of a number of txt files, each containing a list of cause-effect pairs and description files containing information about the data sources and the ground truth direction of the causal relationships.

We test it here on the first pair file, which contains data from the German Weather Service on the relationship between temperature and altitude. The data is loaded in and the Xi correlation coefficient is calculated for both directions of the causal relationship. The direction with the higher Xi correlation coefficient is then chosen as the predicted direction of the causal relationship.

    # load in the data
    pair_values = hcat(
        [parse.(
            Float64, 
            [q for q in split(i, r"\s+") if q != ""]
            ) for i in split(chomp(readchomp("./data/pair0001.txt")), "\n")
        ]...
        )
    altitudes = pair_values[1, :]
    temperatures = pair_values[2, :]

    # calculate the Xi correlation coefficient for both directions
    using Xicor
    xy_xi = ξ(altitudes, temperatures)
    yx_xi = ξ(temperatures, altitudes)
    predicted_direction = xy_xi > yx_xi ? "Altitude causes temperature" : "Temperature causes altitude"
    println("Predicted direction: $predicted_direction")

And it correctly predicts the direction of the causal relationship in this case.

For a deeper exploration of directional inference using Xi correlation, check out the causal direction prediction example.