taiyun / corrplot

A visual exploratory tool on correlation matrix
https://github.com/taiyun/corrplot

Support for combining rectangular (non-square) correlogram with significance test #173

Closed janstrauss1 closed 3 years ago

janstrauss1 commented 3 years ago

Hi there,

I would like to visualize correlations between a group of columns (variables/features) from a data.frame x and another group of columns (variables/features) from a data.frame y. Both data.frames relate to the same observations (samples).

Since the data frames x and y have different numbers of variables, I am plotting a rectangular correlogram. Not very surprisingly, I'm getting some obvious warnings (e.g. that my p.mat and corr may not be paired) when I try to combine the rectangular correlogram with the cor.mtest significance test results, which are square. This can also lead to some strange plotting results in corrplot (see below).

However, as cor.mtest only takes a single matrix and does not allow specifying the pairs of input features (i.e. columns/variables) to test, I'm currently not seeing an easy fix.

I guess the difficulty lies in the fact that, in contrast to cor(), the cor.test function does not work on two data frames or matrices, but maybe apply or lapply might work?

Would it be possible to add a feature to cor.mtest that lets one specify the pairs of input features to test?
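For illustration, the pairwise idea could be sketched roughly as follows. This is only a sketch, not part of corrplot: cor_pmat is a hypothetical helper name, and it simply calls cor.test from base R (package stats) on every x-column / y-column pair.

```r
# Hypothetical helper (not a corrplot function): p-values of cor.test()
# for every pair (column of x, column of y), returned as a rectangular matrix.
cor_pmat <- function(x, y, ...) {
  x <- as.data.frame(x)
  y <- as.data.frame(y)
  p <- matrix(NA_real_, nrow = ncol(x), ncol = ncol(y),
              dimnames = list(colnames(x), colnames(y)))
  for (i in seq_len(ncol(x))) {
    for (j in seq_len(ncol(y))) {
      # extra arguments (e.g. method, conf.level) are passed on to cor.test()
      p[i, j] <- cor.test(x[[i]], y[[j]], ...)$p.value
    }
  }
  p
}

x <- mtcars[1:4]
y <- mtcars[5:11]
p <- cor_pmat(x, y)
dim(p)  # same shape as cor(x, y): 4 rows, 7 columns
```

The resulting matrix has the same dimensions and dimnames as cor(x, y), so it could presumably be passed to corrplot as p.mat without any further subsetting.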

Please see the reprex below that illustrates my problem:

library(corrplot)
#> corrplot 0.88 loaded
data(mtcars)

x <- mtcars[1:4]
y <- mtcars[5:11]

M <- cor(x, y)
# corrplot(corr = M)

## Combining correlogram with the significance test
res <- cor.mtest(mat = M, conf.level = .95)

corrplot(M, p.mat = res$p, sig.level = .05)
#> Warning in rownames(p.mat) == rownames(corr): longer object length is not a
#> multiple of shorter object length
#> Warning in corrplot(M, p.mat = res$p, sig.level = 0.05): p.mat and corr may be
#> not paired, their rownames and colnames are not totally same!

# corrplot(M, p.mat = res$p, insig = "label_sig", pch.col = "white", pch = "p<.05", pch.cex = 1)

Created on 2021-05-28 by the reprex package (v2.0.0)

Many thanks in advance for any help!

taiyun commented 3 years ago

It is WRONG to run the correlation test like this:

res <- cor.mtest(mat = M, conf.level = .95)

The correct usage is:

## Combining correlogram with the significance test
res <- cor.mtest(mat = mtcars, conf.level = .95)

corrplot(M, p.mat = res$p[1:4, 5:11], sig.level = .05)

janstrauss1 commented 3 years ago

Many thanks for your feedback @taiyun! It helped me to solve my issue.

Sorry that my reprex was a bit sloppy, but you are absolutely correct that one shouldn't run cor.mtest on the correlation matrix provided by cor() but on the original data.

Still, the crucial step in solving my issue was to subset the matrix of p-values correctly with p.mat = res$p[1:4, 5:11] when calling corrplot().

I've included a reprex below that better reflects my situation and might help others when working with rectangular (non-square) correlograms:

library(corrplot)
#> corrplot 0.88 loaded

## make two example matrices (36 samples each; 7 and 13 features)
data1 <- matrix(runif(252), ncol = 7,
                dimnames = list(sprintf("S%d", seq_len(36)),
                                sprintf("data1.F%d", seq_len(7))
                                )
                )

data2 <- matrix(runif(468), ncol = 13,
                dimnames = list(sprintf("S%d", seq_len(36)),
                                sprintf("data2.F%d", seq_len(13))
                                )
                )

## compute correlation between matrices
corr.matrix <- cor(data1, data2, method = "pearson")

## Combining correlogram with the significance test
res <- cor.mtest(mat = cbind(data1, data2), conf.level = .95)

corrplot(corr.matrix, p.mat = res$p[1:7, 8:20], sig.level = .05)

corrplot(corr.matrix, p.mat = res$p[1:7, 8:20], insig = "label_sig", pch.col = "white", pch = "p<.05", pch.cex = 1)

Created on 2021-05-31 by the reprex package (v2.0.0)
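The positional subset of res$p used above can also be written by name, which avoids counting columns by hand. The sketch below assumes that res$p carries the column names of the input as its dimnames, as the warning about rownames(p.mat) in the first post suggests for corrplot 0.88. It uses mtcars rather than the random matrices so that it is reproducible.

```r
library(corrplot)

x <- mtcars[1:4]
y <- mtcars[5:11]

## rectangular correlation matrix: rows from x, columns from y
M <- cor(x, y)

## test on the original data, not on M
res <- cor.mtest(mat = cbind(x, y), conf.level = .95)

## pick the matching block by name instead of by position
## (assumes res$p has row and column names taken from the input columns)
p <- res$p[rownames(M), colnames(M)]

corrplot(M, p.mat = p, sig.level = .05)
```

Because p is indexed with the dimnames of M, its row and column order stays aligned with M even if the columns of x and y are reordered.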