talgalili / heatmaply

Interactive Heat Maps for R Using plotly

More general correlations without assuming linearity #13

Open hdvinod opened 8 years ago

hdvinod commented 8 years ago

There is a new R package called generalCorr with a simple function. For example, gmcmtx0(mtcars)

produces an 11 by 11 matrix of generalized correlation coefficients. Note that if r(Xi | Xj) exceeds r(Xj | Xi), then Xj is the likely cause of Xi.

It would be nice to be able to view generalized correlation coefficients, which are asymmetric and always larger in absolute value than Pearson correlation coefficients.

talgalili commented 8 years ago

So there is no problem doing it if you are willing to not use the dendrograms:

install.packages("generalCorr")
install.packages("RColorBrewer")

# get the correlation:
library("generalCorr")
library("heatmaply")  # provides heatmaply(), which is called below
x <- gmcmtx0(mtcars)
# prepare some nice colors:
BrBG <- colorRampPalette(RColorBrewer::brewer.pal(11, "BrBG"))
heatmaply(x, Rowv=FALSE, Colv= FALSE,
    colors = BrBG , limits = c(-1,1)) %>%   layout(margin = list(l = 40, b = 40))

[image: heatmap of gmcmtx0(mtcars) produced by the code above]

However, if you also want the dendrograms, the problem is that it may not be possible to use the same ordering for the rows and the columns, since the matrix is not symmetric (so the row and column dendrograms differ, and their different topologies may not allow the two to share the same order, as we would like).

What do you think?

hdvinod commented 8 years ago

Can we give priority to the causal side, represented by the numbers above the diagonal?

hdvinod commented 8 years ago

Dear Tal

I wanted to focus on the above-diagonal correlation coefficients, since they represent the cause variable in a binary setting.

I could not reproduce your plot on my Windows PC. I must have done something wrong with the colors:

heatmaply(x, Rowv=FALSE, Colv= FALSE,
    colors = BrBG , limits = c(-1,1)) %>% layout(margin = list(l = 40, b = 40))
Error in as.character(col) : cannot coerce type 'closure' to vector of type 'character'

It does work after removing the colors=BrBG,
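The error message is consistent with `colors` receiving a function: `colorRampPalette()` returns a palette function (a closure), not a vector of color strings. A minimal sketch of one possible workaround, assuming the installed heatmaply version expects a character vector, is to call the palette function first. The three hex colors are a stand-in for the full 11-color BrBG palette so that the snippet runs without RColorBrewer, and the name `pal` is hypothetical. The heatmaply call is guarded because it needs the heatmaply and generalCorr packages.

```r
# colorRampPalette() returns a function that generates n interpolated colors.
# Stand-in endpoints (assumption): brown, near-white, teal, roughly like BrBG.
BrBG <- colorRampPalette(c("#8C510A", "#F5F5F5", "#01665E"))

is.function(BrBG)      # TRUE: this is the 'closure' named in the error
pal <- BrBG(256)       # calling the function yields a vector of hex colors
is.character(pal)      # TRUE

# Pass the vector instead of the function. This part only runs in an
# interactive session with both packages installed.
if (interactive() &&
    requireNamespace("heatmaply", quietly = TRUE) &&
    requireNamespace("generalCorr", quietly = TRUE)) {
  x <- generalCorr::gmcmtx0(mtcars)
  heatmaply::heatmaply(x, Rowv = FALSE, Colv = FALSE,
                       colors = pal, limits = c(-1, 1))
}
```

With RColorBrewer installed, the same idea would apply to the original palette by passing BrBG(256) rather than BrBG.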

How can one control the ordering of variables in the plots? I would like to focus on the difference in absolute values, | r_ij | - | r_ji |. If this is positive, then the j-th column is the cause.

Please help. Thanks.

Hrishikesh (Rick) D. Vinod Professor of Economics, Fordham University E-Mail: Vinod@fordham.edu Tel 718-817-4065, Secretary 718-817-4048, Fax 718-817-3518 Web page: http://www.fordham.edu/economics/vinod ResearchGate says my papers have been cited 2162 times in various research publications.

alanocallaghan commented 5 years ago

@hdvinod you can control ordering with Rowv and Colv (using vectors of integers, i.e. indices). If you can provide some example code showing how to use these correlations, I'd be happy to consider incorporating it in the package; otherwise I'd consider closing this.

hdvinod commented 5 years ago

As you can see from the following example, generalized correlations are very easy to compute and report with one line of code. Yes, the rows can be reordered simply by reordering the columns of the input data matrix.

See the mtcars example in the attached file.

install.packages("generalCorr")
library(generalCorr)
options(np.messages=FALSE)
mymtcars=mtcars[,1:5]
rstar=gmcmtx0(mymtcars)
rstar
neword=c(1,4,2,3,5)
rstar2=gmcmtx0(mymtcars[,neword])
rstar2

OUTPUT

rstar
            mpg        cyl       disp         hp       drat
mpg   1.0000000 -0.8557900 -0.9508994 -0.9379374  0.6845546
cyl  -0.9433125  1.0000000  0.9759183  0.9583212 -0.7512495
disp -0.8941676  0.9151419  1.0000000  0.9306311 -0.7697372
hp   -0.8530474  0.8446589  0.8170031  1.0000000 -0.5542799
drat  0.6878267 -0.7015970 -0.9458881 -0.7434288  1.0000000

neword=c(1,4,2,3,5)
rstar2=gmcmtx0(mymtcars[,neword])
rstar2
            mpg         hp        cyl       disp       drat
mpg   1.0000000 -0.9379374 -0.8557900 -0.9508994  0.6845546
hp   -0.8530474  1.0000000  0.8446589  0.8170031 -0.5542799
cyl  -0.9433125  0.9583212  1.0000000  0.9759183 -0.7512495
disp -0.8941676  0.9306311  0.9151419  1.0000000 -0.7697372
drat  0.6878267 -0.7434288 -0.7015970 -0.9458881  1.0000000
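In the printed output, rstar2 contains the same entries as rstar with rows and columns permuted, so for this example the reordering could also be done by indexing the matrix instead of recomputing it. The sketch below hard-codes the rounded values from the output so that it runs without generalCorr; the name `rstar_reordered` is hypothetical.

```r
# Generalized correlation matrix copied from the printed output (rounded).
vars <- c("mpg", "cyl", "disp", "hp", "drat")
rstar <- matrix(c(
   1.0000000, -0.8557900, -0.9508994, -0.9379374,  0.6845546,
  -0.9433125,  1.0000000,  0.9759183,  0.9583212, -0.7512495,
  -0.8941676,  0.9151419,  1.0000000,  0.9306311, -0.7697372,
  -0.8530474,  0.8446589,  0.8170031,  1.0000000, -0.5542799,
   0.6878267, -0.7015970, -0.9458881, -0.7434288,  1.0000000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Apply the same permutation to both rows and columns.
neword <- c(1, 4, 2, 3, 5)
rstar_reordered <- rstar[neword, neword]
rstar_reordered
```

Passing the permuted matrix to heatmaply with Rowv=FALSE and Colv=FALSE, as in the first reply, should then display the variables in this order.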

alanocallaghan commented 5 years ago

Thanks, but what order criteria would you apply with this? I'm slightly unclear what you mean about |rji| - |rij|.

If I'm understanding correctly, this can only be used for ordering the rows/columns (via Rowv or Colv), and not for computing dendrograms (due to asymmetry).

hdvinod commented 5 years ago

Dear Talgalili/Heatmaply: You are correct that asymmetry will limit the application to dendrograms. Yes, |rji| - |rij| may not be useful here. It is one of three useful indicators of whether Xi causes Xj or vice versa. In the generalCorr package, a summary determination of the causal direction is made by the command causeSummary(mtx): it pairs the first column of the matrix mtx with each of the other columns, and a decision rule reports which is likely to be the cause. One of these days we can talk about causality at length.

In our context of dendrograms, we want to get away from the linearity assumption of correlation coefficients, which can underestimate the dependence. Example: x=1:20; y=sin(x). The simple correlation of x and y is near zero even though x and y are perfectly dependent. gmcmtx0(cbind(x,y)) will give a better estimate of dependence!
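The sin(x) example can be checked with a few lines of base R. This is only a sketch: the generalized correlation part is guarded so that it runs only if generalCorr is installed, and no output is claimed for it here. The name `pearson` is hypothetical.

```r
x <- 1:20
y <- sin(x)

# Pearson correlation measures linear association only. Here it is small
# in magnitude although y is an exact function of x.
pearson <- cor(x, y)
pearson

# Generalized correlations, computed only if generalCorr is available.
if (requireNamespace("generalCorr", quietly = TRUE)) {
  options(np.messages = FALSE)
  print(generalCorr::gmcmtx0(cbind(x, y)))
}
```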

If D=distance and C=correlation, dendrograms use D=1-C: a high positive correlation will have D=0, and a high negative correlation will have D=2. The same can be achieved by using the gmcmtx0 function.

Let sgn denote the sign of the Pearson correlation coefficient between Xi and Xj. Now define C*, or revised correlation, as C* = sgn * max(|rij|, |rji|); we want to keep the sign of rij. Now D=1-C* is my proposal for more meaningful dendrograms.
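A minimal sketch of this proposal in base R, under stated assumptions: the matrix values are the rounded ones printed earlier in the thread for the first five mtcars columns, the names `Cstar`, `D`, and `hc` are hypothetical, and complete-linkage hclust() is just one possible clustering choice. The guarded heatmaply call assumes that Rowv and Colv accept a dendrogram object.

```r
# Asymmetric generalized correlations, copied from the earlier output.
vars <- c("mpg", "cyl", "disp", "hp", "drat")
rstar <- matrix(c(
   1.0000000, -0.8557900, -0.9508994, -0.9379374,  0.6845546,
  -0.9433125,  1.0000000,  0.9759183,  0.9583212, -0.7512495,
  -0.8941676,  0.9151419,  1.0000000,  0.9306311, -0.7697372,
  -0.8530474,  0.8446589,  0.8170031,  1.0000000, -0.5542799,
   0.6878267, -0.7015970, -0.9458881, -0.7434288,  1.0000000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Sign of the ordinary Pearson correlation for the same columns.
sgn <- sign(cor(mtcars[, vars]))

# C* = sgn * max(|r_ij|, |r_ji|), which is symmetric by construction.
Cstar <- sgn * pmax(abs(rstar), abs(t(rstar)))

# D = 1 - C*: near 0 for strong positive, near 2 for strong negative dependence.
D <- 1 - Cstar

# One dendrogram, usable for both rows and columns.
hc <- hclust(as.dist(D))
hc$order

# Optional display of the original asymmetric matrix with a shared ordering.
if (interactive() && requireNamespace("heatmaply", quietly = TRUE)) {
  dend <- as.dendrogram(hc)
  heatmaply::heatmaply(rstar, Rowv = dend, Colv = dend, limits = c(-1, 1))
}
```

Because C* is symmetric, rows and columns get the same dendrogram, which sidesteps the ordering conflict raised earlier in the thread.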

I am using notation D and C from http://www.nonlinear.com/support/progenesis/comet/faq/v2.0/dendrogram.aspx

The vertical axis is labelled distance and refers to a distance measure between compounds or compound clusters.

The height of the node can be thought of as the distance value between the right and left sub-branch clusters.

The distance measure between two clusters is calculated as follows:

Please let me know if I can be of further assistance. We need a good example out there so folks can start using the new dendrograms. I hope this answers your e-mail.

Best regards, and congrats and kudos for your leadership in starting R-bloggers.
