Column cluster distance vs Row cluster distance

kevinblighe / E-MTAB-6141

Data from Lewis, Barnes, Blighe et al., Cell Rep. 2019 Aug 27; 28(9): 2455–2470.e5.


Hello Kevin,

Thank you for this tutorial that has been very useful (even 4 years later).

I have a question regarding the cluster distance metric you use, specifically regarding the difference between row and column distance.

You define the following:

clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
clustering_method_columns = 'ward.D2',
clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
clustering_method_rows = 'ward.D2',

I understand that, for rows (genes), you use 1 - Pearson correlation of the transposed matrix.

But I see that you use the same formula for the column (sample) clustering. In the case of columns, shouldn't it be 1 - Pearson correlation of the matrix itself? E.g.:

clustering_distance_columns = function(x) as.dist(1 - cor(x))

I'm new to the field of RNA-seq analysis, so forgive me if the question is naive, but I cannot visualise what it means to use the distance between rows as the metric for the column clustering.

Thank you for your insight, and have a good day.

clustering_distance_rows: It can be a pre-defined character which is in ("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "spearman", "kendall"). It can also be a function. If the function has one argument, the input argument should be a matrix and the returned value should be a dist object. If the function has two arguments, the input arguments are two vectors and the function calculates distance between these two vectors.

clustering_distance_columns: Same setting as clustering_distance_rows.
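
Whether cor(t(x)) is also right for columns depends on which matrix the column distance function actually receives. The documentation quoted above only says that a one-argument function takes a matrix and returns a dist object. The base-R sketch below assumes (this is not confirmed by the quoted text, so please check it against the ComplexHeatmap source or by printing dim(x) inside the function) that the column function is called on the transposed matrix, so that samples arrive as rows. Under that assumption, the same formula gives sample-to-sample distances and matches the as.dist(1 - cor(x)) proposed in the question. The toy matrix and the names expr and row_cor_dist are made up for illustration.

```r
# Toy expression matrix: 5 genes (rows) x 3 samples (columns); values are invented.
set.seed(1)
expr <- matrix(rnorm(15), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3)))

# One-argument distance function with the same body as in the tutorial call.
# cor() correlates the COLUMNS of its input, so cor(t(x)) correlates the ROWS of x.
row_cor_dist <- function(x) as.dist(1 - cor(t(x)))

# Genes as rows: distances among the 5 genes.
d_rows <- row_cor_dist(expr)

# ASSUMPTION: for column clustering the function is handed t(expr),
# i.e. samples as rows. The same formula then yields sample distances.
d_cols_t <- row_cor_dist(t(expr))

# The formula proposed in the question, applied to the untransposed matrix.
d_cols <- as.dist(1 - cor(expr))

attr(d_rows, "Size")    # 5 objects (genes)
attr(d_cols_t, "Size")  # 3 objects (samples)
all.equal(as.matrix(d_cols_t), as.matrix(d_cols))  # TRUE

# Either dist object can be passed to hclust() with the method from the call.
hc <- hclust(d_cols_t, method = "ward.D2")
```

If the assumption does not hold and the function receives the untransposed matrix for columns, then as.dist(1 - cor(x)) would be the correct form for clustering_distance_columns. A quick check is to compare attr(d, "Size") against the number of samples.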
