Open PierreLaplante opened 2 months ago
Interesting question. ComplexHeatmap says this about these arguments:
clustering_distance_rows
It can be a pre-defined character which is in ("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "spearman", "kendall"). It can also be a function. If the function has one argument, the input argument should be a matrix and the returned value should be a dist object. If the function has two arguments, the input arguments are two vectors and the function calculates distance between these two vectors.
clustering_distance_columns
Same setting as clustering_distance_rows.
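To make the two function forms in the quoted documentation concrete, here is a minimal base R sketch. The names `dist_from_matrix`, `dist_from_pair` and the toy matrix `m` are made up for illustration, and nothing here calls ComplexHeatmap itself; it only shows that a one-argument function returning a dist object and a two-argument function on a pair of vectors can describe the same distance.

```r
# One-argument form: takes a matrix and returns a dist object
# holding the distances between its rows.
dist_from_matrix <- function(m) as.dist(1 - cor(t(m)))

# Two-argument form: takes two numeric vectors and returns one distance.
dist_from_pair <- function(x, y) 1 - cor(x, y)

set.seed(42)
m <- matrix(rnorm(5 * 8), nrow = 5)  # toy data: 5 rows, 8 columns

# Build a dist object by hand from the pairwise function,
# applying it to every pair of rows.
n <- nrow(m)
pairwise <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    pairwise[i, j] <- dist_from_pair(m[i, ], m[j, ])
  }
}
d_pair <- as.dist(pairwise)
d_mat  <- dist_from_matrix(m)

# Both routes give the same row-to-row distances (up to rounding).
all.equal(as.vector(d_mat), as.vector(d_pair))
```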
If you run with clustering_distance_columns = function(x) as.dist(1 - cor(x)), does it not throw an error?
Hello Kevin,
Thank you for this tutorial that has been very useful (even 4 years later).
I have a question regarding the cluster distance metric you use, specifically regarding the difference between row and column distance.
You define the following:
clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
clustering_method_columns = 'ward.D2',
clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
clustering_method_rows = 'ward.D2',
I understand that, for rows (genes), you use 1 - Pearson correlation of the transposed matrix.
But I see that you use the same formula for the column (sample) clustering. In the case of columns, shouldn't it be 1 - Pearson correlation of the matrix itself? E.g.:
clustering_distance_columns = function(x) as.dist(1 - cor(x))
I'm new to the field of RNAseq analysis, so forgive me if the question is naive, but I cannot visualise what it means to use the distance between rows as the metric for the column clustering.
Thank you for your insight, and have a good day.
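For anyone following along, a small base R sketch of what the two formulas compute. The matrix `mat` and the helper `row_dist` are hypothetical, and the sketch does not call ComplexHeatmap. `cor()` correlates the columns of its input, so `cor(t(x))` gives row-to-row correlations and `cor(x)` gives column-to-column correlations. Whether the function supplied to `clustering_distance_columns` receives the matrix as-is or already transposed is not stated in the documentation quoted in this thread, so that part is worth checking against the package rather than assuming.

```r
set.seed(1)
# Toy expression matrix: 6 genes (rows) x 4 samples (columns).
mat <- matrix(
  rnorm(6 * 4), nrow = 6, ncol = 4,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4))
)

# cor() works on columns, so transpose first to compare rows (genes).
d_rows <- as.dist(1 - cor(t(mat)))
# Without the transpose, the comparison is between columns (samples).
d_cols <- as.dist(1 - cor(mat))

attr(d_rows, "Size")  # 6, one entry per gene
attr(d_cols, "Size")  # 4, one entry per sample

# A "row-wise" distance function applied to the transposed matrix
# yields the sample-to-sample distances as well.
row_dist <- function(x) as.dist(1 - cor(t(x)))
d_cols_via_t <- row_dist(t(mat))
all.equal(as.vector(d_cols), as.vector(d_cols_via_t))
```

So the same `1 - cor(t(x))` formula can describe sample distances if the function is handed a samples-by-genes matrix, while `1 - cor(x)` only does so if it is handed the original genes-by-samples matrix. Comparing the size of the returned dist object with the number of columns in the heatmap is a quick way to see which case applies.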