igrabski / sc-SHC

Significance analysis for clustering single-cell RNA-sequencing data

Visualising data + accessing p-values #4

Closed: ShakedLab closed this issue 1 year ago

ShakedLab commented 1 year ago

After running:

clusters <- scSHC(counts)

How do we actually visualise the tree (as shown in Fig 1 of the paper)? I was under the impression that scSHC returns a data.tree object but:

library(data.tree)
plot(clusters)

Returns the error:

Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' is a list, but does not have components 'x' and 'y'

And plot(clusters[[2]]) returns a very basic hierarchy but does not show the p-values or details.

And where are the p-values stored in the object? It's somewhat difficult to navigate the structure of the object.

igrabski commented 1 year ago

Currently, we do not implement a way to visualize the tree. The tree shown in Figure 1 is actually a cluster stability analysis tree, produced with modified code from the clustree package; it visualizes how clusters at different parameter values relate to one another, rather than the hierarchy of clusters produced by our approach. The best way to see the relationships across clusters and the p-values is to print clusters[[2]] (rather than plotting it). In the future I am interested in implementing a visualization, so stay tuned!
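
For readers who want to explore the object beyond printing it, here is a minimal sketch of navigating a data.tree hierarchy. It builds a toy tree as a stand-in for clusters[[2]], so it runs without scSHC. The node labels, their "<name>: <value>" pattern, and the contents of the first list element are assumptions made for illustration, not output from the package.

```r
library(data.tree)

# Toy stand-in for the scSHC result. The labels below are made up and the
# "<name>: <value>" pattern is an assumption about how the tree is labelled.
tree <- Node$new("Node 0: 0")
inner <- tree$AddChild("Node 1: 0.03")
tree$AddChild("Cluster 3: 1")
inner$AddChild("Cluster 1: 1")
inner$AddChild("Cluster 2: 1")

# Assumed layout: element 1 holds cluster labels, element 2 holds the tree.
clusters <- list(c(cell_a = 1, cell_b = 2, cell_c = 3), tree)

# Text rendering of the hierarchy (what print(clusters[[2]]) shows).
print(clusters[[2]])

# All node labels in pre-order, and only the leaves (the final clusters).
node_labels <- unname(clusters[[2]]$Get("name"))
leaf_labels <- unname(clusters[[2]]$Get("name", filterFun = isLeaf))

print(node_labels)
print(leaf_labels)
```

With a real result, the same `$Get()` calls should apply to clusters[[2]] directly, provided it is a data.tree `Node`.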

ShakedLab commented 1 year ago

Thanks for the reply.

print(clusters[[2]])

This seems to only provide the proportions and not the p-values:

[Image: Screenshot from 2023-07-13 12-28-06]

What am I missing?

igrabski commented 1 year ago

The numbers are actually the adjusted p-values (which can be interpreted as the highest alpha, i.e. family-wise error rate, at which we would have rejected the null hypothesis). There are a lot of small p-values here that got rounded to 0, but you can see at many of the nodes where we split into clusters, the p-value is near-0, and in this case at most/all of the clusters where we chose not to split further, the adjusted p-value is 1. It looks like there are at least two nodes here where we had an adjusted p-value that wasn't 0 or 1 (nodes 6 and 8).
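
To make the reading of those numbers concrete, the following sketch parses labels of the kind shown in the printed tree into numeric adjusted p-values and picks out the ones that are neither 0 nor 1. The labels and values are made up, and both the "<name>: <value>" format and the two-decimal rounding are assumptions inferred from the printed output rather than documented behaviour.

```r
# Made-up labels in the assumed "<name>: <adjusted p-value>" format.
labels <- c("Node 0: 0", "Node 1: 0.03", "Cluster 1: 1",
            "Cluster 2: 1", "Node 2: 0.27", "Cluster 3: 1", "Cluster 4: 1")

# Split each label into the part before the colon and the numeric part after it.
parsed <- data.frame(
  node  = sub(":.*$", "", labels),
  adj_p = as.numeric(trimws(sub("^[^:]*:", "", labels))),
  stringsAsFactors = FALSE
)
print(parsed)

# Nodes whose displayed adjusted p-value is strictly between 0 and 1.
intermediate <- parsed[parsed$adj_p > 0 & parsed$adj_p < 1, ]
print(intermediate)

# Why very small values show up as 0: rounding to two decimals (assumed here)
# maps anything below 0.005 to 0.
rounded_small <- round(c(2e-12, 0.004), 2)
print(rounded_small)
```

A displayed 0 therefore means "very small", not exactly zero, if the labels are indeed rounded this way.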