lazappi / clustree

Visualise Clusterings at Different Resolutions
https://lazappi.github.io/clustree/
GNU General Public License v3.0

Weird result when using clustree on a subset seurat object #63

Closed PernilleYR closed 2 years ago

PernilleYR commented 4 years ago

Hello!

I've been trying to use clustree on my Seurat object, and it works nicely. However, as some cell types are not really interesting for my analysis, I made a second Seurat object using Seurat's subset() function to select only the clusters of interest. When I try to use clustree on this object it runs, but the output is very weird: the cells belonging to cluster 10 at most resolutions appear at the top of the plot, all on top of each other. I tried to correct that by using as.numeric() on the clustering results, or by using factors and making sure that the levels are in numerical order, but none of this works.

In the end I just "cropped" the clustree of the full Seurat object, but I wanted to point out this issue in case you would like to have a look at it.

Thanks for this very nice tool!

Best, Pernille

lazappi commented 4 years ago

Hi @PernilleYR

Thanks for giving clustree a go and reporting the issue 🎉! Could you please upload an example of what the image looks like and the code you used to make it? That would help with working out what is happening.

PernilleYR commented 4 years ago
Clustree_SubsetCluster1-2-3-7ofSeuratObject Clustree_allSeuratObject

Of course, here are the results for the entire Seurat object and for the subset with clusters 1, 2, 3 and 7.

Thank you!

Best,

Pernille

lazappi commented 4 years ago

Thanks! That definitely does look weird. I think what is happening is that when you subset the object some of the clusters at different resolutions are completely removed. Can you please post the code for these examples so I can think more about what is going on?
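To illustrate what "completely removed" can mean for a clustering column, here is a minimal base R sketch. The cluster labels are made-up toy values, not data from this issue, and this is only one possible side effect of subsetting rather than a confirmed cause of the plot above. It shows that a factor keeps the levels of clusters that no longer contain any cells until droplevels() is called.

```r
# Toy stand-in for one clustering column in the metadata (hypothetical values).
clusters <- factor(c(0, 1, 2, 3, 7, 10, 1, 2, 3, 7))

# Keep only the clusters of interest, as a subset of cells would.
kept <- clusters[clusters %in% c(1, 2, 3, 7)]

# The removed clusters are still present as empty factor levels.
levels(kept)
table(kept)

# droplevels() discards the levels that no longer have any cells.
kept_clean <- droplevels(kept)
levels(kept_clean)
```

Comparing table() output before and after droplevels() for each resolution column is a quick way to see which clusters vanished in the subset.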

lazappi commented 4 years ago

Hi @PernilleYR

Were you able to solve this issue or are there things I can still help with?

ChristopherBarrington commented 3 years ago

I have come across a similar problem. For my Seurat objects, I use subset, rescale, re-PCA and re-cluster with the same resolutions but for one of my objects (and only one) the result is similar to the above, worse in fact! (The most recent effort is attached)

When I have show_axes=TRUE, which I would prefer, an error is produced:

Error: `breaks` and `labels` must have the same length

Setting show_axes=FALSE prevents that error but produces the disfigured graph.

I found that if I removed one specific resolution variable from the seurat@meta.data slot, no error was produced and the graph was laid out as expected.

I have run the same code on different objects and the clustree function runs without problems on the others, so I am very confused as to what it could be.
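For anyone who wants to find such a column systematically, a leave-one-out check along these lines might help. This is only a sketch: find_problem_columns() and the stub plotting function are hypothetical names invented for this example, not part of clustree or Seurat, and the check only detects calls that raise an error, not plots that are silently laid out badly.

```r
# Call `plot_fun` with each resolution column removed in turn and return the
# columns whose removal makes the call succeed.
# `plot_fun` is a placeholder for the failing call, for example
#   function(df) clustree::clustree(df, prefix = "RNA_snn_res.", show_axes = TRUE)
find_problem_columns <- function(meta, prefix, plot_fun) {
  res_cols <- names(meta)[startsWith(names(meta), prefix)]
  ok_without <- vapply(res_cols, function(col) {
    reduced <- meta[, setdiff(names(meta), col), drop = FALSE]
    tryCatch({
      plot_fun(reduced)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  names(ok_without)[ok_without]
}

# Toy metadata and a stub that fails whenever one column is present
# (both are assumptions made purely for this demonstration).
meta <- data.frame(
  RNA_snn_res.2.4 = c(0, 1, 1),
  RNA_snn_res.2.6 = c(0, 1, 2),
  RNA_snn_res.2.8 = c(0, 1, 2)
)
stub <- function(df) {
  if ("RNA_snn_res.2.6" %in% names(df)) {
    stop("`breaks` and `labels` must have the same length")
  }
  invisible(NULL)
}

suspects <- find_problem_columns(meta, "RNA_snn_res.", stub)
suspects
```

With a real plot, the error may only surface when the plot is built or printed, so the function passed as plot_fun might need to force that step, for instance by wrapping the result in ggplot2::ggplot_build().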

I have Seurat v4.0.3 installed and pulled the current version of clustree from this repo.

image

lazappi commented 3 years ago

Hi @ChristopherBarrington

Can you please post the code you were using for subsetting, clustering etc.? If you are able to share the object that has this problem (or a smaller version) that would also be helpful. Even better would be if you could reproduce this effect with a small public test dataset (the Seurat PBMC3K data would be perfect).

My guess is something is getting messed up in the indexing between resolutions, but I'm not really sure what. It would be interesting to see what metadata columns you have and specifically what was in the column you removed to fix the plot.
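A quick way to summarise the metadata columns in question could look like the sketch below. It uses base R only, and the data frame is a made-up example that just borrows the RNA_snn_res. naming from this thread. It lists the resolution columns, counts the clusters in each, and cross-tabulates two neighbouring resolutions.

```r
# Hypothetical metadata using the column naming from the thread.
meta <- data.frame(
  RNA_snn_res.0.4 = factor(c(0, 0, 1, 1, 2, 2)),
  RNA_snn_res.0.8 = factor(c(0, 1, 2, 2, 3, 3)),
  nCount_RNA      = c(100, 200, 150, 300, 250, 120)
)

# Select the clustering columns by their shared prefix.
prefix <- "RNA_snn_res."
res_cols <- names(meta)[startsWith(names(meta), prefix)]

# Number of distinct clusters observed at each resolution.
n_clusters <- sapply(meta[res_cols], function(x) length(unique(x)))
n_clusters

# How cells move between clusters at two neighbouring resolutions.
table(meta[[res_cols[1]]], meta[[res_cols[2]]])
```

A resolution whose cluster count or cross-tabulation looks very different from its neighbours would be a natural place to start looking.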

Thanks

ChristopherBarrington commented 3 years ago

Thanks for looking into this. The data is unpublished but I can share the meta.data object that produces the same result. I'm not sure why I get a different layout to my previous post this morning, but there is still a problem - more like the original user's plot.

I had to add the .zip to the rds file to attach it here but it is just a normal saveRDS-written file. Hopefully that will be enough that you get the same error as on my system.

meta.data %>% clustree(prefix='RNA_snn_res.') # gives weird result
meta.data %>% select(-RNA_snn_res.2.6) %>% clustree(prefix='RNA_snn_res.') # as expected

There are many resolutions in the table but this has not been a problem with the other objects I have processed with the same code. The function calls are generic Seurat. I am a bit uneasy about putting the unpublished data online but I get the same results with the following pipeline:

seurat %>%
  DietSeurat() %>%
  ScaleData(vars.to.regress=c('CC.Difference', 'male.score')) %>%
  RunPCA(npcs=30) %>% 
  FindNeighbors(dims=1:30) %>% 
  FindClusters(dims=1:30) %>%
  clustree()

I have used your program with many objects (and the people I work with find them really useful, thanks) but this is the only time I have come across this problem and cannot see what is causing this object to give this output.

clustree_odd_result.rds.zip

lazappi commented 3 years ago

Ok, I have had a quick look at this. I'm still not entirely sure what is going on but I think there's a reasonable workaround. Thanks for providing the dataset, that was super helpful. I should have been a bit clearer that I only needed the metadata part without any identifying information.

First let's make a small version of your dataset with 200 samples and five resolutions around the one that seems to be problematic:

meta <- readRDS("clustree_odd_result.rds")
mini <- meta[sample(nrow(meta), 200), c("RNA_snn_res.2.2", "RNA_snn_res.2.4", "RNA_snn_res.2.6", "RNA_snn_res.2.8", "RNA_snn_res.3")]

I can confirm that there is an issue, here is what the default plot looks like (interestingly if you run this multiple times sometimes you get different results):

clustree(mini, prefix = "RNA_snn_res.")

image

I tried changing the prop_filter and use_core parameters which didn't make a difference but using the "sugiyama" layout did:

clustree(mini, prefix = "RNA_snn_res.", layout = "sugiyama")

image

From this there doesn't seem to be anything obviously wrong with the graph structure (which is what I initially thought). Instead, it looks like the tree layout algorithm, which comes from the {igraph} package, is failing. This is a heavily used package so it seems a bit odd for this to happen. I'm not sure whether the problem is something in {clustree} or something about the clusterings themselves. I think this would take a bit of detective work to track down, so I suggest using layout = "sugiyama" when this is an issue. If you want to look into it further and find something, that would be awesome.

ChristopherBarrington commented 3 years ago

I was confused too about why it happened only with this dataset and only that resolution. I hadn't thought to use a different layout though, I just took that resolution out and carried on.

The error about axis breaks and labels is a ggplot error though, so I wondered if it was to do with how the labels were parsed into axis positions. I tried looking in the data slot of the plot but couldn't find anything.

Thank you for looking into the problem though. If I have any inspiration I'll be sure to let you know.

lazappi commented 2 years ago

I'm going to close this again because I think it is an upstream issue with {igraph} but feel free to comment if there is more to add.