lazappi / clustree

Visualise Clusterings at Different Resolutions
https://lazappi.github.io/clustree/
GNU General Public License v3.0

clustree_overlay on UMAP from a seurat object #58

Closed naddsch closed 4 years ago

naddsch commented 4 years ago

Dear developers, I just came across your package and wanted to create a clustree_overlay with my existing Seurat object. Unfortunately, I cannot figure out how to set the parameters correctly. I have tried:

clustree_overlay(seurat_object, red_dim = "umap", x_value = "UMAP_1", y_value = "UMAP_2") which results in Error: No data identified for x_value or y_value. Check that red_dim is set correctly.

I checked that there is a umap reduction and colnames(seurat_object@reductions$umap) tells me that the column names of my reduction are [1] "UMAP_1" "UMAP_2"

Can you tell me what I am doing wrong?

The overlay is not working for PCA either. My Seurat object has a pca entry in its reductions, but clustree_overlay does not work with it.

I am using Seurat version 3.1.5.

Thanks and best, Laura

lazappi commented 4 years ago

Hi @naddsch

Thanks for giving {clustree} a go! You need to set x_value and y_value to the index of the column you want to use, not the name. See this answer to another issue for details: https://github.com/lazappi/clustree/issues/37#issuecomment-562838694.

naddsch commented 4 years ago

Thanks for the quick reply! Unfortunately, both clustree_overlay(seurat_object, red_dim = "umap", x_value = "1", y_value = "2") and clustree_overlay(seurat_object, red_dim = "umap", x_value = 1, y_value = 2) only result in the message Error: No data identified for x_value or y_value. Check that red_dim is set correctly.

Trying clustree_overlay(seurat_object, red_dim = "umap", x_value = "umap1", y_value = "umap2") results in a different error message: Error: Less than two column names matched the prefix: Protein_snn_res.

So might the last version be the correct one? I would really like to get this working, but it's a bit like trial and error on what the parameters might be and how it might be working... Can you tell me how to fix the error I get when using "umap1" and "umap2" as x and y values?

Best, Laura

lazappi commented 4 years ago

Yeah, I know that this can be confusing, totally not your fault. I need to document this more clearly (probably should have done it before the release last week...).

The last command is correct and fixes the original issue. The new error is related to the prefix argument, which selects the columns that contain clustering information. For a Seurat object this is set to paste0(assay, "_snn_res.") by default. Unless you set the assay argument, the assay is whatever is returned by Seurat::DefaultAssay(seurat_object). From the output above it looks like your default assay is currently "Protein", but you probably used another assay (such as "RNA") for your clustering.

The solution is to either set the default assay of your object to match whatever you used for clustering or manually set prefix to match the names of your clustering columns.
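As a rough illustration of the two fixes just described, here is a minimal sketch. It is not taken from the poster's session: the assay name "RNA" and the example metadata column names are assumptions, and clustering_columns() is a hypothetical helper that only mimics matching column names against the default prefix. It is not how {clustree} is implemented. The calls on a real object are guarded so that the block still runs when no seurat_object is present.

```r
# Hypothetical helper (not part of clustree): list the column names that
# start with the default Seurat prefix, paste0(assay, "_snn_res.").
clustering_columns <- function(column_names, assay) {
  prefix <- paste0(assay, "_snn_res.")
  column_names[startsWith(column_names, prefix)]
}

# Assumed metadata column names for an object clustered on the "RNA" assay.
meta_cols <- c("nCount_RNA", "nCount_Protein",
               "RNA_snn_res.0.2", "RNA_snn_res.0.5", "RNA_snn_res.1")

rna_cols <- clustering_columns(meta_cols, "RNA")          # three clustering columns
protein_cols <- clustering_columns(meta_cols, "Protein")  # nothing matches

# The two fixes on a real object. Guarded so the block still runs when
# `seurat_object` or the packages are not available.
if (exists("seurat_object") &&
    requireNamespace("Seurat", quietly = TRUE) &&
    requireNamespace("clustree", quietly = TRUE)) {
  # Option 1: set prefix by hand to match the clustering columns.
  p1 <- clustree::clustree_overlay(seurat_object, prefix = "RNA_snn_res.",
                                   red_dim = "umap",
                                   x_value = "umap1", y_value = "umap2")
  print(p1)

  # Option 2: make the default assay match the one used for clustering,
  # so the default prefix becomes "RNA_snn_res.".
  Seurat::DefaultAssay(seurat_object) <- "RNA"
  p2 <- clustree::clustree_overlay(seurat_object, red_dim = "umap",
                                   x_value = "umap1", y_value = "umap2")
  print(p2)
}
```

Running colnames(seurat_object@meta.data) on the real object shows which prefix the clustering columns actually carry, which is the value to pass as prefix.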

lazappi commented 4 years ago

I'm going to close this now but please reopen/comment if you still have this problem.

Klammerfrosch commented 3 years ago

Hi @lazappi ,

I just found your post here looking for an answer for the same problem as above. I think the problem is that one needs to generate the information for different resolutions prior to running clustree, as you mentioned here.

I am quite new to coding in general and I didn't make the connection that this data needs to be generated first. Maybe you could add this information to the vignette of your package?

Thanks for your work, it's been a great help!

Tim

lazappi commented 3 years ago

Hi @Klammerfrosch

Thanks for the comment! You are correct that clusters need to be generated in advance: {clustree} just helps with visualisation and doesn't do any kind of clustering itself. I hope this is already clear in the vignette, where all the example datasets already contain clusters and we show where they are saved in the different objects we support. Maybe this can be a bit clearer though.
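For readers arriving with the same question, a minimal sketch of generating clusterings at several resolutions before plotting might look like the following. The resolution values, the "RNA" assay name and dims = 1:10 are assumptions chosen for illustration, and the Seurat part assumes that PCA has already been run on seurat_object. The calls are guarded so the block still runs without the packages or the object.

```r
# Resolutions to cluster at (assumed values, pick ones that suit the data).
resolutions <- c(0.2, 0.5, 1)

# Metadata columns Seurat is expected to create when the default assay is "RNA".
expected_cols <- paste0("RNA_snn_res.", resolutions)

if (exists("seurat_object") &&
    requireNamespace("Seurat", quietly = TRUE) &&
    requireNamespace("clustree", quietly = TRUE)) {
  # Build the neighbour graph, then cluster once per resolution.
  # Each run adds one column to the object's metadata.
  seurat_object <- Seurat::FindNeighbors(seurat_object, dims = 1:10)
  seurat_object <- Seurat::FindClusters(seurat_object, resolution = resolutions)

  # Only now is there something for clustree to visualise.
  print(clustree::clustree(seurat_object, prefix = "RNA_snn_res."))
}
```

At least two clustering columns must match the prefix, otherwise the "Less than two column names matched the prefix" error from earlier in this thread appears.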