Inconsistent number of clusters between chooseR pipeline and Seurat default parameters

rbpatt2019 / chooseR

An R framework for choosing clustering parameters in scRNA-seq analysis pipelines

GNU General Public License v3.0


Inconsistent number of clusters between chooseR pipeline and Seurat default parameters #8

Open L-Watcher opened 4 months ago

L-Watcher commented 4 months ago

Hi,

Thank you for developing this incredibly useful pipeline. It has been instrumental in my scRNA-seq analysis work. I have encountered an interesting issue while using the script provided: when using the same resolution, the number of clusters displayed in the silhouette file differs from the number obtained using the default parameters of Seurat::FindClusters().

Upon further investigation, I noticed that your custom find_clusters() passes a single name, graph.name = paste(reduction, assay, sep = "."), to both Seurat::FindNeighbors() and Seurat::FindClusters(). When FindNeighbors() receives only one graph name, it stores only the nearest-neighbor (NN) graph. FindClusters() therefore clusters on the NN graph, which is equivalent to the RNA_nn graph Seurat builds by default, rather than on the shared nearest-neighbor (SNN) graph.

Under Seurat's default parameters, however, Seurat::FindNeighbors() stores both graphs, Seurat.obj@graphs$RNA_nn and Seurat.obj@graphs$RNA_snn. Seurat::FindClusters() clusters on RNA_snn by default and saves the labels in the metadata column RNA_snn_res.{res}. To keep the analysis consistent, I modified the pipeline by removing the custom graph.name so that Seurat's defaults apply. I also changed clusters <- obj[[glue::glue("{reduction}.{assay}_res.{res}")]] to clusters <- obj[[glue::glue("RNA_snn_res.{res}")]]. With these changes, the clustering results from the pipeline match those from Seurat's standard workflow.
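A minimal sketch of the same fix in R, with one variation: instead of dropping graph.name, it passes two names, so FindNeighbors() stores both the NN and the SNN graph under chooseR-style names, and FindClusters() is pointed at the SNN one. The function name find_clusters_snn() and its arguments are hypothetical; chooseR's own find_clusters() may be structured differently.

```r
library(Seurat)

# Hypothetical chooseR-style helper (not chooseR's actual find_clusters()).
find_clusters_snn <- function(obj, reduction = "pca", npcs = 30,
                              assay = "RNA", resolution = 0.8) {
  # Two names: FindNeighbors() stores the NN graph under the first and the
  # SNN graph under the second. With a single name, only the NN graph is kept.
  graph_names <- paste0(reduction, ".", assay, c("_nn", "_snn"))
  obj <- FindNeighbors(obj, reduction = reduction, dims = seq_len(npcs),
                       graph.name = graph_names, verbose = FALSE)
  # Cluster on the SNN graph, as Seurat's default workflow does.
  obj <- FindClusters(obj, graph.name = graph_names[2],
                      resolution = resolution, verbose = FALSE)
  # FindClusters() writes labels to meta.data as "<graph.name>_res.<resolution>".
  clusters <- obj[[paste0(graph_names[2], "_res.", resolution)]]
  list(object = obj, clusters = clusters)
}
```

Keeping a reduction/assay prefix in the graph names avoids overwriting any RNA_nn/RNA_snn graphs already stored on the object, while still clustering on the same kind of graph as the default workflow.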

I am not sure whether you were aware of this during the pipeline's development or whether there was a specific reason for setting it this way. I would appreciate your insights.

Additional information that might be useful: Seurat V5.1.0.

Thank you in advance.

petemeng commented 3 months ago

Thank you for raising this issue! I'm also encountering a problem with the pipeline, possibly related to the Seurat version. When running the test code (examples/1_seurat_pipeline.R), I get the following error:

Clustering 0.8...
    Finding ground truth...
Error in slot(object = object, name = s) : 
  no slot of name "images" for this object of class "Seurat"

My Seurat version is 5.0.3:

packageVersion("Seurat")
‘5.0.3’

Could this error be due to a version incompatibility? Would you be willing to share the code for Seurat V5.1.0, as that might help resolve this issue? Thanks in advance for your help!

L-Watcher commented 3 months ago

Hi! I've encountered a similar error before. It was caused by the Seurat object having been created with Seurat v4 or an older version while the analysis environment used v5. You can try updating your object with object <- UpdateSeuratObject(object = object) to see whether that resolves the error. I'm not sure whether your object was created with an older version, though; if it wasn't, you could share more details so others can help.
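A small sketch of that check, assuming the object's version slot records the SeuratObject version it was created with. ensure_current_object() is a hypothetical helper name; UpdateSeuratObject() is the Seurat function mentioned above.

```r
library(Seurat)

# Hypothetical helper: upgrade an object saved under an older
# Seurat/SeuratObject before passing it to the chooseR pipeline.
ensure_current_object <- function(obj) {
  # Assumption: obj@version holds the version the object was built with.
  if (obj@version < packageVersion("SeuratObject")) {
    obj <- UpdateSeuratObject(object = obj)
  }
  obj
}

# Usage (the file name is a placeholder):
# obj <- ensure_current_object(readRDS("my_object.rds"))
```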

Hope this helps!

yunbokai commented 3 months ago

Hi, I ran into the same problem: at the same resolution, the Seurat pipeline and the chooseR pipeline give different numbers of clusters. However, the plot 'res_silhouette_umap.png' seems more consistent with the chooseR pipeline. Do you think changing 'graph.name' will affect which resolution is chosen as best? By the way, I am still using Seurat v4. Thanks for your attention @L-Watcher

L-Watcher commented 3 months ago

Hi, here is my understanding of the chooseR pipeline. I'm not familiar with the algorithmic principles behind it, so these are just thoughts based on the analysis steps, and I welcome further discussion.

First, I don't think the differences between Seurat v4 and v5 affect the use of the chooseR pipeline, as FindNeighbors() does not differ notably between the two versions. Second, the core step of the chooseR pipeline resamples the cells to calculate a silhouette score for each cluster, thereby assessing how robust the clustering is at a given resolution.
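A toy sketch of the resampling-plus-silhouette idea, assuming co-clustering frequency across resampling runs is used as the similarity between cells (distance = 1 - frequency). This is an illustration of the general technique, not chooseR's code; co_clustering_frequency() and cluster_silhouette() are hypothetical helpers, and silhouette() comes from the cluster package that ships with R.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper. boot_labels: list of named label vectors, one per
# resampling run, each covering only the cells drawn in that run.
co_clustering_frequency <- function(boot_labels, cells) {
  n <- length(cells)
  together <- matrix(0, n, n, dimnames = list(cells, cells))
  sampled <- matrix(0, n, n, dimnames = list(cells, cells))
  for (labels in boot_labels) {
    ids <- names(labels)
    # Count every pair of cells drawn together in this run...
    sampled[ids, ids] <- sampled[ids, ids] + 1
    # ...and every pair that also landed in the same cluster.
    together[ids, ids] <- together[ids, ids] + outer(labels, labels, "==")
  }
  # Fraction of shared runs in which each pair co-clustered (0 if never co-sampled).
  ifelse(sampled > 0, together / sampled, 0)
}

# Median silhouette width per cluster, using 1 - co-clustering frequency as distance.
cluster_silhouette <- function(freq, labels) {
  ids <- names(labels)
  d <- as.dist(1 - freq[ids, ids])
  sil <- silhouette(as.integer(factor(labels)), d)
  tapply(sil[, "sil_width"], labels, median)
}

# Toy run: two perfectly stable clusters of three cells each.
cells <- paste0("c", 1:6)
truth <- setNames(c(1, 1, 1, 2, 2, 2), cells)
boot <- list(truth[c(1, 2, 4, 5)], truth[c(2, 3, 5, 6)], truth[c(1, 3, 4, 6)])
cluster_silhouette(co_clustering_frequency(boot, cells), truth)
```

Because the toy clusters are reproduced perfectly in every run, both clusters get a median silhouette width of 1; unstable clusters at a given resolution would score lower.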

As I understand it, whether RNA_snn or RNA_nn is used for FindClusters(), chooseR identifies the optimal resolution for the graph that was used. Therefore, to stay consistent with the standard Seurat pipeline, I made the modifications described above. Clustering on RNA_snn then produced an optimal resolution, and the result was quite good. I believe keeping the analysis workflow consistent is crucial, as it makes the results interpretable.
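To see the effect directly, one could cluster the same object on both default graphs at one resolution and compare cluster counts. compare_graph_clusters() is a hypothetical helper; it assumes default FindNeighbors() naming, where graphs are named after the assay of the reduction used.

```r
library(Seurat)

# Hypothetical helper: number of clusters from the NN vs SNN graph at one resolution.
compare_graph_clusters <- function(obj, reduction = "pca", dims = 1:10,
                                   resolution = 0.8) {
  # With the default graph.name, FindNeighbors() stores "<assay>_nn" and "<assay>_snn".
  obj <- FindNeighbors(obj, reduction = reduction, dims = dims, verbose = FALSE)
  assay <- DefaultAssay(obj[[reduction]])
  sapply(c(nn = "nn", snn = "snn"), function(type) {
    graph <- paste0(assay, "_", type)
    clustered <- FindClusters(obj, graph.name = graph,
                              resolution = resolution, verbose = FALSE)
    # Labels are stored in meta.data as "<graph.name>_res.<resolution>".
    nlevels(clustered@meta.data[[paste0(graph, "_res.", resolution)]])
  })
}
```

Running it at the resolutions in a chooseR grid would show whether the two graphs disagree on that dataset.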

Thanks for your attention!

yunbokai commented 3 months ago

I have tried your pipeline described above and it works. The best resolution chosen by chooseR is the same as the one I originally chose based on biological knowledge. Thanks again for your generous help. I also tried to modify the pipeline to use the FindSubCluster() function, but found it difficult. Have you ever tried FindSubCluster()?

L-Watcher commented 3 months ago

Sorry, I haven't used FindSubCluster() before; you might want to check its source code. Personally, I use subset() to extract the cells that need further subclustering and then run a new round of the Seurat pipeline on them. This approach works well and causes no issues when running the chooseR pipeline.
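A sketch of the subset()-then-recluster approach described above. recluster_subset() and its default npcs and resolution are hypothetical placeholders to tune per dataset; the steps themselves are standard Seurat calls, and the returned object could then go through the chooseR pipeline like any other.

```r
library(Seurat)

# Hypothetical helper: re-run the standard steps on one cluster's cells.
recluster_subset <- function(obj, ident, npcs = 10, resolution = 0.5) {
  # Keep only the cells currently assigned to the cluster of interest.
  sub <- subset(obj, idents = ident)
  # Recompute variable features, scaling and PCA so they reflect only these cells.
  sub <- FindVariableFeatures(sub, verbose = FALSE)
  sub <- ScaleData(sub, verbose = FALSE)
  sub <- RunPCA(sub, npcs = npcs, verbose = FALSE)
  # Build fresh NN/SNN graphs and cluster on the default SNN graph.
  sub <- FindNeighbors(sub, dims = seq_len(npcs), verbose = FALSE)
  FindClusters(sub, resolution = resolution, verbose = FALSE)
}
```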