romanhaa / Cerebro

Visualization of scRNA-seq data.
MIT License

Ability to create custom cell clusters #18

Open gorgitko opened 4 years ago

gorgitko commented 4 years ago

Hello!

First words: impressive work! I really like Cerebro :slightly_smiling_face:


I also like some features of Loupe Cell Browser, namely the ability to:

  1. Use the mouse to select cells of interest and create a custom cluster.

[Image: Mouse selection of cells]

  2. Use gene expression to create a custom cluster matching expression criteria (e.g. log2(counts) > 1).

[Image: Custom clusters based on gene expression]

  3. Same as 2., but with more advanced filters.

[Image: More advanced filter based on gene expression]

  4. Use custom clusters for differential gene expression analysis. This is possible on both a global scale (my cluster vs. all other cells not in my cluster) and a local scale (my cluster 1 vs. my cluster 2).

[Image: Differential expression analysis of custom clusters]

  5. Not really something Loupe can do, but would it be possible to calculate the enrichment of a custom gene set (possibly using custom clusters)?
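To make the expression filter and the global vs. local comparison concrete, here is a minimal base R sketch. It rests on several assumptions: `expr` is a hypothetical genes x cells matrix of log2-transformed values, the helper names `select_cells()` and `compare_groups()` are invented for illustration, and a per-gene Wilcoxon rank-sum test merely stands in for whatever method Loupe or Cerebro would actually use.

```r
# Toy log2-expression matrix: 3 genes x 6 cells (hypothetical data).
expr <- matrix(
  c(2.0, 1.5, 0.0, 0.2, 3.1, 0.0,   # GeneA
    0.1, 0.0, 2.2, 2.5, 0.3, 1.9,   # GeneB
    1.0, 1.1, 0.9, 1.2, 1.0, 1.1),  # GeneC
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:6))
)

# Custom cluster: all cells whose expression of `gene` exceeds `threshold`.
select_cells <- function(expr, gene, threshold = 1) {
  colnames(expr)[expr[gene, ] > threshold]
}

# Per-gene Wilcoxon rank-sum test between two sets of cells.
# With `group_b = NULL` the comparison is "global" (cluster vs. all other
# cells); otherwise it is "local" (cluster A vs. cluster B).
compare_groups <- function(expr, group_a, group_b = NULL) {
  if (is.null(group_b)) group_b <- setdiff(colnames(expr), group_a)
  res <- t(sapply(rownames(expr), function(g) {
    a <- expr[g, group_a]
    b <- expr[g, group_b]
    c(mean_diff = mean(a) - mean(b),
      p_value   = suppressWarnings(wilcox.test(a, b)$p.value))
  }))
  as.data.frame(res)
}

my_cluster <- select_cells(expr, "GeneA", threshold = 1)
global_de  <- compare_groups(expr, my_cluster)
```

Passing a second vector of cell names as `group_b` gives the local comparison. On a real dataset, looping over thousands of genes this way would be slow, which is the performance concern in question.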

Would it be possible to implement some of these features? We think the analysis of scRNA-seq data generally involves a lot of manual work, so we want to give biologists a tool that can not only visualize data but also run some useful analyses.

Consider the case where a biologist identifies an interesting cell cluster and wants to see its differential expression relative to all other cells, as well as the enriched pathways. The biologist could give me the information I need to create the cell cluster of interest, but then I have to run DEA and GSEA manually and share the results. That's very time-consuming, and we think such an analysis could easily be done in a proper tool (Cerebro :slightly_smiling_face:).

Thanks in advance! I think I could contribute to Cerebro, but I am not a Shiny expert :frowning_face:

romanhaa commented 4 years ago

Hi @gorgitko and thank you for the feedback. I'm glad you find Cerebro useful. Regarding your questions.

Selecting cells in the dimensional reduction and grouping them as a custom cluster is a feature request I have received a few times before. I'm sure it's possible somehow, but I haven't been able to implement it properly yet. Another difficulty is that even if you can define new cell groups, Cerebro at the moment won't allow performing differential gene expression analysis for this group so the functional readout would, in any case, be very limited. One could, however, add this cell grouping when checking gene (set) expression. This then applies also to points 2) and 3).

Performing differential expression analysis inside Cerebro is currently not possible because it requires significant computation. I'm sure the Loupe Cell Browser has a very efficient implementation of it. I could only imagine running this kind of analysis as a background job to keep the GUI responsive, but I'm not sure whether this is possible with Shiny.

Enrichment scores for custom gene sets: I'm sure you've noticed that you can check the expression of gene sets. Using the "new" performGeneSetEnrichmentAnalysis() function of cerebroApp you can generate a GMT file with your gene sets of interest and then run a GSVA-based enrichment analysis for every sample and cluster on them. But you probably had in mind to do this within Cerebro. Again, for performance reasons, I'm a bit skeptical.
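For reference, a GMT file is plain text with one gene set per line: the set name, a description, and then the genes, all separated by tabs. The following base R sketch writes such a file. The gene sets and the helper name `write_gmt()` are hypothetical, and how the resulting path is handed to `performGeneSetEnrichmentAnalysis()` is deliberately not shown, since its arguments are not given in this thread (check the cerebroApp documentation).

```r
# Hypothetical gene sets requested by a biologist.
gene_sets <- list(
  T_cell_markers = c("CD3D", "CD3E", "CD2"),
  B_cell_markers = c("CD79A", "MS4A1")
)

# GMT format, one gene set per line, tab-separated:
# <set name> <TAB> <description> <TAB> <gene 1> <TAB> <gene 2> ...
write_gmt <- function(gene_sets, path, description = "custom") {
  lines <- vapply(names(gene_sets), function(name) {
    paste(c(name, description, gene_sets[[name]]), collapse = "\t")
  }, character(1))
  writeLines(lines, path)
  invisible(path)
}

gmt_file <- write_gmt(gene_sets, tempfile(fileext = ".gmt"))
```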

Maybe you have some ideas to improve performance? My priority is always to make sure Cerebro can also be run on older machines because not everybody has a performant computer. Yet, I definitely understand your final comment and agree that providing these functionalities would be of great help to interpret the data.

gorgitko commented 4 years ago

Hi, thanks for the prompt reply.

> Selecting cells in the dimensional reduction and grouping them as a custom cluster is a feature request I have received a few times before. I'm sure it's possible somehow, but I haven't been able to implement it properly yet.

This seems quite easy to implement; see this example.
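As a rough idea of what mouse selection could look like, here is a minimal Shiny sketch that assumes the scatter plot is drawn with plotly. The toy `embedding` data frame, the `"dimred"` source name, and the helper `cells_from_selection()` are all hypothetical; this is not how Cerebro is implemented.

```r
library(shiny)
library(plotly)

# Toy 2D embedding; in practice these would be the t-SNE/UMAP coordinates.
set.seed(1)
embedding <- data.frame(
  cell = paste0("cell", 1:200),
  x = rnorm(200),
  y = rnorm(200)
)

# Turn a plotly selection event into a vector of cell IDs.
# Returns an empty vector when nothing is selected.
cells_from_selection <- function(selection) {
  keys <- selection$key
  if (length(keys) == 0) return(character(0))
  as.character(unlist(keys))
}

ui <- fluidPage(
  plotlyOutput("dimred"),
  verbatimTextOutput("selected")
)

server <- function(input, output, session) {
  output$dimred <- renderPlotly({
    # `key` attaches the cell ID to each point so it comes back in the event.
    plot_ly(embedding, x = ~x, y = ~y, key = ~cell,
            type = "scatter", mode = "markers", source = "dimred") %>%
      layout(dragmode = "lasso")
  })

  # Cell IDs of the current lasso/box selection act as the custom cluster.
  custom_cluster <- reactive({
    cells_from_selection(event_data("plotly_selected", source = "dimred"))
  })

  output$selected <- renderPrint(custom_cluster())
}

# shinyApp(ui, server)  # uncomment to launch the app
```

The selected IDs could then be stored as a new grouping column or exported for downstream analysis.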

> One could, however, add this cell grouping when checking gene (set) expression. This then applies also to points 2) and 3).

This would certainly be possible, without any additional performance demands.

> Performing differential expression analysis inside Cerebro is currently not possible because it requires significant computation. I'm sure the Loupe Cell Browser has a very efficient implementation of it.

The algorithms used in Loupe are described here. But even in Loupe it takes several minutes to compute DEA.

> I could only imagine running this kind of analysis as a background job to keep the GUI responsive, but I'm not sure whether this is possible with Shiny.

This is possible, since Shiny supports asynchronous programming through the promises package. Judging from the examples, it does not seem hard to use.
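A minimal sketch of the pattern, assuming the promises and future packages are installed. `slow_analysis()` is a made-up stand-in for a long-running computation such as DEA, and the input and output IDs are arbitrary.

```r
library(shiny)
library(promises)
library(future)
plan(multisession)  # run futures in separate background R sessions

# Stand-in for a slow analysis such as differential expression.
slow_analysis <- function(n) {
  Sys.sleep(1)
  data.frame(gene = paste0("gene", seq_len(n)), score = seq_len(n))
}

ui <- fluidPage(
  actionButton("run", "Run analysis"),
  tableOutput("result")
)

server <- function(input, output, session) {
  result <- eventReactive(input$run, {
    # Returns a promise immediately; the work happens in a background process.
    future_promise(slow_analysis(5))
  })

  output$result <- renderTable({
    # `%...>%` is the promise pipe: head() runs once the result is available.
    result() %...>% head()
  })
}

# shinyApp(ui, server)  # uncomment to launch the app
```

One caveat from the promises documentation: this keeps the R process free to serve other sessions, but the session that started the task still waits for it before its outputs update. Keeping a single user's GUI fully interactive during the computation takes additional work.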

> I'm sure you've noticed that you can check the expression of gene sets. Using the "new" performGeneSetEnrichmentAnalysis() function of cerebroApp you can generate a GMT file with your gene sets of interest and then run a GSVA-based enrichment analysis for every sample and cluster on them.

Hmm, I haven't noticed that :hushed: Is it in the development version? I am using cerebroApp version 1.2.0. But good to know, at least I can create custom gene sets on request from biologists.

> But you probably had in mind to do this within Cerebro. Again, for performance reasons, I'm a bit skeptical.

I have no experience with GSEA for single-cell data, but for RNA-seq data the methods used are wicked fast. For example, the amazing clusterProfiler uses fgsea, which is really fast.

> Maybe you have some ideas to improve performance? My priority is always to make sure Cerebro can also be run on older machines because not everybody has a performant computer.

I haven't dug into the Cerebro code yet, but maybe I can find some performance improvements. I have noticed that some sections (e.g. Gene expression) take some time to load. With the asynchronous programming mentioned above, it would be possible to load sections silently while the user is looking at other ones.

Also, Cerebro can run on a powerful server, as we will probably do at our institute.

> Yet, I definitely understand your final comment and agree that providing these functionalities would be of great help to interpret the data.

I agree that demanding computations could disrupt the user experience of Cerebro. On the other hand, asynchronous programming could help with this.

I think that, for now, just the ability to select cells (by mouse or filters), create custom clusters, and export this data would be a great enhancement of Cerebro. Then, using these clusters, I can run more advanced analyses and report the results to biologists.


P.S. This is not related to this issue (I can open a new one), but our pipeline runs several clustering algorithms, and it would be great to be able to switch between them. When preparing data for Cerebro, instead of column_cluster = "cluster_col", a vector of column names would be passed and the computations would be done for all of them. In the frontend, the user would then be able to switch between these clustering results.
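The idea in the P.S. could be sketched like this in base R. Everything here is hypothetical: the metadata columns, the `columns_cluster` argument name, and the helper `summarize_clusterings()` are invented for illustration and are not part of the cerebroApp API.

```r
# Toy cell metadata with two alternative clustering results (hypothetical).
meta <- data.frame(
  cell            = paste0("cell", 1:6),
  cluster_louvain = c("1", "1", "2", "2", "3", "3"),
  cluster_kmeans  = c("A", "A", "A", "B", "B", "B"),
  n_genes         = c(1200, 1100, 900, 950, 2000, 2100)
)

# Compute the same per-cluster summary for every clustering column.
summarize_clusterings <- function(meta, columns_cluster) {
  out <- lapply(columns_cluster, function(col) {
    counts <- table(meta[[col]])
    data.frame(
      cluster      = names(counts),
      n_cells      = as.vector(counts),
      mean_n_genes = as.vector(tapply(meta$n_genes, meta[[col]], mean))
    )
  })
  names(out) <- columns_cluster
  out
}

summaries <- summarize_clusterings(meta, c("cluster_louvain", "cluster_kmeans"))
# The frontend would then pick one entry, e.g. summaries[["cluster_kmeans"]].
```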