maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

displaying subsets of the data #27

Closed slowkow closed 5 years ago

slowkow commented 6 years ago

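When working with large single-cell datasets, it is often useful to look at two or more levels. Several papers have done analysis at two or more levels:

  • Level 1: PCA and tSNE on the full set of all cells.
  • Level 2: PCA and tSNE on each major subset of cells (e.g. only T cells, or only B cells, or only fibroblasts).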

When browsing the data at Level 2, it's necessary to hide all of the cells except the cells in the chosen subset. Then the user can browse just the different clusters of T cells, for example, without worrying about all the other cell types in an experiment.

It would be nice to support this type of subset-level analysis in cellBrowser.

Thinking about how this might be implemented...

I think you might already be most of the way there, since you support multiple files with cell coordinates:

https://github.com/maximilianh/cellBrowser/blob/03ca5f32610c571d6ada8b0db7062d5e77287b5f/sampleData/sample1/cellbrowser.conf#L41-L45

What happens if one of the coordinate files only lists coordinates for a subset of cells instead of all cells?

(I haven't tried, so I apologize in advance if this is already supported and I'm not aware.) It would be cool if cellBrowser automatically figures out that it should hide the cells that are not listed in a given coordinate file.

I wonder if you have thoughts about how to organize and navigate these types of subset-level results?

maximilianh commented 6 years ago

I made a change at some point to allow it. If a cell is not present in a coordinate file, it will be hidden when that layout is shown.
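
As a rough illustration of that behavior (not the actual cbBuild code), the sketch below splits the cells listed in the meta table into visible and hidden sets for one subset coordinate file. The three-column tab-separated layout without a header, and the names `read_coords` and `split_visible_hidden`, are assumptions made for this example.

```python
import csv
import io

def read_coords(tsv_text):
    """Parse a tab-separated table of cellId, x, y (assumed layout, no header)."""
    coords = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        coords[row[0]] = (float(row[1]), float(row[2]))
    return coords

def split_visible_hidden(meta_cell_ids, coords):
    """Cells with coordinates are shown; cells without coordinates are hidden."""
    visible = [c for c in meta_cell_ids if c in coords]
    hidden = [c for c in meta_cell_ids if c not in coords]
    return visible, hidden

# Hypothetical data: four cells in the meta table, two of them in a B-cell layout.
meta_cell_ids = ["cell1", "cell2", "cell3", "cell4"]
b_cell_coords = read_coords("cell2\t1.5\t-0.3\ncell4\t2.0\t0.7\n")
visible, hidden = split_visible_hidden(meta_cell_ids, b_cell_coords)
```

Here `visible` holds the two B cells and `hidden` holds the rest, which is the subset view requested above.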

Yes, my plan was exactly this: supply multiple coord files. Making this clear in the UI may not be trivial, though, especially if we have multiple dimensionality reductions.

I guess my problem right now is that I don't have a concrete example dataset where this is the case and it makes the dataset easier to understand. It's always easier to work with a concrete example.

For Tabula Muris, I wanted to prep one coord file per tissue.

josephmears commented 6 years ago

Here is an example from the AMP Phase 1 datasets. The overall tsne on PCs plot appears like this when you open the dataset:

[Image: Screen Shot 2018-10-12 at 10.10.54 AM]

Then, when you select the tsne on PCs layout for just B cells:

[Image: Screen Shot 2018-10-12 at 10.11.25 AM]

As you can see, all of the T cells, Monocytes and Fibroblasts have been assigned the coordinates 0,0 (~4,000 cells underneath the cell I selected):

[Image: Screen Shot 2018-10-12 at 10.11.48 AM]

The output when you run cbBuild:

WARNING:root:sample name S037_L3Q2_O12 is in meta file but not in coordinate file t-SNE on PCs: B Cells, setting to (0,0)

I think it's an easy fix, but it would be great if they were dropped rather than assigned (0,0).

maximilianh commented 6 years ago

The data arrays that I use (important for speed) cannot hold special values. I think it's easiest if I use a special and rare value, like (12345,12345), to indicate that a coordinate is missing.

Also, your example shows that I should not calculate the label coordinates based on missing values. That's a clear bug. Thanks!
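
A minimal sketch of the sentinel idea, assuming a flat typed array of x,y pairs and a median-based label position. Both are assumptions for illustration; the real data layout and label calculation in cellBrowser may differ, and `pack_coords` and `label_position` are hypothetical names.

```python
from array import array
from statistics import median

# Sentinel coordinate marking a cell that is absent from this layout.
MISSING = 12345.0

def pack_coords(meta_cell_ids, coords):
    """Store x,y pairs in one flat float array, using the sentinel for missing cells."""
    flat = array("f")
    for cell_id in meta_cell_ids:
        x, y = coords.get(cell_id, (MISSING, MISSING))
        flat.extend((x, y))
    return flat

def label_position(flat, cell_indexes):
    """Median x,y of a cluster's cells, skipping cells marked as missing."""
    xs, ys = [], []
    for i in cell_indexes:
        x, y = flat[2 * i], flat[2 * i + 1]
        if x == MISSING and y == MISSING:
            continue  # do not let missing cells pull the label toward the sentinel
        xs.append(x)
        ys.append(y)
    if not xs:
        return None  # no cell of this cluster is present in the layout
    return (median(xs), median(ys))

# Hypothetical data: only cell2 and cell4 have coordinates in this layout.
meta_cell_ids = ["cell1", "cell2", "cell3", "cell4"]
flat = pack_coords(meta_cell_ids, {"cell2": (1.0, 2.0), "cell4": (3.0, 4.0)})
label_all = label_position(flat, [0, 1, 2, 3])
label_absent = label_position(flat, [0, 2])
```

Because the array holds plain floats, the sentinel has to be a value that real coordinates are unlikely to take, which is the trade-off described above.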

maximilianh commented 5 years ago

This was implemented a few weeks ago. 12345 is the special value for missing cells, and the 100% ignores these. Let me know if you need something else, otherwise we can close this ticket.

maximilianh commented 5 years ago

Hey, while you can have subsets easily now, you'll still have the old cluster assignments. Is this what you want? If you want to auto-switch it to another field as the new cluster, I'd still have to add that, though that should be easy.

Also, there is no documentation about this right now...

maximilianh commented 5 years ago

added docs to the tab-sep docs page.

josephmears commented 5 years ago

I think it's a great idea to add an "auto-switch", and I would take advantage of that capability for the datasets I'm working with, but it's not hugely pressing. Thank you!

maximilianh commented 5 years ago

Sorry it took so long, you can now add colorOnMeta= to any coordinate system to color by some meta field automatically when this coordinate system is loaded.

ms.cells.ucsc.edu uses this to activate pseudotime coloring automatically when you show the pseudotime layout.
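
For illustration only, a cellbrowser.conf fragment along these lines might look like the sketch below. The file names, labels and the meta field name `b_cell_cluster` are made up, and placing `colorOnMeta` as a key inside one entry of the `coords` list is an assumption based on the description above, so check the docs for the exact syntax.

```python
# cellbrowser.conf uses Python syntax; all values below are hypothetical.
coords = [
    # Full dataset: keeps the default coloring.
    {"file": "tsne_all.coords.tsv", "shortLabel": "t-SNE on PCs: all cells"},
    # Subset layout: cells missing from this file are hidden, and the
    # coloring switches to the named meta field when this layout is loaded.
    {
        "file": "tsne_bcells.coords.tsv",
        "shortLabel": "t-SNE on PCs: B Cells",
        "colorOnMeta": "b_cell_cluster",
    },
]
```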

maximilianh commented 5 years ago

pip release 0.4.51
