hammerlab / t-cell-data

https://tcelldata.hammerlab.org

Exploring FlowRepository #5

Open hammer opened 6 years ago

hammer commented 6 years ago

https://flowrepository.org has an API! https://flowrepository.org/images/pdf/FlowRepositoryAPI.pdf

hammer commented 6 years ago

To analyze FCS files, we can use flowCore.

High throughput automated analysis of big flow cytometry data (2018) recommends flowClean, flowAI, flowDensity, flowType, and RchyOptimyx for analysis of FCS data.

hammer commented 5 years ago

Systematic Analysis of Cell-to-Cell Expression Variation of T Lymphocytes in a Human Cohort Identifies Aging and Genetic Associations has some flow data that would be rad

hammer commented 5 years ago

Note also that the OMIP data is on FlowRepository! OMIP-017 and OMIP-030, both on CD4 subsets, probably the best place to start.

hammer commented 5 years ago

Some other flow software to try out:

eric-czech commented 5 years ago

In experimenting with FlowRepository, I found that some datasets have associated workspaces and others do not. Some, like OMIP-030, have .jo files that might be helpful for reproducibility, but @armish found that it doesn't seem possible to load these and then export them in a format compatible with flowWorkspace (which would be a way to work with FlowJo analyses directly within the flowFrame family of tools).

According to the flowWorkspace vignette:

This package supports importing of Version 2.0 XML workspaces only. We cannot import .jo files directly. You will have to save them in XML workspace format, and ensure that that format is workspace version 2.0. The package has been tested and works with files generated using flowJo version 9.1 on Mac OS X. XML generated by older versions of flowJo on windows should work as well. We do not yet support flowJo’s Chimera XML schema, though that support will be provided in the future.

eric-czech commented 5 years ago

Issues with using workflows in FlowRepository notwithstanding, it does seem pretty reasonable to use the combination of FlowRepositoryR, openCyto, flowCore, and ggcyto to work with the data directly and get close to results seen in some of the publications.

Here is one example, at least for OMIP-021 (which has no associated FlowJo workspace anyhow), showing what we could do to associate cell types with some different expression markers:

OMIP-021 Analysis

It was definitely a pain coming up with a similar gating set, but I don't think it would be too bad with some more practice. I also found a reasonable way to draw gates manually using base R graphics:

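# Create a scatter plot of 2 channels in some flow dataset
# (`fr` is a flowFrame that has already been loaded)
library(flowCore)     # provides polygonGate
library(flowDensity)  # provides plotDens
flowDensity::plotDens(fr, c('CD4', 'CD8'))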

# Open a "locator" which will let you click on points defining some region
coords <- locator(n = 6, type = "p", lwd = 2, pch = 16, col = "red")

# Convert the coordinates to a flowCore compatible filter (i.e. polygonGate)
g <- data.frame(coords) 
colnames(g) <- c('CD4', 'CD8')
g <- polygonGate(.gate=g)

# This gate `g` could now either be dumped into a gating template csv 
# or used directly by openCyto

This all looks something like this when you're done:

[Screenshot taken 2018-10-11 at 1:39:14 PM]

For posterity, I took this from CytoGate, but the project is a little hard to use directly since it requires a bleeding edge version of openCyto that isn't installed in their latest docker container.
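To make concrete what a polygon gate does with the clicked vertices, here is a minimal base R sketch of a point-in-polygon (ray casting) test. It is only an illustration: flowCore applies polygon gates itself, and the function name `point_in_polygon`, the vertices, and the event values below are all made up for the example.

```r
# Illustrative only: flowCore applies polygon gates itself; this shows the idea.
# Ray-casting test: is the point (x, y) inside the polygon with vertices (vx, vy)?
point_in_polygon <- function(x, y, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge between vertex j and vertex i straddle the height y?
    crosses <- (vy[i] > y) != (vy[j] > y)
    if (crosses) {
      # x position where that edge reaches height y
      x_at_y <- vx[i] + (y - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
      # Count the crossing only if it lies to the right of the point
      if (x < x_at_y) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical gate vertices, in the list(x, y) shape that locator() returns
coords <- list(x = c(1, 4, 4, 1), y = c(1, 1, 3, 3))

# Hypothetical events (made-up intensities for two channels)
events <- data.frame(CD4 = c(2.0, 5.0, 3.5, 0.5),
                     CD8 = c(2.0, 2.0, 2.9, 0.5))

# Test every event against the gate and keep the ones that fall inside
in_gate <- mapply(point_in_polygon, events$CD4, events$CD8,
                  MoreArgs = list(vx = coords$x, vy = coords$y))
events[in_gate, ]
```

An odd number of edge crossings to the right of a point means the point is inside the gate, which is why `inside` is flipped on each crossing.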

hammer commented 5 years ago

Wow GitHub really needs the :open_mouth: emoji for reactions. Impressive progress! I'm bummed OMIP-017 doesn't have data. In their supplementary materials they state

The FlowJo workspace at repository.org contains one PBMC sample. It illustrates the gating performed to determine the relative proportion of CD4+ T-helper subsets

I'm guessing they meant flowrepository.org, so perhaps we can contact the authors to see what's up with that data? Corresponding author is Yolanda Mahnke, ahnkey@mail.nih.gov. (Edit: looks like she left the NIH in 2013 and is now in NYC as an independent flow consultant https://www.linkedin.com/in/yolanda-mahnke-6951b915/)

Also there has to be some kind of workspace converter to get the OMIP-030 data?

Finally, maybe we should focus on OMIP-036, since that has some fun data related to co-inhibitory receptors, obviously relevant to immunooncology.

eric-czech commented 5 years ago

Hm well I submitted a support ticket to FlowRepository about it so maybe that'll net something.

Yea I'm sure we can make something work with 30 -- I'll see what information I can get from FlowJo to go off of when I'm in the lab tomorrow and can look over Arman's shoulder =)

I'll give 36 a shot too, looks to be a bit more challenging.

hammer commented 5 years ago

@eric-czech any reason you chose not to use R Markdown for the OMIP-021 analysis? Seems like you wrote it on GitHub rather than in RStudio for knitr.

eric-czech commented 5 years ago

I did, I just made the output type "github_document". I originally tried rendering it with bookdown in this repo, but that wasn't going well so I figured I'll just stick to single analysis documents like this for now.

hammer commented 5 years ago

Oh duh I just followed your link to the .md file and did not think to check the repo for the .Rmd file.

eric-czech commented 5 years ago

Here's a rough OMIP-030 Analysis which is a good bit different since I'm using manual gates from FlowJo instead of anything in openCyto. What I'm confused by on this one though is how many cells end up getting assigned to more than one terminal group in the workflow. At one point the flow branches to identify CM/EM cells and Th cells starting from the same parent population and it ends up with a lot of double labeling -- not sure what to do about that.

armish commented 5 years ago

@eric-czech:

At one point the flow branches to identify CM/EM cells and Th cells starting from the same parent population and it ends up with a lot of double labeling -- not sure what to do about that.

As we discussed today, this is not a bug but a feature :) Th* labels are mostly independent from the CM/EM labels so under normal circumstances, we expect to see cells labeled with different combinations of these: e.g. Th17 CM, Th17 EM, Th9 CM, Th9 EM, etc.

Relevant: https://www.reddit.com/r/Immunology/comments/89vvh1/can_somebody_do_a_sanity_check_on_my/
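The point about independent labels can be sketched in base R. This is a toy example under stated assumptions: the `cells` data frame and its values are invented, and a real analysis would derive the memberships from the gating results rather than typing them in.

```r
# Hypothetical per-cell gate membership (made-up values, one row per cell)
cells <- data.frame(
  memory = c("CM", "EM", "CM", "EM", "CM"),
  helper = c("Th17", "Th17", "Th9", "Th9", "Th17"),
  stringsAsFactors = FALSE
)

# Treat the two labelings as independent axes and combine them per cell
cells$combined <- paste(cells$helper, cells$memory)

# Cross-tabulate instead of forcing each cell into a single terminal group
counts <- table(helper = cells$helper, memory = cells$memory)
print(counts)
```

Seen this way, a cell that carries both a Th label and a CM/EM label is not double counted; it simply occupies one cell of the helper-by-memory table.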

eric-czech commented 5 years ago

Ah thanks @armish , that makes sense. In that case here's a rewrite that treats the distinction much better (the results are definitely more interpretable):

OMIP-030 Analysis (Take 2)

eric-czech commented 5 years ago

There's an immunology subreddit? Niceeee

hammer commented 5 years ago

Some thoughts on the OMIP-021 analysis:

Some thoughts on OMIP-030 analysis:

Want to send a PR over for the OMIP-030 analysis?

Thanks again for the great work!

eric-czech commented 5 years ago

That all makes sense on OMIP-021, will do. My overall plan for the chapter is:

[Screenshots taken 2018-11-19 at 2:20:05 PM and 2018-11-21 at 9:32:24 PM]

That last graphic is interesting because it maps the original gating results (i.e. manually assigned cell types), shown as the colors of the pie slices, onto the unsupervised clustering, shown as the minimum spanning tree over the SOM nodes. It also shows a "meta" clustering of those nodes (the semi-transparent "halos"), which is what you would have to study individually to come up with cell types associated with the clusters, so it is good in this case that the halo colors line up with the pie color clusters. Where the clusters don't align as well, the UMAP separation for those cells isn't great either, so I thought it would be a good way to give a sense of which T cell types aren't differentiated well by this panel.

Anyways, I'm working on cleaning up the code for the above and will get a PR in soon but stop me if you don't think that would be a good way to round out the chapter.
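One rough way to put a number on how well clusters line up with manual gates is a per-cluster purity score. The base R sketch below is an assumption-laden illustration, not the code behind the graphic: the `manual` and `cluster` vectors are made-up stand-ins for the gated labels and the meta-cluster assignments.

```r
# Hypothetical per-cell assignments (made-up values)
manual  <- c("CD4 naive", "CD4 naive", "CD4 EM", "CD4 EM",
             "CD8 EM", "CD8 EM", "CD4 EM")
cluster <- c(1, 1, 2, 2, 3, 3, 3)

# Rows are clusters, columns are manually gated cell types
tab <- table(cluster = cluster, manual = manual)

# Purity: the fraction of each cluster taken up by its most common manual label
purity <- apply(tab, 1, max) / rowSums(tab)
dominant <- colnames(tab)[apply(tab, 1, which.max)]

data.frame(cluster = rownames(tab),
           dominant = dominant,
           purity = as.numeric(purity))
```

Clusters with low purity are the ones where the panel does not separate the manually gated types well.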

hammer commented 5 years ago

@eric-czech found some highly relevant data from Standardizing Flow Cytometry Immunophenotyping Analysis from the Human ImmunoPhenotyping Consortium (2016). You need to sign up for an account on ImmuneSpace then check out the Lyoplate Study. From there you can link to the FCS files. I think they also have the R scripts they used to do the automatic gating too...

hammer commented 5 years ago

Ah Guidelines for the use of flow cytometry and cell sorting in immunological studies (2017) also looks useful.

eric-czech commented 5 years ago

FYI re: Exploring FlowRepository -- I was tired of trying to find relevant datasets with Google and/or their search functions, so I scraped the FCS header and text segments from all of their datasets and dumped the related data and code into this flowrepository-metadata-db repo.

The main purpose was to get a table like this for each FCS file they make available (or at least up to some limit for one dataset) where this also reflects an effort to normalize the parameter names:

| param_name | param_channel | param_resolved | filename | exp_id |
|---|---|---|---|---|
| NA | FSC-A | FSC-A | Compensation Controls_APC Stained Control.fcs | FR-FCM-ZZ5C |
| CCR6 | Pr141Di | CD196 | 266_Tcell_tumor_Tcells-cd3.fcs | FR-FCM-ZY9M |
| Time | Time | Time | 4_C01.fcs | FR-FCM-ZYGV |
| CD8 QD585 | V585-A | CD8 | NV01 612 P2a…Live.fcs | FR-FCM-ZZFV |
| NA | Blue Vid-A | Blue Vid-A | BEADS_A700_G09.fcs | FR-FCM-ZZ2V |
| CXCR5 | Er166Di | CD185 | LKH DMSO_4w_cct.fcs | FR-FCM-ZZTJ |

I thought this would be helpful for doing more in-depth queries across their datasets and I found some useful CyTOF datasets this way.
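As a rough idea of what the parameter-name normalization and a follow-up query could look like, here is a minimal base R sketch. It is not the code in the flowrepository-metadata-db repo: the `aliases` lookup and the `resolve_param` helper are hypothetical, and only the two marker mappings (CCR6 to CD196, CXCR5 to CD185) and the fallback to the channel name are taken from the rows in the table above.

```r
# Hypothetical alias table; both mappings appear in the table above
aliases <- c(CCR6 = "CD196", CXCR5 = "CD185")

# Hypothetical helper: resolve a marker name for each FCS parameter
resolve_param <- function(param_name, param_channel) {
  # Fall back to the channel when the FCS file gives no marker name
  resolved <- ifelse(is.na(param_name), param_channel, param_name)
  # Swap in the alias where one is known
  known <- resolved %in% names(aliases)
  resolved[known] <- unname(aliases[resolved[known]])
  resolved
}

# A few rows copied from the table above
meta <- data.frame(
  param_name    = c(NA, "CCR6", "Time", "CXCR5"),
  param_channel = c("FSC-A", "Pr141Di", "Time", "Er166Di"),
  exp_id        = c("FR-FCM-ZZ5C", "FR-FCM-ZY9M", "FR-FCM-ZYGV", "FR-FCM-ZZTJ"),
  stringsAsFactors = FALSE
)
meta$param_resolved <- resolve_param(meta$param_name, meta$param_channel)

# Example query: which experiments measure CD185?
unique(meta$exp_id[meta$param_resolved == "CD185"])
```

Once every file has a resolved name, finding datasets by marker becomes a simple filter on `param_resolved`.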