Open hammer opened 6 years ago
To analyze FCS files, we can use flowCore.
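For reference, a minimal sketch of loading an FCS file with flowCore. It reads the sample file bundled with the package (an assumption made only so the snippet runs without a download); any local `.fcs` path could be substituted.

```r
library(flowCore)

# Sample FCS file shipped with flowCore; stands in for a real dataset
fcs_path <- system.file("extdata", "0877408774.B08", package = "flowCore")

# Read the raw values without applying any transformation
fr <- read.FCS(fcs_path, transformation = FALSE)

print(fr)            # overview of channels and event count
head(colnames(fr))   # channel names
dim(exprs(fr))       # events x channels matrix of measurements
```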
High throughput automated analysis of big flow cytometry data (2018) recommends flowClean, flowAI, flowDensity, flowType, and RchyOptimyx for analysis of FCS data.
Systematic Analysis of Cell-to-Cell Expression Variation of T Lymphocytes in a Human Cohort Identifies Aging and Genetic Associations has some flow data that would be rad
Note also that the OMIP data is on FlowRepository! OMIP-017 and OMIP-030, both on CD4 subsets, probably the best place to start.
Some other flow software to try out:
In experimenting with FlowRepository, I found that some datasets have associated workspaces and others do not, and some like OMIP-030 have .jo files that might be helpful for reproducibility, except that @armish found that it doesn't seem possible to load and then export these in a format compatible with flowWorkspace (which would be a way to work with FlowJo analyses directly within the flowFrame family of tools).
According to the flowWorkspace vignette:
This package supports importing of Version 2.0 XML workspaces only. We cannot import .jo files directly. You will have to save them in XML workspace format, and ensure that that format is workspace version 2.0. The package has been tested and works with files generated using flowJo version 9.1 on Mac OS X. XML generated by older versions of flowJo on windows should work as well. We do not yet support flowJo’s Chimera XML schema, though that support will be provided in the future.
Issues with using workflows in FlowRepository notwithstanding, it does seem pretty reasonable to use the combination of FlowRepositoryR, openCyto, flowCore, and ggcyto to work with the data directly and get close to results seen in some of the publications.
Here is one example, at least for OMIP-021 (which has no associated FlowJo workspace anyhow), showing how we could associate cell types with different expression markers:
It was definitely a pain coming up with a similar gating set, but I don't think it would be too bad with some more practice. I also found a reasonable way to be able to draw gates manually using base R graphics:
# Create a scatter plot of 2 channels in some flow dataset
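library(flowCore)     # provides polygonGate
library(flowDensity)  # provides plotDens
# `fr` is assumed to be a flowFrame that has 'CD4' and 'CD8' channels
flowDensity::plotDens(fr, c('CD4', 'CD8'))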
# Open a "locator" which will let you click on points defining some region
coords <- locator(n = 6, type = "p", lwd = 2, pch = 16, col = "red")
# Convert the coordinates to a flowCore compatible filter (i.e. polygonGate)
g <- data.frame(coords)
colnames(g) <- c('CD4', 'CD8')
g <- polygonGate(.gate=g)
# This gate `g` could now either be dumped into a gating template csv
# or used directly by openCyto
This all looks something like this when you're done:
For posterity, I took this from CytoGate, but the project is a little hard to use directly since it requires a bleeding edge version of openCyto that isn't installed in their latest docker container.
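As a non-interactive counterpart to the `locator()` snippet above, here is a rough sketch of applying a polygon gate to a flowFrame with flowCore. The data are synthetic, the vertices are hard-coded stand-ins for clicked points, and `CD4`/`CD8` are placeholder channel names, so treat it as an illustration rather than the gating used in the analysis.

```r
library(flowCore)

set.seed(1)
# Synthetic two-channel data; 'CD4' and 'CD8' are placeholder channel names
m <- cbind(CD4 = runif(1000, 0, 10), CD8 = runif(1000, 0, 10))
fr <- flowFrame(exprs = m)

# Fixed polygon vertices standing in for the points clicked via locator()
verts <- matrix(c(2, 2,
                  8, 2,
                  8, 8,
                  2, 8),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("CD4", "CD8")))
g <- polygonGate(filterId = "manual_gate", .gate = verts)

# Evaluate the gate, then keep only the events that fall inside it
res <- flowCore::filter(fr, g)
print(summary(res))
inside <- Subset(fr, g)
nrow(inside)
```

Building the gate from a named matrix keeps the channel association explicit, which matters once the same gate is reused on other samples.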
Wow GitHub really needs the :open_mouth: emoji for reactions. Impressive progress! I'm bummed OMIP-017 doesn't have data. In their supplementary materials they state
The FlowJo workspace at repository.org contains one PBMC sample. It illustrates the gating performed to determine the relative proportion of CD4+ T-helper subsets
I'm guessing they meant flowrepository.org, so perhaps we can contact the authors to see what's up with that data? Corresponding author is Yolanda Mahnke, ahnkey@mail.nih.gov. (Edit: looks like she left the NIH in 2013 and is now in NYC as an independent flow consultant https://www.linkedin.com/in/yolanda-mahnke-6951b915/)
Also there has to be some kind of workspace converter to get the OMIP-030 data?
Finally, maybe we should focus on OMIP-036, since that has some fun data related to co-inhibitory receptors, obviously relevant to immunooncology.
Hm well I submitted a support ticket to FlowRepository about it so maybe that'll net something.
Yea I'm sure we can make something work with 30 -- I'll see what information I can get from FlowJo to go off of when I'm in the lab tomorrow and can look over Arman's shoulder =)
I'll give 36 a shot too, looks to be a bit more challenging.
@eric-czech any reason you chose not to use R Markdown for the OMIP-021 analysis? Seems like you wrote it on GitHub rather than in RStudio for knitr.
I did, I just made the output type "github_document". I originally tried rendering it with bookdown in this repo, but that wasn't going well so I figured I'll just stick to single analysis documents like this for now.
Oh duh, I just followed your link to the .md file and did not think to check the repo for the .Rmd file.
Here's a rough OMIP-030 Analysis which is a good bit different since I'm using manual gates from FlowJo instead of anything in openCyto. What I'm confused by on this one though is how many cells end up getting assigned to more than one terminal group in the workflow. At one point the flow branches to identify CM/EM cells and Th cells starting from the same parent population and it ends up with a lot of double labeling -- not sure what to do about that.
@eric-czech:
At one point the flow branches to identify CM/EM cells and Th cells starting from the same parent population and it ends up with a lot of double labeling -- not sure what to do about that.
As we discussed today, this is not a bug but a feature :) Th* labels are mostly independent from the CM/EM labels, so under normal circumstances we expect to see cells labeled with different combinations of these: e.g. Th17 CM, Th17 EM, Th9 CM, Th9 EM, etc.
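One way to picture this is to treat the Th gates and the CM/EM gates as two separate labeling axes and cross-tabulate them. The sketch below uses made-up per-cell gate memberships (not values from the OMIP-030 data) and a hypothetical helper, `label_axis`.

```r
# Hypothetical per-cell gate membership (TRUE = cell falls inside that gate)
cells <- data.frame(
  Th17 = c(TRUE,  TRUE,  FALSE, FALSE, TRUE,  FALSE),
  Th9  = c(FALSE, FALSE, TRUE,  TRUE,  FALSE, FALSE),
  CM   = c(TRUE,  FALSE, TRUE,  FALSE, FALSE, TRUE),
  EM   = c(FALSE, TRUE,  FALSE, TRUE,  FALSE, FALSE)
)

# Collapse one axis to a single label per cell;
# NA when zero or several gates on that axis matched
label_axis <- function(df, gates) {
  hit <- as.matrix(df[, gates]) * 1  # logical -> numeric matrix
  ifelse(rowSums(hit) == 1, gates[max.col(hit, ties.method = "first")], NA)
}

cells$helper <- label_axis(cells, c("Th17", "Th9"))
cells$memory <- label_axis(cells, c("CM", "EM"))

# Cells carrying a label on both axes are combinations, not gating errors
combo <- table(helper = cells$helper, memory = cells$memory, useNA = "ifany")
print(combo)
```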
Relevant: https://www.reddit.com/r/Immunology/comments/89vvh1/can_somebody_do_a_sanity_check_on_my/
Ah thanks @armish , that makes sense. In that case here's a rewrite that treats the distinction much better (the results are definitely more interpretable):
There's an immunology subreddit? Niceeee
Some thoughts on the OMIP-021 analysis:
readr::format_csv + textConnection to avoid the temporary file (though the textConnection is not necessary, I just find its existence interesting)
Some thoughts on OMIP-030 analysis:
Want to send a PR over for the OMIP-030 analysis?
Thanks again for the great work!
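On the `readr::format_csv` + `textConnection` idea above, a small sketch of what that could look like. The `tmpl` data frame is a hypothetical stand-in for the gating template, and whether a given downstream function accepts a connection or a text string rather than a file path is an assumption to check per function.

```r
library(readr)

# Hypothetical two-row table standing in for a gating template
tmpl <- data.frame(
  alias  = c("lymphocytes", "singlets"),
  parent = c("root", "lymphocytes"),
  stringsAsFactors = FALSE
)

# Serialize to CSV text in memory instead of writing a temporary file
csv_text <- format_csv(tmpl)

# Option 1: read it back through an explicit textConnection
con <- textConnection(csv_text)
via_connection <- read.csv(con, stringsAsFactors = FALSE)
close(con)

# Option 2: read.csv's `text` argument avoids managing the connection yourself
via_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)

identical(via_connection, via_text)
```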
That all makes sense on OMIP-021, will do. My overall plan for the chapter is:
That last graphic is interesting because it maps the original gating results (i.e. manually assigned cell types), shown as the colors of the pie slices, onto the unsupervised clustering, shown as the minimum spanning tree over the SOM nodes. It also shows a "meta" clustering of those nodes (the semi-transparent "halos"), which marks the groups you would have to study individually to come up with cell types associated with the clusters, so it is good in this case that the halo colors line up with the pie color clusters. Where the clusters don't align as well, the UMAP separation for those cells isn't great either, so I thought it would be a good way to give a sense of which T cell types aren't differentiated well by this panel.
Anyways, I'm working on cleaning up the code for the above and will get a PR in soon but stop me if you don't think that would be a good way to round out the chapter.
@eric-czech found some highly relevant data from Standardizing Flow Cytometry Immunophenotyping Analysis from the Human ImmunoPhenotyping Consortium (2016). You need to sign up for an account on ImmuneSpace then check out the Lyoplate Study. From there you can link to the FCS files. I think they also have the R scripts they used to do the automatic gating too...
Ah Guidelines for the use of flow cytometry and cell sorting in immunological studies (2017) also looks useful.
FYI re: Exploring FlowRepository -- I was tired of trying to find relevant datasets with Google and/or their search functions, so I scraped the FCS header and text segments from all of their datasets and dumped the related data and code into this flowrepository-metadata-db repo.
The main purpose was to get a table like this for each FCS file they make available (or at least up to some limit per dataset), which also reflects an effort to normalize the parameter names:
param_name | param_channel | param_resolved | filename | exp_id |
---|---|---|---|---|
NA | FSC-A | FSC-A | Compensation Controls_APC Stained Control.fcs | FR-FCM-ZZ5C |
CCR6 | Pr141Di | CD196 | 266_Tcell_tumor_Tcells-cd3.fcs | FR-FCM-ZY9M |
Time | Time | Time | 4_C01.fcs | FR-FCM-ZYGV |
CD8 QD585 | V585-A | CD8 | NV01 612 P2a…Live.fcs | FR-FCM-ZZFV |
NA | Blue Vid-A | Blue Vid-A | BEADS_A700_G09.fcs | FR-FCM-ZZ2V |
CXCR5 | Er166Di | CD185 | LKH DMSO_4w_cct.fcs | FR-FCM-ZZTJ |
I thought this would be helpful for doing more in-depth queries across their datasets and I found some useful CyTOF datasets this way.
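To give a flavor of those queries, here is a sketch in base R over the six example rows shown in the table (the filename column is left out). The regular expression for spotting mass-cytometry channels is a heuristic assumption based on names like `Pr141Di`, not something defined by the repository.

```r
# Rows copied from the example table (filename column omitted)
params <- data.frame(
  param_name     = c(NA, "CCR6", "Time", "CD8 QD585", NA, "CXCR5"),
  param_channel  = c("FSC-A", "Pr141Di", "Time", "V585-A", "Blue Vid-A", "Er166Di"),
  param_resolved = c("FSC-A", "CD196", "Time", "CD8", "Blue Vid-A", "CD185"),
  exp_id         = c("FR-FCM-ZZ5C", "FR-FCM-ZY9M", "FR-FCM-ZYGV",
                     "FR-FCM-ZZFV", "FR-FCM-ZZ2V", "FR-FCM-ZZTJ"),
  stringsAsFactors = FALSE
)

# Which experiments measure any of a set of markers, by normalized name?
wanted <- c("CD196", "CD185")
hits <- params[params$param_resolved %in% wanted,
               c("exp_id", "param_resolved", "param_channel")]
print(hits)

# Heuristic: CyTOF channels look like element symbol + mass + "Di"
is_cytof <- grepl("^[A-Z][a-z]?[0-9]+Di$", params$param_channel)
cytof_ids <- unique(params$exp_id[is_cytof])
print(cytof_ids)
```

Querying on `param_resolved` rather than `param_name` is the point of the normalization: CCR6 and CD196 name the same marker, so only the resolved column matches consistently across datasets.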
https://flowrepository.org has an API! https://flowrepository.org/images/pdf/FlowRepositoryAPI.pdf