Splitting CyTOF data

by 生信技能树

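We have already walked through CyTOF data processing systematically, and I assumed no real difficulties were left. If this is your first contact with CyTOF data, see the written tutorial series on processing mass cytometry data such as CyTOF that I published on 《生信技能树》; it is based on FlowSOM.

If you are torn over which software and tools to use for CyTOF data processing, see also the 2019 paper by Liu et al. in Genome Biology, titled "A comparison framework and guideline of clustering methods for mass cytometry data", which tested the performance of 9 algorithms on 6 datasets.

Recently a follower asked for help: after reading my tutorials, they found they could not process the CyTOF dataset from one paper, titled "Single‑cell profiling of myasthenia gravis identifies a pathogenic T cell signature". The paper's CyTOF data are at https://data.mendeley.com/datasets/nkcb8nc7w8/1 ; if you are interested, you can download and process them yourself.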

The cohorts are:

Peripheral blood mononuclear cells (PBMCs) from myasthenia gravis patients (MG, n = 38) and healthy controls (CTRL, n = 21)
Thymic leukocytes from MG patients (n = 4) and non-MG incidental mass lesion controls (n = 6)
Thymic tissue sections of MG patients (n = 13) and non-MG controls (n = 6)

There are two kinds of CyTOF measurement:

surface markers
intracellular cytokines (following brief antigen-independent restimulation)

It turns out that each one is, surprisingly, just a single fcs file, as shown below:


$ ls -lh  ../dataFiles/ |cut -d" " -f 5-

2.4K Mar 24  2021 CyTOF_blood_live_ICS_metadata.csv
1.1G Mar 24  2021 CyTOF_blood_live_ICS_untrans_merged.fcs
2.3K Mar 24  2021 CyTOF_blood_live_surf_metadata.csv
966M Mar 24  2021 CyTOF_blood_live_surf_untrans_merged.fcs

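In other words, the CyTOF data from the many patient samples in the paper's two cohorts have been merged into a single file. That is a bit of a hassle, so I did some simple exploration with the code below: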

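library(cytofWorkflow)
library(flowCore)  # read.flowSet(), exprs() and write.FCS() come from flowCore

# Read the merged file; the result is a flowSet
c1 = read.flowSet('../dataFiles/CyTOF_blood_live_ICS_untrans_merged.fcs')
c1
# A flowSet with 1 experiments.

# Extract the single flowFrame object it contains
c1 = c1[[1]]
c1

# Expression values: one row per cell, one column per channel
exprs(c1)[1:6, 1:5]
dim(exprs(c1))
colnames(exprs(c1))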
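Before splitting a merged file, it helps to confirm which column carries the per-sample identifier and how many cells each identifier covers. The following is a minimal sketch on a toy flowFrame; the column name `sample_id` and the marker names are assumptions made for illustration, and the real merged file may name its columns differently.

```r
library(flowCore)

# Toy stand-in for the merged data: 6 cells, 2 marker channels, and a
# hypothetical 'sample_id' column (an assumption, not the real column name)
mat <- cbind(CD3 = c(1.2, 0.4, 3.1, 2.2, 0.9, 1.7),
             CD4 = c(0.3, 2.5, 1.1, 0.8, 2.9, 0.2),
             sample_id = c(1, 1, 2, 2, 2, 3))
ff <- flowFrame(mat)

# Channel names, and the position of the identifier column
colnames(ff)
id_col <- which(colnames(ff) == "sample_id")
id_col

# Number of cells recorded under each identifier
cells_per_id <- table(exprs(ff)[, id_col])
cells_per_id
```

On the real flowFrame, the same `table()` call on the candidate column shows at a glance whether it holds a small set of integer labels, which is what a sample identifier should look like.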

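The key is understanding the flowFrame object. Objects like this are complex, but the most important part of a flowFrame is really its matrix: here, the signal values of 30-odd antibodies for about 7 million single cells. So I split it apart with the code below: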


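# Split the rows by column 37, which is used here as the per-sample identifier
# (check it against colnames(exprs(c1)) before relying on the position)
exp_list = split(as.data.frame(exprs(c1)),
                 exprs(c1)[, 37])
names(exp_list) = paste0('p', names(exp_list))
names(exp_list)

dir.create('new', showWarnings = FALSE)

# Write one fcs file per sample, reusing the metadata of the merged flowFrame
invisible(lapply(names(exp_list), function(x) {
  # x = names(exp_list)[[1]]; x
  tmp = c1
  exprs(tmp) = as.matrix(exp_list[[x]])
  write.FCS(tmp, file.path('new', paste0(x, '.fcs')))
}))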
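As an alternative to converting the whole matrix to a data frame, a flowFrame can be row-subset directly, which keeps the channel metadata attached to every piece. This is only a sketch on toy data, not the code used for the files above; the `sample_id` column name is again a hypothetical stand-in for whatever column holds the sample identifier.

```r
library(flowCore)

# Toy flowFrame with a hypothetical 'sample_id' column (assumed name)
mat <- cbind(CD3 = c(1.2, 0.4, 3.1, 2.2, 0.9, 1.7),
             CD4 = c(0.3, 2.5, 1.1, 0.8, 2.9, 0.2),
             sample_id = c(1, 1, 2, 2, 2, 3))
ff <- flowFrame(mat)

# Look the identifier up by name rather than by position
ids <- exprs(ff)[, "sample_id"]
id_values <- sort(unique(ids))

# Row-subset the flowFrame once per sample
ff_list <- lapply(id_values, function(id) ff[ids == id, ])
names(ff_list) <- paste0("p", id_values)

# Cells per resulting flowFrame
sapply(ff_list, nrow)
```

Each element of `ff_list` is itself a flowFrame, so it could be passed straight to `write.FCS()` with a file name built from its list name.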

Each sample is written to its own fcs file, giving output files like these:


$ ls -lh  new/|cut -d" " -f 5-

 20M Feb  7 16:52 p1.fcs
 20M Feb  7 16:52 p10.fcs
 20M Feb  7 16:52 p11.fcs
5.0M Feb  7 16:52 p12.fcs
 21M Feb  7 16:52 p13.fcs
 22M Feb  7 16:52 p14.fcs
 21M Feb  7 16:52 p15.fcs
 16M Feb  7 16:52 p16.fcs
 18M Feb  7 16:52 p17.fcs
 15M Feb  7 16:52 p18.fcs
 14M Feb  7 16:52 p19.fcs

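Each file with the fcs suffix holds the CyTOF data of a single sample, and each contains over a hundred thousand single cells!

Then, as before, read them in as a batch: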


p1 = 'new'
# pattern is a regular expression, so match the .fcs extension explicitly
fs1 = list.files(p1, pattern = '\\.fcs$')
fs1
samp <- read.flowSet(files = fs1, path = p1)

Once the files are read in, you can process them following our earlier tutorials.
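Before moving on to the downstream steps, the number of cells in each sample of a flowSet can be checked with `fsApply()`. The sketch below builds a small toy flowSet so that it runs on its own; the sample names, marker names, and cell counts are made up for illustration, and on real data the same call would be applied to `samp`.

```r
library(flowCore)

# Toy flowSet standing in for `samp`: two samples with 5 and 3 cells
set.seed(1)
markers <- c("CD3", "CD4")
frames <- list(
  p1 = flowFrame(matrix(rexp(10), ncol = 2, dimnames = list(NULL, markers))),
  p2 = flowFrame(matrix(rexp(6),  ncol = 2, dimnames = list(NULL, markers)))
)
toy <- as(frames, "flowSet")

# One row per sample, giving its number of cells (events)
n_cells <- fsApply(toy, nrow)
n_cells
```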

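As you can see, most samples have around 100,000 cells:

(Figure: cell counts per sample, around 100,000)

Most of the cells are T cells, including CD4 and CD8 T cells:

(Figure: most cells are T cells)

But reproducing the paper's dimensionality reduction, clustering, and biological annotation is still somewhat difficult:

(Figure: the paper's dimensionality reduction, clustering, and biological annotation)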

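If you are interested, read the paper yourself: "Single‑cell profiling of myasthenia gravis identifies a pathogenic T cell signature". It gives pointers on how to select different samples to run through this CyTOF processing workflow, and on which antibodies to select for visualization.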

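Dimensionality reduction, clustering, and proportion-change exploration for one project

It costs only 800 RMB if your CyTOF project also has two groups and a handful of samples. Send me the fcs files and I will produce all the figures and tables for you.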

