Open XpelC opened 1 year ago
- Divide the dataset into epithelial and non-epithelial cells. Cluster the two groups separately and try to define the cell type of each cluster. The cells are over-clustered with pc = 50, resolution = 2.
- epithelial, luminal, cancer
Hello Xinpu,
You should divide the dataset and, for each of the two subsets, predict variable genes, run PCA, integrate, and define clusters.
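The per-subset steps could be sketched roughly as below. This is only an illustration, assuming Seurat's v4-style anchor-based integration; the function name `recluster_subset`, the `sample` metadata column and the parameter values are assumptions and not the code used in this thread. In this particular workflow PCA is run on the integrated assay, so the order differs slightly from the list above.

```r
# Sketch only: `recluster_subset`, `split_by = "sample"` and the defaults
# below are hypothetical choices for illustration.
recluster_subset <- function(seurat_subset, split_by = "sample",
                             n_pcs = 30, resolution = 0.8) {
  # Re-derive variable genes on the subset itself, one sample at a time
  object_list <- Seurat::SplitObject(seurat_subset, split.by = split_by)
  object_list <- lapply(object_list, function(x) {
    x <- Seurat::NormalizeData(x, verbose = FALSE)
    Seurat::FindVariableFeatures(x, selection.method = "vst",
                                 nfeatures = 2000, verbose = FALSE)
  })

  # Integrate the samples using features selected across the subset
  features <- Seurat::SelectIntegrationFeatures(object.list = object_list)
  anchors <- Seurat::FindIntegrationAnchors(object.list = object_list,
                                            anchor.features = features)
  integrated <- Seurat::IntegrateData(anchorset = anchors)

  # Dimensionality reduction and clustering on the integrated assay
  integrated <- Seurat::ScaleData(integrated, verbose = FALSE)
  integrated <- Seurat::RunPCA(integrated, npcs = n_pcs, verbose = FALSE)
  integrated <- Seurat::FindNeighbors(integrated, dims = seq_len(n_pcs))
  integrated <- Seurat::FindClusters(integrated, resolution = resolution)
  Seurat::RunUMAP(integrated, dims = seq_len(n_pcs))
}
```

The function would be called once on the epithelial subset and once on the non-epithelial subset, so that variable genes are never inherited from the full dataset.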
Hello Stefano,
OK, now I know why the figure looks so similar to before: I only re-ran PCA and the downstream steps. I'll redo the variable gene prediction.
Best wishes, Xinpu
On Sep 13, 2022, at 9:39 PM, Stefano Mangiola wrote:
> Hello Xinpu,
> You should divide the dataset and, for each of the two subsets, predict variable genes, run PCA, integrate, and define clusters.
Good afternoon Stefano,
Sorry to tell you this, but I think my integrated data is ruined and all the variable features were accidentally lost. I'm now reintegrating the samples to recover the data and will catch up as soon as possible.
Best, Xinpu
- Divide the dataset by epithelial cells and non-epithelial cells. Cluster the two groups separately and try to define the cell type of each cluster.
epithelial cell
seurat_clusters cell_type n
Use `RunUMAP(dims = 1:30, spread = 0.5, min.dist = 0.01, n.neighbors = 10)`.
OK, I'm currently filtering the immune cell dataset to fix the integration error (two samples contain too few cells).
On Sep 14, 2022, at 11:09 PM, Stefano Mangiola wrote:
> Use `RunUMAP(dims = 1:30, spread = 0.5, min.dist = 0.01, n.neighbors = 10)`.
Using `RunUMAP(dims = 1:30, spread = 0.5, min.dist = 0.01, n.neighbors = 10)` doesn't make much difference.
epithelial
   seurat_clusters cell_types
 1 0               basal_intermediate,cancer,cancer_associated_fibroblast,club_cell,d…
 2 1               basal,basal_intermediate,cancer,cancer_associated_fibroblast,club_…
 3 2               basal_intermediate,cancer,cancer_associated_fibroblast,club_cell,e…
 4 3               basal_intermediate,cancer,cancer_associated_fibroblast,club_cell,d…
 5 4               basal,cancer,cancer_associated_fibroblast,club_cell,epithelial,lum…
 6 5               cancer,cancer_associated_fibroblast,epithelial,fibroblast,luminal,…
 7 6               basal_intermediate,cancer,cancer_associated_fibroblast,club_cell,e…
 8 7               basal_intermediate,cancer,cancer_associated_fibroblast,club_cell,e…
 9 8               cancer,epithelial,luminal,myoepithelial
10 9               cancer,endothelial,epithelial,luminal
11 10              basal,basal_intermediate,cancer,club_cell,epithelial,fibroblast,lu…
12 11              cancer,epithelial,luminal,myoepithelial
13 12              basal_intermediate,cancer,club_cell,epithelial,luminal,macrophage,…
14 13              cancer,epithelial,luminal
15 14              cancer,epithelial,luminal
16 15              basal,cancer,club_cell,epithelial,luminal,myoepithelial,perivascul…
17 16              cancer,epithelial,luminal
18 17              cancer,epithelial,luminal,macrophage_cycling,sperm,T_cycling
19 18              cancer,epithelial,luminal,perivascular
20 19              cancer,epithelial,luminal
21 20              cancer,epithelial,luminal,sperm
22 21              cancer,epithelial,fibroblast,luminal,perivascular
23 22              cancer,club_cell,epithelial,luminal,macrophage,perivascular
24 23              cancer,epithelial,luminal,myoepithelial
25 24              cancer,club_cell,endothelial,epithelial,fibroblast,luminal,perivas…
26 25              cancer,club_cell,endothelial,epithelial,luminal
27 26              cancer,epithelial,luminal
28 27              cancer,epithelial,luminal
29 28              cancer,epithelial,luminal
30 29              cancer,club_cell,epithelial,luminal
31 30              B,cancer,epithelial,luminal,plasma,plasmablast
32 31              cancer,epithelial,luminal,T_cycling
33 32              cancer,epithelial,luminal
34 33              cancer
35 34              cancer,epithelial,luminal
36 35              epithelial,luminal
37 36              cancer
38 37              luminal
other cell types
The immune cluster is clearer this time.
seurat_clusters cell_types
OK, try to define the cluster identities by Friday so we can meet and discuss. For epithelial we are probably over-clustering.
FeaturePlot: Use the original umap (split by cell marker, and sample).
Barplot
- Check the scRNA sequencing method (10X, SMARTseq2)
- See which samples are from cancer patients and which are not (print a table)
Tumor: sample
- Check the scRNA sequencing method (10X, SMARTseq2)
OK, let's start by only keeping the 10x.
Actually, do you want to wait to see the integrated result with the breast cancer filtered out, and then decide whether we should take out the Seq-Well sequencing method? As far as I remember (I also checked the cell names), the most problematic dataset, the one that gives weird cell positions, is not GSE176031.
On Sep 18, 2022, at 4:38 PM, Stefano Mangiola wrote:
> OK, let's start by only keeping the 10x.
@XpelC can we double check that none of the 5 studies sorted certain cell types before sequencing (for example, sorting immune cells only)?
> @XpelC can we double check that none of the 5 studies sorted certain cell types before sequencing (for example, sorting immune cells only)?
Please use this new version of the UMAP result (I changed the reference during the integration).
cell type for 5 datasets
sample.combined %>% count(dataset, cell_type) %>% print(n = Inf)
tidyseurat says: A data frame is returned for independent data analysis.
A tibble: 58 × 3
   dataset         cell_type                        n
 1 EGAS00001005115 B                                1
 2 EGAS00001005115 cancer                        4002
 3 EGAS00001005115 cancer_associated_fibroblast   422
 4 EGAS00001005115 endothelial                   1196
 5 EGAS00001005115 macrophage                     189
 6 EGAS00001005115 mast                            89
 7 EGAS00001005115 NK                              69
 8 EGAS00001005115 perivascular                   811
 9 EGAS00001005115 T                              676
10 EGAS00001005115 unassigned                     120
11 EGAS00001005787 B                              133
12 EGAS00001005787 basal                          429
13 EGAS00001005787 CD4_naive                      256
14 EGAS00001005787 CD4_Trm                        169
15 EGAS00001005787 CD8_cytotoxic                  154
16 EGAS00001005787 CD8_Trm                        484
17 EGAS00001005787 club_cell                     1259
18 EGAS00001005787 dendritic                       88
19 EGAS00001005787 endothelial                    439
20 EGAS00001005787 fibroblast                      90
21 EGAS00001005787 hillock                        172
22 EGAS00001005787 luminal                       6008
23 EGAS00001005787 mac                            128
24 EGAS00001005787 mac_cycling                     16
25 EGAS00001005787 mac_mt                          21
26 EGAS00001005787 mast                            37
27 EGAS00001005787 monocyte                        61
28 EGAS00001005787 NK                              72
29 EGAS00001005787 NK_CD16_neg                     51
30 EGAS00001005787 NK_CD16_pos                     73
31 EGAS00001005787 sperm                         1002
32 EGAS00001005787 T                             1751
33 EGAS00001005787 Treg                           114
34 GSE137829       B                              539
35 GSE137829       endothelial                    653
36 GSE137829       epithelial                   11732
37 GSE137829       fibroblast                    1565
38 GSE137829       mast                           945
39 GSE137829       myeloid                        873
40 GSE137829       myofibroblast                  450
41 GSE137829       T                             2293
42 GSE141445       basal_intermediate            1015
43 GSE141445       endothelial                   3833
44 GSE141445       fibroblast                    1051
45 GSE141445       luminal                      22139
46 GSE141445       mast                          1840
47 GSE141445       monocyte                      1260
48 GSE141445       T                             3933
49 GSE176031       apidocytes                     526
50 GSE176031       CD8_Tem                       2019
51 GSE176031       endothelial                   1194
52 GSE176031       epithelial                   12023
53 GSE176031       macrophage                     282
54 GSE176031       monocyte                      2101
55 GSE176031       NK                             336
56 GSE176031       plasma                         125
57 GSE176031       pre_B_cell                     113
58 GSE176031       smooth_muscle                  620
> I changed the reference during the integration
Well done. You can even let Seurat choose the reference and leave it to run overnight.
> Please use this new version of the UMAP result
Does this UMAP include epithelial + immune? Do you think we can annotate decently with this new version and call it done?
other_ cell
epithelial
Great
> other_ cell
Are you able to annotate the immune clusters?
> epithelial
Would you be able to color by
- FeaturePlot: Use the original umap (split by cell marker, and sample).
- Use features and the markers:
The other features are not found in the data slot.
Use the code: FeaturePlot(sample.combined, features = c("CD14", "FCGR3A", "CD79A", "CD3G", "EPCAM", "VIM", "CD31", "CD68"), min.cutoff = 'q9')
> The other features are not found in the data slot.
Always use the SCT assay for colouring cells, not the integrated assay.
If features are still not found, use the RNA assay.
> Always use the SCT assay for colouring cells, not the integrated assay.
Using the 'SCT' assay, we found all the features except CD31. Even with the 'RNA' assay, we could not find CD31 for endothelial cells.
I checked the positions of these colored features, and they match our labels.
> Using the 'SCT' assay, we found all the features except CD31. Even with the 'RNA' assay, we could not find CD31 for endothelial cells.
CD31 might actually have a different gene name; please double check on Google.
Great, I think all makes sense.
If you find the endothelial marker CD31, you are ready to complete the annotation!
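A possible sketch for the gene-name check: CD31 is a protein name, and the corresponding gene symbol is PECAM1, which would explain why a feature called "CD31" is missing from both assays. The helper `resolve_markers`, the alias table and the toy gene vector are hypothetical names for illustration; in practice the gene vector would be the row names of the SCT or RNA assay.

```r
# Alias table: protein/CD name -> gene symbol (CD31 is encoded by PECAM1).
# Extend with other aliases as needed; this table is illustrative only.
marker_aliases <- c(CD31 = "PECAM1")

resolve_markers <- function(markers, available_genes, aliases = marker_aliases) {
  # Swap known aliases for their gene symbols, leave other names unchanged
  symbols <- unname(ifelse(markers %in% names(aliases), aliases[markers], markers))
  data.frame(
    requested = markers,
    symbol = symbols,
    found = symbols %in% available_genes,
    stringsAsFactors = FALSE
  )
}

# Toy gene list standing in for the row names of the assay being plotted
toy_genes <- c("CD14", "FCGR3A", "CD79A", "CD3G", "EPCAM", "VIM", "PECAM1", "CD68")
resolve_markers(c("CD14", "CD31", "VIM"), toy_genes)
```

The `symbol` column can then be passed to `FeaturePlot()` in place of the original marker names.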
> If you find the endothelial marker CD31, you are ready to complete the annotation!
Actually, I used PLVAP as a substitute for CD31 as an endothelial marker. The position is also correct. Do you think that's OK?
> The position is also correct. Do you think that's OK?
We need to distinguish between fibroblasts and endothelial cells.
Get a better marker for fibroblasts.
Maybe
> Get a better marker for fibroblasts.
These are all markers for fibroblasts (VIM is what we used before).
According to our cell type results, clusters 32, 14 and 18 are labeled as fibroblast. So maybe
Congrats, I think you got it. Please produce the other images for the to do list, and let's create a Seurat harmonised file, with cluster annotation.
OK, I'm still on my way to my apartment. See you tomorrow.
Best wishes, Xinpu
On Sep 26, 2022, at 6:15 PM, Stefano Mangiola wrote:
> Congrats, I think you got it. Please produce the other images for the to do list, and let's create a Seurat harmonised file, with cluster annotation.
one_vs_all source @XpelC functions.R.zip
Download the file and unzip it; it is an R script.
function: ComputeMarkers
Forget about this function; just use the standard function from Seurat.
- Check the cell marker of clusters with the following command
top10%>%print(n=Inf)
Please @XpelC use the below formatting for pasting tables and code
A tibble: 400 × 7
Groups: cluster [40]
p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
<dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
1 0 5.44 0.925 0.19 0 0 TUBA4A
2 0 5.36 0.924 0.293 0 0 FAM177A1
3 0 5.20 0.804 0.222 0 0 CNOT6L
4 8.30e-264 5.49 0.479 0.229 1.66e-260 0 GCHFR
5 1.21e-254 5.92 0.604 0.371 2.41e-251 0 H3F3B
6 6.18e-216 6.28 0.466 0.214 1.24e-212 0 EMB
7 1.42e-162 4.98 0.424 0.237 2.85e-159 0 CCSER2
8 6.70e-143 5.47 0.267 0.224 1.34e-139 0 GLIPR2
9 1.93e- 15 7.44 0.122 0.29 3.85e- 12 0 CTDNEP1
10 5.54e- 12 5.78 0.116 0.28 1.11e- 8 0 AZI2
11 0 6.78 0.875 0.383 0 1 C19orf48
12 0 6.48 0.723 0.377 0 1 SMIM4
13 0 5.56 0.889 0.326 0 1 EIF4EBP1
14 0 5.37 0.82 0.369 0 1 NME1
15 0 4.71 0.829 0.386 0 1 BNIP3
16 4.10e-251 4.57 0.186 0.294 8.20e-248 1 PLEKHB2
17 2.31e-137 5.04 0.483 0.301 4.62e-134 1 SDR39U1
18 2.09e-130 4.38 0.483 0.275 4.18e-127 1 NOB1
19 1.37e- 75 8.85 0.454 0.329 2.74e- 72 1 BAX
20 1.95e- 3 7.22 0.28 0.24 1 e+ 0 1 ORC3
21 1.07e-266 6.19 0.626 0.337 2.14e-263 2 HOXA9
22 2.91e-249 3.59 0.577 0.338 5.82e-246 2 GTF3C5
23 2.94e- 65 5.16 0.421 0.295 5.89e- 62 2 ATPAF1
24 1.24e- 60 4.10 0.412 0.299 2.49e- 57 2 C1orf56
25 1.83e- 42 4.80 0.342 0.377 3.67e- 39 2 ERP29
26 3.10e- 31 9.60 0.451 0.405 6.21e- 28 2 HNRNPH1
27 3.32e- 17 3.83 0.362 0.287 6.64e- 14 2 IQCK
28 6.58e- 9 3.76 0.295 0.291 1.32e- 5 2 MED31
29 1.24e- 7 3.68 0.272 0.257 2.47e- 4 2 UBE2F
30 4.75e- 7 4.49 0.26 0.237 9.50e- 4 2 NUP54
31 1.84e- 49 7.71 0.172 0.313 3.69e- 46 3 SDCBP
32 3.11e- 40 5.92 0.366 0.261 6.22e- 37 3 JKAMP
33 3.22e- 36 7.22 0.304 0.271 6.44e- 33 3 PLEKHA3
34 1.36e- 30 5.87 0.354 0.234 2.72e- 27 3 RNF185
35 8.41e- 19 6.12 0.303 0.215 1.68e- 15 3 L3MBTL2
36 2.45e- 11 6.99 0.352 0.25 4.90e- 8 3 ARHGAP12
37 8.19e- 11 7.09 0.319 0.321 1.64e- 7 3 WDR6
38 1.21e- 10 7.54 0.307 0.242 2.42e- 7 3 COQ9
39 1.20e- 9 6.20 0.287 0.377 2.39e- 6 3 C8orf33
40 3.15e- 6 6.33 0.347 0.263 6.30e- 3 3 SHROOM1
41 0 10.7 0.692 0.35 0 4 TMEM123
42 0 8.22 0.886 0.209 0 4 AGR2
43 0 7.64 0.768 0.287 0 4 AZGP1
44 0 7.18 0.784 0.325 0 4 NEDD4L
45 0 7.07 0.843 0.291 0 4 LIMCH1
46 0 6.58 0.964 0.283 0 4 TSPAN1
47 0 6.49 0.977 0.391 0 4 CLDN4
48 3.41e-205 7.76 0.576 0.229 6.83e-202 4 RASEF
49 3.53e- 4 7.52 0.265 0.257 7.06e- 1 4 HEXB
50 2.58e- 3 8.11 0.338 0.271 1 e+ 0 4 CDK10
51 4.77e- 58 6.37 0.262 0.31 9.55e- 55 5 ATAD3A
52 2.30e- 50 4.65 0.199 0.263 4.60e- 47 5 SNX1
53 4.06e- 44 3.34 0.45 0.297 8.11e- 41 5 MTFR1
54 7.04e- 17 3.25 0.438 0.328 1.41e- 13 5 MRPL42
55 7.21e- 10 3.35 0.322 0.288 1.44e- 6 5 C12orf10
56 1.02e- 7 3.69 0.404 0.455 2.05e- 4 5 NDUFB1
57 6.55e- 7 3.55 0.233 0.258 1.31e- 3 5 RPAP3
58 8.23e- 6 6.68 0.265 0.258 1.65e- 2 5 CCDC43
59 3.10e- 5 3.37 0.376 0.338 6.20e- 2 5 GEMIN7
60 5.48e- 4 3.68 0.275 0.33 1 e+ 0 5 RBM42
61 1.27e-224 7.86 0.568 0.282 2.55e-221 6 EMC9
62 4.48e-122 9.24 0.571 0.374 8.96e-119 6 TRIM28
63 5.27e-114 6.21 0.549 0.386 1.05e-110 6 ARPC5L
64 1.18e-110 6.28 0.521 0.297 2.37e-107 6 SIX1
65 8.77e- 97 6.69 0.674 0.488 1.75e- 93 6 MIF
66 1.47e- 89 7.19 0.419 0.27 2.94e- 86 6 GRPEL1
67 3.47e- 71 8.16 0.607 0.478 6.95e- 68 6 ZFAS1
68 4.70e- 17 5.36 0.419 0.33 9.39e- 14 6 ARL6IP1
69 2.23e- 6 5.42 0.289 0.254 4.46e- 3 6 FAM114A1
70 7.02e- 3 6.84 0.305 0.273 1 e+ 0 6 RCAN3
71 1.71e-253 1.80 0.254 0.223 3.42e-250 7 GLIPR1
72 2.75e-129 6.11 0.197 0.268 5.49e-126 7 ARMC6
73 4.79e- 88 2.90 0.525 0.352 9.58e- 85 7 PPP3CA
74 6.31e- 75 7.82 0.33 0.285 1.26e- 71 7 JAK1
75 7.72e- 28 1.96 0.425 0.298 1.54e- 24 7 MON1B
76 3.62e- 26 5.69 0.449 0.328 7.24e- 23 7 TOB1
77 3.55e- 20 2.22 0.476 0.403 7.10e- 17 7 VMP1
78 2.46e- 14 2.94 0.413 0.328 4.93e- 11 7 MCCC2
79 1.82e- 9 8.17 0.346 0.292 3.63e- 6 7 LETM1
80 1.74e- 7 3.21 0.179 0.27 3.48e- 4 7 TXNDC9
81 0 8.76 0.818 0.359 0 8 FDPS
82 0 8.50 0.957 0.386 0 8 PAFAH1B3
83 0 8.18 0.844 0.32 0 8 ACOT13
84 0 7.93 0.97 0.46 0 8 PDCD5
85 0 7.37 0.935 0.418 0 8 SLC25A4
86 4.39e-293 6.97 0.544 0.256 8.79e-290 8 PLRG1
87 5.87e-196 9.36 0.468 0.262 1.17e-192 8 INTS10
88 5.75e- 96 6.82 0.351 0.239 1.15e- 92 8 ZCCHC10
89 5.18e- 9 8.90 0.262 0.248 1.04e- 5 8 PPT1
90 5.67e- 3 7.62 0.366 0.274 1 e+ 0 8 GSTA4
91 0 7.68 0.813 0.342 0 9 SPATS2L
92 0 5.67 0.802 0.348 0 9 IFT57
93 0 3.18 0.865 0.343 0 9 SNHG10
94 2.30e-122 3.38 0.506 0.289 4.61e-119 9 PAFAH1B2
95 1.73e- 60 2.90 0.264 0.33 3.47e- 57 9 DDX21
96 9.99e- 58 2.87 0.443 0.294 2.00e- 54 9 DHX9
97 5.75e- 27 3.17 0.269 0.303 1.15e- 23 9 PRKAR1A
98 3.12e- 16 6.04 0.292 0.259 6.23e- 13 9 SF3A3
99 3.12e- 13 3.37 0.33 0.288 6.24e- 10 9 PRMT2
100 1.72e- 10 3.09 0.29 0.283 3.45e- 7 9 SH3GLB1
101 0 8.02 0.683 0.261 0 10 ASAH1
102 0 7.46 0.881 0.145 0 10 PYCARD
103 0 7.41 0.122 0.52 0 10 MARCKSL1
104 0 7.29 0.951 0.22 0 10 CYBA
105 8.87e-188 8.32 0.106 0.256 1.77e-184 10 ATF1
106 8.70e- 75 7.47 0.429 0.246 1.74e- 71 10 C12orf45
107 4.38e- 67 7.35 0.54 0.438 8.75e- 64 10 TXN
108 1.42e- 52 9.83 0.453 0.349 2.84e- 49 10 EZR
109 1.38e- 8 10.0 0.429 0.37 2.76e- 5 10 PPA1
110 2.71e- 3 9.14 0.303 0.248 1 e+ 0 10 ZFP91
111 0 10.7 0.783 0.303 0 11 HIST1H2BD
112 0 6.95 0.795 0.383 0 11 HIST1H2AC
113 0 5.94 0.88 0.465 0 11 HMGB1
114 3.59e-157 6.19 0.557 0.3 7.19e-154 11 ITGAE
115 4.24e- 93 6.05 0.496 0.34 8.48e- 90 11 C1orf122
116 3.91e- 73 6.02 0.501 0.328 7.82e- 70 11 MAZ
117 4.64e- 50 5.55 0.476 0.32 9.27e- 47 11 PIN4
118 3.61e- 33 9.08 0.259 0.28 7.21e- 30 11 ATP6V1D
119 1.80e- 24 8.74 0.196 0.263 3.59e- 21 11 LLPH
120 2.01e- 15 8.05 0.331 0.309 4.02e- 12 11 SRPRB
121 0 13.2 0.825 0.277 0 12 CORO1B
122 0 11.2 0.741 0.294 0 12 ICA1
123 0 8.93 0.756 0.207 0 12 TBC1D4
124 3.09e-128 6.83 0.275 0.249 6.17e-125 12 ZC2HC1A
125 3.83e-122 8.08 0.484 0.359 7.67e-119 12 SYNGR2
126 2.04e- 74 7.02 0.282 0.268 4.08e- 71 12 APIP
127 2.01e- 59 6.93 0.394 0.299 4.02e- 56 12 PMVK
128 6.24e- 25 6.80 0.308 0.3 1.25e- 21 12 OGT
129 1.13e- 23 8.16 0.331 0.283 2.26e- 20 12 TLK1
130 1.51e- 7 7.01 0.218 0.256 3.03e- 4 12 C9orf85
131 0 15.2 0.942 0.186 0 13 GATA2
132 0 15.0 0.951 0.244 0 13 NSMCE1
133 0 11.1 0.688 0.285 0 13 MLPH
134 0 8.71 0.718 0.24 0 13 FDX1
135 0 8.55 0.783 0.195 0 13 RAB27B
136 0 8.39 0.799 0.256 0 13 NCOA4
137 0 8.04 0.66 0.186 0 13 DTNBP1
138 0 7.76 0.941 0.223 0 13 ID2
139 7.28e- 63 7.55 0.307 0.242 1.46e- 59 13 HINT3
140 3.94e- 3 9.51 0.205 0.254 1 e+ 0 13 SDCCAG8
141 0 10.2 0.964 0.222 0 14 SPON2
142 0 8.23 0.988 0.218 0 14 TIMP1
143 0 7.57 0.889 0.243 0 14 RAMP1
144 0 7.30 0.774 0.26 0 14 ALDH1A3
145 3.07e-208 9.71 0.648 0.35 6.15e-205 14 TCEAL4
146 2.41e-127 9.27 0.442 0.233 4.82e-124 14 COQ10B
147 1.95e- 73 7.75 0.525 0.304 3.89e- 70 14 CTSF
148 5.55e- 65 7.13 0.393 0.257 1.11e- 61 14 DDAH1
149 6.25e- 40 10.0 0.479 0.297 1.25e- 36 14 CHD9
150 3.28e- 5 7.01 0.307 0.349 6.56e- 2 14 PNRC1
151 0 10.5 0.876 0.328 0 15 HOMER2
152 0 8.10 0.783 0.331 0 15 BIK
153 0 8.06 0.991 0.42 0 15 PFN2
154 0 7.46 0.924 0.296 0 15 FAM3B
155 0 7.04 0.736 0.371 0 15 CD47
156 0 6.88 0.798 0.251 0 15 PRAC2
157 0 6.58 0.935 0.411 0 15 LY6E
158 0 6.44 0.756 0.389 0 15 ADI1
159 0 6.01 0.994 0.481 0 15 MDK
160 2.80e- 98 6.23 0.399 0.226 5.61e- 95 15 RAE1
161 0 10.6 0.884 0.477 0 16 ACTG1
162 0 8.29 0.822 0.357 0 16 REXO2
163 0 6.18 0.852 0.333 0 16 HMGA1
164 3.28e-243 4.97 0.617 0.245 6.56e-240 16 LPAR6
165 1.37e-138 4.59 0.548 0.253 2.74e-135 16 PBX1
166 1.42e- 89 4.37 0.509 0.325 2.83e- 86 16 C1orf21
167 6.58e- 40 4.50 0.31 0.24 1.32e- 36 16 TRAPPC2
168 1.87e- 10 4.89 0.326 0.291 3.74e- 7 16 OFD1
169 2.41e- 10 4.47 0.366 0.463 4.83e- 7 16 RCN2
170 2.83e- 5 5.99 0.173 0.301 5.66e- 2 16 NAAA
171 0 6.90 0.835 0.226 0 17 VAT1
172 0 6.81 0.958 0.282 0 17 DUSP23
173 0 6.43 0.996 0.34 0 17 NPDC1
174 0 5.79 0.997 0.316 0 17 FKBP1A
175 3.68e-212 4.28 0.481 0.251 7.36e-209 17 WIPI1
176 1.28e- 82 5.21 0.589 0.222 2.56e- 79 17 EHD4
177 1.39e- 60 6.18 0.53 0.386 2.79e- 57 17 CPE
178 3.85e- 18 8.15 0.317 0.251 7.69e- 15 17 QRICH1
179 5.78e- 18 6.78 0.253 0.257 1.16e- 14 17 CAMK1
180 7.31e- 5 4.78 0.411 0.364 1.46e- 1 17 EIF4G1
181 0 8.58 0.785 0.232 0 18 FAM13C
182 0 7.30 0.786 0.347 0 18 ARID5B
183 9.45e-162 7.83 0.373 0.272 1.89e-158 18 GABARAPL1
184 2.81e- 64 7.28 0.301 0.429 5.62e- 61 18 HNRNPAB
185 1.39e- 27 7.96 0.235 0.342 2.78e- 24 18 FKBP3
186 9.67e- 26 10.5 0.368 0.432 1.93e- 22 18 MRPL33
187 1.57e- 23 8.50 0.166 0.301 3.13e- 20 18 CACUL1
188 2.82e- 21 6.60 0.315 0.308 5.63e- 18 18 CCNG2
189 2.72e- 8 6.71 0.445 0.324 5.45e- 5 18 CBX6
190 2.14e- 3 7.66 0.275 0.256 1 e+ 0 18 GOSR2
191 0 5.49 0.911 0.247 0 19 MT1F
192 0 4.93 0.908 0.281 0 19 MT1G
193 0 4.66 0.989 0.326 0 19 MT1X
194 0 4.01 0.994 0.29 0 19 MT1E
195 1.93e-111 9.27 0.673 0.471 3.85e-108 19 H2AFY
196 8.55e- 61 3.85 0.472 0.297 1.71e- 57 19 RHOD
197 1.36e- 41 10.9 0.357 0.285 2.72e- 38 19 ANKRD10
198 9.27e- 17 4.50 0.333 0.272 1.85e- 13 19 THYN1
199 2.16e- 7 4.46 0.337 0.33 4.32e- 4 19 FAM133B
200 6.37e- 3 9.45 0.309 0.274 1 e+ 0 19 PTGR1
201 0 8.06 0.914 0.322 0 20 DNAJB1
202 0 7.59 0.898 0.366 0 20 HSPA8
203 0 7.28 0.966 0.304 0 20 HSP90AA1
204 0 6.42 0.682 0.157 0 20 APOBEC3G
205 0 6.31 0.709 0.292 0 20 PPP1R2
206 2.91e-212 7.37 0.564 0.262 5.83e-209 20 ELF1
207 1.73e- 96 6.53 0.111 0.426 3.47e- 93 20 PGP
208 9.55e- 52 7.59 0.449 0.253 1.91e- 48 20 ODF2L
209 5.88e- 18 9.25 0.354 0.269 1.18e- 14 20 BUB3
210 4.81e- 6 9.40 0.191 0.33 9.61e- 3 20 GNL3
211 5.28e-256 4.46 0.645 0.296 1.06e-252 21 ARL2
212 2.40e-206 5.88 0.52 0.235 4.81e-203 21 TAF13
213 1.35e-112 10.4 0.486 0.29 2.69e-109 21 MKLN1
214 2.58e- 92 5.16 0.202 0.342 5.17e- 89 21 CMTM8
215 3.89e- 91 4.88 0.481 0.277 7.79e- 88 21 TNFRSF1A
216 3.51e- 56 8.23 0.212 0.306 7.02e- 53 21 ZNF524
217 2.14e- 50 6.39 0.409 0.239 4.27e- 47 21 ARPC1B
218 3.79e- 50 7.70 0.357 0.266 7.58e- 47 21 ACTR10
219 1.91e- 33 8.37 0.407 0.311 3.82e- 30 21 OAZ2
220 4.08e- 19 4.60 0.342 0.259 8.15e- 16 21 HMGCL
221 0 8.53 0.945 0.364 0 22 RPS27
222 0 6.17 0.999 0.497 0 22 RPS19
223 0 5.40 0.997 0.477 0 22 RPS18
224 0 5.21 0.999 0.466 0 22 RPL13A
225 0 4.52 0.996 0.483 0 22 RPSA
226 2.41e-121 5.58 0.583 0.339 4.81e-118 22 TIMM50
227 6.85e- 43 9.20 0.268 0.292 1.37e- 39 22 RUSC1
228 3.20e- 35 6.77 0.212 0.291 6.40e- 32 22 CCDC12
229 2.31e- 14 5.54 0.233 0.257 4.61e- 11 22 PTPN2
230 6.89e- 11 4.85 0.265 0.297 1.38e- 7 22 PAFAH1B2
231 1.24e-219 4.43 0.596 0.26 2.48e-216 23 RAB27A
232 8.21e-169 6.43 0.341 0.229 1.64e-165 23 OSTF1
233 1.35e-148 4.30 0.558 0.217 2.71e-145 23 TTC39C
234 6.86e- 80 5.01 0.436 0.244 1.37e- 76 23 PPP6R2
235 3.99e- 58 6.16 0.362 0.243 7.97e- 55 23 SF3A1
236 2.92e- 34 6.75 0.429 0.356 5.84e- 31 23 GOLGB1
237 8.29e- 26 4.13 0.22 0.302 1.66e- 22 23 MTCH2
238 3.27e- 23 3.45 0.302 0.268 6.53e- 20 23 MRPS18C
239 5.97e- 12 3.53 0.222 0.315 1.19e- 8 23 TMEM208
240 6.40e- 9 5.43 0.338 0.365 1.28e- 5 23 MRPL18
241 0 6.12 0.875 0.348 0 24 DNAJA1
242 0 5.87 0.835 0.397 0 24 IER2
243 0 5.47 0.973 0.331 0 24 FOS
244 0 5.27 0.97 0.324 0 24 DUSP1
245 0 5.25 0.978 0.371 0 24 JUN
246 1.97e-202 6.03 0.532 0.225 3.94e-199 24 NDRG2
247 2.20e-175 5.76 0.656 0.299 4.39e-172 24 SERTAD3
248 1.18e- 58 7.20 0.428 0.31 2.36e- 55 24 PRPF38B
249 2.36e- 7 5.31 0.322 0.289 4.73e- 4 24 IPO7
250 1.06e- 4 5.26 0.384 0.343 2.13e- 1 24 PGK1
251 2.00e- 78 8.39 0.442 0.252 4.00e- 75 25 TBC1D20
252 5.43e- 33 8.24 0.369 0.262 1.09e- 29 25 SMG1
253 5.41e- 32 11.0 0.463 0.322 1.08e- 28 25 ACADVL
254 3.86e- 18 9.26 0.331 0.321 7.73e- 15 25 UBE2J1
255 4.46e- 15 9.98 0.246 0.374 8.93e- 12 25 SLC25A39
256 1.81e- 14 8.77 0.296 0.269 3.61e- 11 25 HBP1
257 2.01e- 14 10.1 0.349 0.262 4.03e- 11 25 NDUFS1
258 4.04e- 11 8.51 0.287 0.282 8.09e- 8 25 RABEP2
259 5.97e- 9 9.47 0.431 0.285 1.19e- 5 25 ABCC4
260 4.50e- 5 8.48 0.325 0.25 9.00e- 2 25 BRAT1
261 0 11.7 0.887 0.358 0 26 CALM1
262 0 10.9 0.919 0.256 0 26 GNAI2
263 0 10.9 0.918 0.361 0 26 ARGLU1
264 0 10.4 0.914 0.404 0 26 APLP2
265 0 9.27 0.993 0.243 0 26 SLC9A3R2
266 0 8.47 0.921 0.473 0 26 SRP14
267 0 7.68 0.798 0.308 0 26 MYH9
268 0 7.44 0.979 0.317 0 26 CCDC85B
269 4.25e-187 8.39 0.647 0.254 8.49e-184 26 IFNGR1
270 2.08e- 76 8.71 0.535 0.326 4.17e- 73 26 TM9SF2
271 0 12.6 0.88 0.441 0 27 ATP6V0B
272 0 11.5 0.887 0.267 0 27 HIF1A
273 0 11.2 0.757 0.214 0 27 DNASE2
274 0 10.3 0.999 0.208 0 27 CTSD
275 0 10.1 0.903 0.204 0 27 FUCA1
276 0 9.62 0.983 0.178 0 27 LGMN
277 0 9.52 0.999 0.199 0 27 CTSB
278 0 9.48 0.761 0.156 0 27 CYP27A1
279 0 9.01 0.986 0.206 0 27 CREG1
280 1.85e-299 9.21 0.734 0.258 3.70e-296 27 ELL2
281 0 3.50 0.76 0.267 0 28 WDR74
282 0 2.87 0.967 0.379 0 28 RPL10
283 1.01e-292 2.85 0.657 0.277 2.02e-289 28 ZFP36L2
284 6.00e-223 2.91 0.602 0.255 1.20e-219 28 TC2N
285 1.87e-127 5.02 0.496 0.299 3.73e-124 28 PDCD4
286 2.33e-113 2.81 0.396 0.236 4.66e-110 28 DPP4
287 1.26e- 59 4.57 0.268 0.424 2.52e- 56 28 H2AFZ
288 2.90e- 11 4.15 0.388 0.321 5.80e- 8 28 CMC1
289 5.16e- 11 3.48 0.332 0.317 1.03e- 7 28 TSPYL1
290 2.72e- 5 4.96 0.192 0.292 5.44e- 2 28 IFITM2
291 0 13.7 0.941 0.447 0 29 GNAS
292 0 10.0 0.924 0.403 0 29 MYL6B
293 0 9.95 0.832 0.322 0 29 SERINC2
294 0 9.92 0.881 0.37 0 29 ACTN4
295 0 9.43 0.971 0.346 0 29 SPINT1
296 0 9.18 0.889 0.262 0 29 MIPEP
297 7.37e-308 9.52 0.837 0.395 1.47e-304 29 PRDX6
298 2.08e-274 11.0 0.759 0.322 4.16e-271 29 VAMP8
299 1.60e-273 9.54 0.81 0.342 3.20e-270 29 FLNB
300 4.71e- 70 9.91 0.618 0.338 9.42e- 67 29 FDFT1
301 0 14.7 0.975 0.378 0 30 SSR4
302 0 12.3 0.975 0.282 0 30 FKBP11
303 0 9.51 0.926 0.352 0 30 HERPUD1
304 0 9.23 0.917 0.334 0 30 SEC11C
305 0 8.54 0.86 0.372 0 30 XBP1
306 6.70e-279 6.14 0.759 0.295 1.34e-275 30 SDF2L1
307 8.17e-213 9.69 0.737 0.396 1.63e-209 30 HSP90B1
308 6.62e- 80 6.53 0.618 0.386 1.32e- 76 30 PPIB
309 3.80e- 20 6.22 0.188 0.254 7.59e- 17 30 ZNF692
310 3.46e- 16 6.67 0.403 0.291 6.92e- 13 30 TP53INP1
311 1.93e-285 5.63 0.701 0.236 3.86e-282 31 RHOG
312 1.77e- 77 4.95 0.074 0.306 3.54e- 74 31 ARMC10
313 5.26e- 43 5.86 0.41 0.25 1.05e- 39 31 SNX2
314 5.07e- 24 10.6 0.291 0.422 1.01e- 20 31 DBI
315 6.66e- 22 10.1 0.361 0.307 1.33e- 18 31 CERS4
316 4.55e- 21 4.93 0.135 0.292 9.09e- 18 31 RDX
317 1.11e- 17 5.00 0.259 0.267 2.21e- 14 31 KBTBD3
318 3.25e- 9 6.96 0.217 0.278 6.49e- 6 31 NIP7
319 1.86e- 5 4.63 0.283 0.26 3.71e- 2 31 MAGOHB
320 4.23e- 3 11.1 0.477 0.41 1 e+ 0 31 TPD52
321 0 17.1 1 0.432 0 32 DSTN
322 0 12.8 0.95 0.348 0 32 MGST3
323 0 11.6 0.992 0.262 0 32 CSRP1
324 0 10.9 0.997 0.225 0 32 CRYAB
325 0 10.5 0.993 0.271 0 32 ADIRF
326 0 9.11 0.999 0.341 0 32 CD151
327 0 8.96 0.997 0.202 0 32 SOD3
328 0 8.32 0.955 0.245 0 32 ILK
329 0 8.26 0.986 0.319 0 32 LPP
330 1.87e- 62 10.5 0.438 0.326 3.75e- 59 32 UAP1
331 0 13.2 0.953 0.48 0 33 NUCKS1
332 0 9.88 0.993 0.442 0 33 HMGN2
333 0 9.71 0.942 0.424 0 33 H1FX
334 0 9.48 0.924 0.402 0 33 UCP2
335 4.76e-223 10.6 0.828 0.318 9.53e-220 33 IMMP1L
336 2.38e-194 9.36 0.829 0.377 4.76e-191 33 KPNB1
337 4.90e-174 9.99 0.745 0.263 9.79e-171 33 SNRNP40
338 3.89e-142 10.1 0.688 0.242 7.78e-139 33 RPA2
339 8.99e- 70 9.86 0.67 0.414 1.80e- 66 33 PKM
340 9.26e- 57 12.2 0.618 0.319 1.85e- 53 33 NASP
341 0 10.6 0.938 0.226 0 34 STXBP2
342 0 8.30 0.96 0.276 0 34 LITAF
343 0 8.08 0.972 0.275 0 34 NAMPT
344 0 7.49 0.922 0.339 0 34 SLC25A37
345 0 7.32 0.912 0.216 0 34 NINJ1
346 3.16e-102 7.82 0.62 0.269 6.32e- 99 34 VPS37C
347 3.91e- 59 7.71 0.127 0.355 7.83e- 56 34 FAM136A
348 1.24e- 32 8.27 0.478 0.293 2.48e- 29 34 CAMKK2
349 7.93e- 11 9.38 0.305 0.247 1.59e- 7 34 MDM2
350 6.00e- 3 12.2 0.342 0.312 1 e+ 0 34 OS9
351 1.55e- 41 9.42 0.293 0.331 3.10e- 38 35 MVD
352 1.95e- 19 12.0 0.591 0.392 3.91e- 16 35 DHRS7
353 2.41e- 17 8.17 0.375 0.351 4.82e- 14 35 KIF22
354 2.37e- 14 11.2 0.521 0.316 4.74e- 11 35 CHD3
355 2.40e- 8 9.39 0.433 0.344 4.80e- 5 35 C8orf82
356 1.20e- 7 11.0 0.539 0.413 2.41e- 4 35 CYB5A
357 2.29e- 7 8.96 0.529 0.408 4.58e- 4 35 TMEM14C
358 6.68e- 7 8.40 0.43 0.31 1.34e- 3 35 ERLEC1
359 2.47e- 5 8.97 0.448 0.331 4.95e- 2 35 ARG2
360 1.75e- 4 9.05 0.28 0.291 3.50e- 1 35 UROS
361 5.25e-236 6.68 0.864 0.411 1.05e-232 36 CTNNB1
362 9.77e-103 5.09 0.668 0.314 1.95e- 99 36 CDC42SE1
363 6.97e- 83 5.20 0.838 0.513 1.39e- 79 36 UQCRQ
364 2.39e- 27 5.82 0.535 0.334 4.78e- 24 36 BEX2
365 4.37e- 17 5.41 0.438 0.298 8.74e- 14 36 STAG2
366 1.91e- 14 5.54 0.556 0.405 3.81e- 11 36 VMP1
367 9.27e- 10 6.45 0.402 0.251 1.85e- 6 36 RRAS2
368 1.10e- 9 7.95 0.441 0.298 2.20e- 6 36 NFIX
369 1.12e- 4 5.89 0.391 0.25 2.24e- 1 36 ORAI3
370 4.90e- 3 5.96 0.391 0.339 1 e+ 0 36 C9orf16
371 8.53e- 76 8.00 0.915 0.472 1.71e- 72 37 EPCAM
372 3.43e- 72 8.18 0.868 0.399 6.86e- 69 37 DHCR24
373 2.46e- 56 10.3 0.743 0.244 4.92e- 53 37 TRAPPC12
374 6.50e- 44 5.99 0.824 0.412 1.30e- 40 37 CALR
375 1.28e- 26 10.4 0.629 0.321 2.56e- 23 37 KDM5B
376 2.20e- 26 6.23 0.471 0.245 4.41e- 23 37 DYNLT3
377 2.16e- 19 11.3 0.544 0.239 4.33e- 16 37 CHD1L
378 3.45e- 17 8.23 0.529 0.329 6.91e- 14 37 CAST
379 5.76e- 11 11.8 0.548 0.252 1.15e- 7 37 AP2A2
380 2.70e- 6 6.68 0.397 0.302 5.39e- 3 37 TAPBP
381 8.71e- 76 6.72 0.977 0.28 1.74e- 72 38 ISG15
382 5.84e- 30 5.99 0.762 0.25 1.17e- 26 38 IFI35
383 4.60e- 27 1.43 0.777 0.309 9.21e- 24 38 PLSCR1
384 6.26e- 15 2.96 0.6 0.311 1.25e- 11 38 PSME2
385 4.90e- 13 1.03 0.631 0.358 9.81e- 10 38 PSME1
386 1.53e- 5 1.82 0.438 0.279 3.06e- 2 38 BIRC2
387 7.12e- 5 0.901 0.392 0.266 1.42e- 1 38 XPNPEP1
388 1.81e- 4 6.69 0.462 0.283 3.62e- 1 38 PDK2
389 1.69e- 3 2.70 0.469 0.308 1 e+ 0 38 C15orf61
390 2.15e- 3 1.86 0.315 0.221 1 e+ 0 38 MTMR14
391 8.63e- 24 1.53 0.976 0.375 1.73e- 20 39 HSPB1
392 9.40e- 21 4.24 0.976 0.415 1.88e- 17 39 CLDN4
393 6.72e- 19 4.11 0.905 0.342 1.34e- 15 39 MT1X
394 7.61e- 14 2.87 0.881 0.284 1.52e- 10 39 GSTO2
395 1.16e- 13 5.22 0.81 0.261 2.32e- 10 39 AUH
396 2.53e- 8 7.97 0.762 0.263 5.06e- 5 39 MT1F
397 5.22e- 8 3.48 0.857 0.452 1.04e- 4 39 CLDN3
398 8.76e- 8 4.67 0.905 0.493 1.75e- 4 39 KRT18
399 1.22e- 7 1.65 0.714 0.317 2.45e- 4 39 ADH5
400 2.52e- 7 2.88 0.643 0.243 5.04e- 4 39 TIMM23
- color UMAP by cell mitochondrial
- by sample
- 10x vs smart-seq
- total RNA
All looks good, you can proceed.
- Check the cell markers of the clusters with the following commands:
  pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_log2FC) -> top10
  DoHeatmap(pbmc, features = top10$gene) + NoLegend()
Will the graph look better if I merge similar clusters first and then find the variable features?
> Will the graph look better if I merge similar clusters first and then find the variable features?
Export the plot as a PDF with a very large height and label each cluster; some of them will have the same identity. Each gene name should be visible and non-overlapping. You will have to spend some hours on the annotation. Hopefully you will have finished by tomorrow.
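One way to do the export, as a rough sketch: scale the PDF height with the number of genes so the row labels do not overlap. The helper names `heatmap_height` and `export_marker_heatmap`, the file name and the 0.15 inch-per-gene value are assumptions to be tuned by eye; `DoHeatmap()`, `NoLegend()` and `ggsave()` are the standard Seurat and ggplot2 functions.

```r
# Height in inches: grows with the number of genes, with a sensible minimum.
# The per-gene value is a guess and should be adjusted after inspecting the PDF.
heatmap_height <- function(n_genes, per_gene = 0.15, minimum = 10) {
  max(minimum, n_genes * per_gene)
}

export_marker_heatmap <- function(object, genes, file = "marker_heatmap.pdf",
                                  width = 20) {
  heatmap <- Seurat::DoHeatmap(object, features = genes) + Seurat::NoLegend()
  # limitsize = FALSE lets ggsave write pages taller than 50 inches
  ggplot2::ggsave(file, plot = heatmap, width = width,
                  height = heatmap_height(length(genes)), limitsize = FALSE)
  invisible(file)
}
```

For the 400 genes in `top10` this gives a page about 60 inches tall, e.g. `export_marker_heatmap(pbmc, top10$gene)`.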
> Export the plot as a PDF with a very large height and label each cluster; some of them will have the same identity. Each gene name should be visible and non-overlapping. You will have to spend some hours on the annotation. Hopefully you will have finished by tomorrow.
So can I merge different clusters before finding markers?
> Export the plot as a PDF with a very large height and label each cluster; some of them will have the same identity. Each gene name should be visible and non-overlapping. You will have to spend some hours on the annotation. Hopefully you will have finished by tomorrow.
> So can I merge different clusters before finding markers?
But on what basis would you merge? If you are confident about the cluster identities before merging, you can merge. But the heatmap is useful precisely for checking cluster identity.
- DoHeatmap(pbmc, features = top10$gene) + NoLegend()
There are too many features when there are 40 clusters. I found that dividing them into 4 groups might work, and I will let it run overnight.
cluster 0-9
cluster 10-19
cluster 20-29
cluster 30-39
These subdivisions were made based on different macro clusters? Or based on what? If they were based just on an ordinal subdivision, this is not the right way to do it.
In this case you should 1) group the 40 clusters into bigger clusters based on the consensus identity of the original annotation (e.g. cluster 1 is T memory CD8 and cluster 6 is T memory CD4, so clusters 1 and 6 both get the label T memory); 2) with far fewer clusters, whose rough identity you know, calculate the markers.
Before all that, could you please paste here a table with the best cluster label, given by you just looking at the original annotation?
Thanks.
> These subdivisions were made based on different macro clusters? Or based on what? If they were based just on an ordinal subdivision, this is not the right way to do it.
It is just because 40 clusters have too many features, which makes it impossible to see a clear heatmap, so I plotted them 10 at a time. I'll send you the file with the best cluster labels.
> It is just because 40 clusters have too many features, which makes it impossible to see a clear heatmap, so I plotted them 10 at a time.
This is the wrong way to do it.
In this case you should
- group the 40 clusters into bigger clusters based on the consensus identity of the original annotation (e.g. cluster 1 is T memory CD8 and cluster 6 is T memory CD4, so clusters 1 and 6 both get the label T memory)
- with far fewer clusters, whose rough identity you know, calculate the markers
Or divide the 40 clusters into 5/6 macroclusters of epithelial, T cells, B cells, fibro, etc. and compose 5/6 heatmaps.
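The consensus grouping described above could be sketched as a majority vote over the original annotation within each cluster. This is a minimal base R illustration on toy data; the `broad_lineage` lookup, the function name `assign_macro_cluster` and the toy labels are assumptions, and the real lookup would need to cover every original cell type label.

```r
# Hypothetical lookup from fine original labels to broad lineages
broad_lineage <- c(
  CD8_Trm = "t_cell", CD4_naive = "t_cell", T = "t_cell",
  luminal = "epithelial", basal = "epithelial", epithelial = "epithelial",
  fibroblast = "fibroblast", endothelial = "endothelial"
)

assign_macro_cluster <- function(cluster, cell_type, lookup = broad_lineage) {
  lineage <- unname(lookup[as.character(cell_type)])
  lineage[is.na(lineage)] <- "other"   # labels missing from the lookup
  counts <- table(cluster, lineage)
  # For each cluster, keep the lineage that most of its cells belong to
  consensus <- colnames(counts)[apply(counts, 1, which.max)]
  names(consensus) <- rownames(counts)
  consensus
}

# Toy example: clusters 1 and 6 are mostly T cells, cluster 9 mostly epithelial
toy <- data.frame(
  cluster = c(1, 1, 1, 6, 6, 9, 9, 9),
  cell_type = c("CD8_Trm", "CD8_Trm", "luminal", "CD4_naive", "T",
                "luminal", "basal", "fibroblast")
)
macro <- assign_macro_cluster(toy$cluster, toy$cell_type)
macro
```

Markers and heatmaps can then be computed once per macro cluster rather than on arbitrary blocks of ten clusters.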
> Or divide the 40 clusters into 5/6 macroclusters of epithelial, T cells, B cells, fibro, etc. and compose 5/6 heatmaps.
T cell
epithelial
endothelial
fibroblast
other cells (B, monocyte, mast, adipocyte)
Amazing, the only thing left is to give the specific cluster identities. Tomorrow after you have done that we should meet.
I suggest using SingleR on your clusters for a final confirmation.
Clean the dataset
cell type name formatting
[x] Two columns: one column is original cell type names, the other column is the formatted names.
[x] These names should be lower case, with no spaces, and singular (to eliminate inconsistency)
[x] Left join the table with the data.
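The three checklist items could look roughly like this in base R. It is only a sketch: `format_cell_type`, the column names and the toy labels are hypothetical, and the singularisation rule (drop one trailing "s") is deliberately naive and would need manual review for labels that legitimately end in "s".

```r
format_cell_type <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)   # spaces and punctuation -> underscore
  x <- gsub("^_+|_+$", "", x)       # no leading or trailing underscores
  sub("([^s])s$", "\\1", x)         # naive singular: drop one trailing "s"
}

# Two-column table: original names and formatted names
original <- c("T cells", "Fibroblasts", "Club cell", "NK")
lookup <- data.frame(
  cell_type_original = original,
  cell_type_formatted = format_cell_type(original),
  stringsAsFactors = FALSE
)

# Toy cell metadata; merge(..., all.x = TRUE) is the base R left join
cells <- data.frame(
  cell = c("c1", "c2", "c3"),
  cell_type_original = c("NK", "T cells", "NK"),
  stringsAsFactors = FALSE
)
annotated <- merge(cells, lookup, by = "cell_type_original", all.x = TRUE)
annotated
```

With dplyr loaded, `left_join(cells, lookup, by = "cell_type_original")` would do the same join while keeping the original row order.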
Check the dataset:
Cluster the cells separately [NOT NEEDED ANYMORE]
cell type decision
https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
Sanity check
double check: