ncborcherding / escape

Easy single cell analysis platform for enrichment
https://www.borch.dev/uploads/screpertoire/articles/running_escape
MIT License
139 stars 20 forks source link

Having two condition in Seurat #28

Closed beginner984 closed 3 years ago

beginner984 commented 3 years ago

Hi

Indeed this is not an issue with your package rather my question

I have a Seurat object with two conditions and several clusters

> unique(immune.combined.sct@meta.data$Condition)
[1] "cancer"  "control"
> unique(immune.combined.sct@meta.data$integrated_snn_res.0.5)
 [1] 0  4  12 2  6  9  1  3  10 7  11 5  14 13 8 
Levels: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
> 

I want to get pathways by comparing cluster 11 in cancer versus control samples

What is your suggestion please?

Thanks a lot

ncborcherding commented 3 years ago

Hey beginner984,

No problem - escape will give you a result for all the single-cells in your Seurat object. So after the enrichment you can just pick and choose what you want to compare.

In your case, something like (assuming you've but the enrichment results in your meta data) below to get the mean values of condition 11 between your control and cancer conditions:

matrix <- immune.combined.sct[[]] %>% #this pulls the meta data
                     susbet(integrated_snn_res.0.5) %>% #just for cluster 11
                     group_by(Condition) %>% 
                     summarise(across(X:Y, mean))

X and y are the columns of the meta data that correspond to the enrichment scores (-2 numbers for removing integrated_snn_res0.5 and condition).

Or you can use the first two lines to return the meta data for just cluster 11:

subset.meta.data <- immune.combined.sct[[]] %>% #this pulls the meta data
                     susbet(integrated_snn_res.0.5)

I am going to close this issue because as you say, it is not necessarily an issue with the code, but I think it is a good question that would help other users, so please feel free to respond with additional comments and be happy to continue to help. Or if you are more comfortable, you can email me.

Nick

beginner984 commented 3 years ago

Thanks a million for developing this software

For Cluster 11 I just have a few cells for controls as below

But violin show much more which I can not figure out

> dittoPlot(immune.combined.sct, "HALLMARK_INFLAMMATORY_RESPONSE", group.by = "Condition") + 
+     scale_fill_manual(values = colorblind_vector(2))
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace
the existing scale.

Rplot13

Rplot15

Any help please to understand this?

ncborcherding commented 3 years ago

Hey beginner984,

My first guess your violin plot is plotting all your cells across multiple clusters

dittoPlot(immune.combined.sct, "HALLMARK_INFLAMMATORY_RESPONSE", group.by = "Condition") + 
+     scale_fill_manual(values = colorblind_vector(2))

What is the code you used for your UMAP of just Cluster 11?

Thanks, Nick

beginner984 commented 3 years ago

Apologise

My bad

Sorry any possibility to add p-value to violins ?

ncborcherding commented 3 years ago

No problem at all.

Unfortunately, the use of dittoSeq/dittoPlot() does not allow for easy calculations/addition of p-values. But I actually think that is a good idea. I am currently working on a newer version of escape with better significance testing - so this will be something I add to the to-do list as part of that.

Thanks, Nick

beginner984 commented 3 years ago

Thank you

If we are interested in a pathway not in hallmark likecellular response to type I interferon, what we should do?

ncborcherding commented 3 years ago

If your gene set, is in the molecular signature database, you can pull it using getGeneSets(), hallmark is using in the example because it is the smallest library.

Alternatively, you can define your own gene sets by just making a list.

myGS <- list(Housekeeping = c("ACTA1", "ACTN1", "GAPDH"),
  Cancer = c("TP53","BRCA2","ERBB2","MYC"))

Nick

beginner984 commented 3 years ago

Thank you

I have a list of genes


> cd.features
[[1]]
 [1] "IGHM"      "IGFBP7"    "IGLC2"     "IGKC"      "IGF1R"     "IGHD"      "IGFBP3"   
 [8] "IGSF11"    "IGF2BP2"   "IGLV10-54" "IGLC3"     "IGLC1"     "IGHA1"     "IGHGP"    
[15] "IGHG1"    

> ES.seurat <- enrichIt(obj = b, gene.sets = cd.features, groups = 1000, cores = 2)
[1] "Using sets of 1000 cells. Running 1 times."
Setting parallel calculations through a SnowParam back-end
with workers=2 and tasks=100.
Estimating ssGSEA scores for 1 gene sets.
> b <- Seurat::AddMetaData(b, ES.seurat)
> 
> dittoHeatmap(b, genes = NULL, metas = names(ES.seurat), 
+              annot.by = "Condition", 
+              fontsize = 7, 
+              cluster_cols = TRUE,
+              heatmap.colors = colors(50))
Error in seq.default(-m, m, length.out = n + 1) : 
  'from' must be a finite number
In addition: Warning messages:
1: In min(x, na.rm = T) : no non-missing arguments to min; returning Inf
2: In max(x, na.rm = T) : no non-missing arguments to max; returning -Inf
> names(ES.seurat)
[1] "t.scores."

I don't know why I get this error

ncborcherding commented 3 years ago

I think the way you are making your list, probably doesn't have any name associated with it.

so below there are no names for names(ES.seurat) - you can check this by just looking at names(ES.seurat)

dittoHeatmap(b, genes = NULL, metas = names(ES.seurat), 
              annot.by = "Condition", 
              fontsize = 7, 
              cluster_cols = TRUE,
             heatmap.colors = colors(50))

Maybe a better way is to make a gene set is to make a list object by using list() and then giving the list element a specific name - like IG, that way IG will be saved as the header of the gene set after enrichIt()

GS <- list(IG = cd.features)

I hope that helps, but let me know if you're still having issues.

beginner984 commented 3 years ago

Thank you so much for your patience

I have made my own gene set

GS <- getGeneSets(library = c("H", "C5"))

Then

ES.seurat <- enrichIt(obj = b, gene.sets = GS, groups = 1000, cores = 2)

But I get this

> dittoHeatmap(b, genes = NULL, metas = names(ES.seurat), 
+              annot.by = "Condition", 
+              fontsize = 7, 
+              cluster_cols = TRUE,
+              heatmap.colors = colors(50))
Error in `[.data.frame`(getMetas(object, names.only = FALSE), cells.use,  : 
  undefined columns selected
> 
ncborcherding commented 3 years ago

Hey beginner984,

No problem at all - happy to help.

The dittoHeatmap() will only work if you have added the enrichment results to the meta data.

GS <- getGeneSets(library = c("H", "C5"))
ES.seurat <- enrichIt(obj = b, gene.sets = GS, groups = 1000, cores = 2)
b <- AddMetaData(b, ES.seurat) #Here is the line I don't see
dittoHeatmap(b, genes = NULL, metas = names(ES.seurat), 
             annot.by = "Condition", 
              fontsize = 7, 
              cluster_cols = TRUE,
              heatmap.colors = colors(50))

Hopefully that solves the issue, but if not, please let me know.

Nick

beginner984 commented 3 years ago

Sorry

Please look at this picture

I don't know why I am observing weird things in the results like mt.percent

Screenshot 2021-08-09 at 12 29 20
ncborcherding commented 3 years ago

Hey Beginner984,

Here is the issue:

ES2 <- data.frame(nk4[[]], Idents(nk4))

This takes all of the meta data from your nk4 Seurat object, so things like mt.percent, nCount are incorporated. You can solve this by just subsetting the data frame:

ES2 <- ES2[,x:ncol(ES2)]
#x here is column that the gene enrichment starts

Also as an aside - I just noticed the vignette has this exact problem too - so I will fix that (thanks for bringing this to my attentions)

Nick

beginner984 commented 3 years ago

Thank you

Sorry, is my interpretation is right?

DimPlot(b, group.by = "Condition") + scale_color_manual(values = colorblind_vector(2))
>

Rplot32

ES2 <- data.frame(b[[]], Idents(b))
colnames(ES2)[ncol(ES2)] <- "cluster"

output <- getSignificance(ES2, group = "Condition", fit = "linear.model")

head(output[c(2,3,4),])
                                      cancer   control   AveExpr        F       P.Value
HALLMARK_UV_RESPONSE_DN            0.3671005 0.3938102 0.3703663 3817.032 1.071024e-175
HALLMARK_KRAS_SIGNALING_UP         0.2125914 0.2199578 0.2134921 3312.605 6.496458e-169
HALLMARK_INTERFERON_ALPHA_RESPONSE 0.2306306 0.2091470 0.2280038 3162.374 1.064109e-166
                                       adj.P.Val
HALLMARK_UV_RESPONSE_DN            3.052419e-174
HALLMARK_KRAS_SIGNALING_UP         1.234327e-167
HALLMARK_INTERFERON_ALPHA_RESPONSE 1.516355e-165
> 

In my understanding , for HALLMARK_UV_RESPONSE_DN , controls have higher expression of this pathway

Am I CORRECT PLEASE?

Because the difference is too low but adj_p is significant

ncborcherding commented 3 years ago

Hey Beginner984,

That is technically correct - it looks like the most significant using the limma linear model - however, I would encourage you to use the t test for now in the methods. There is a serious issue with overfitting with the linear model and tendency for everything to be significant (I've actually removed the fit="linear.model" in the current dev version). I am actively working on improving the getSignificance() function.

In my understanding , for HALLMARK_UV_RESPONSE_DN , controls have higher expression of this pathway

This is close, but a little incorrect. The controls have a higher mean enrichment for the pathway, not necessarily expression.

Nick

beginner984 commented 2 years ago

Sorry to interrupt

I am seeing people have clear heat maps of pathways between clusters like Figure 2f here https://www.thelancet.com/pdfs/journals/ebiom/PIIS2352-3964(20)30061-X.pdf

But if you look at mine you would see nothing is clear

Based on your experience, how I can a better pathway analysis?

You think my clustering is not clean? Or what you think behind my non-informative pathways

My data is PBMC of controls and some patients who are cancer free for years so I want to detect any factor in their immune system so here I am doing pathway analysis on monocytes

Rplot51

I have used a hallmark from your package

ncborcherding commented 2 years ago

Hey Beginner984,

The publication you cite is using the mean or median enrichment value for the myeloid subsets - the heatmap you currently have is enrichment for individual cells.

I have an example of this approach in a recent publication I had in figures 3 and 4. The code you can find at the very end of the CD4 and CD8 code I provided for publication.

But simplified - here is the code to help with getting the median values

#Get meta data and columns of interest
filter <- ES[,grepl("HALLMARK",colnames(ES))]
filter <- cbind(filter,ES2)
meta <- SeuratObj[[]]
meta <- merge(meta, filter, by = "row.names")

#Make a matrix of median values
heatmap <- meta[, c("newCluster", colnames(filter))]
melted <- reshape2::melt(heatmap, id.vars = c("newCluster"))
meanvalues <- melted %>%
  group_by(newCluster, variable) %>%
  summarise(median(value))
matrix <- reshape2::dcast(meanvalues, newCluster ~ variable)
rownames(matrix) <- matrix[,1]
matrix <- matrix[,-1]

#Plot new matrix
pdf("test.pdf", height=12, width=5)
pheatmap::pheatmap(t(matrix),  scale = "row", fontsize_row = 3, cluster_rows = T, cluster_cols = T)
dev.off()

Let me know if you have any other questions.

Nick

beginner984 commented 2 years ago

Hello

First and foremost I admit that my question is out of your responsibility and I am thankful anyway Following you addressed code of scRNA-seq analysis for your paper in previous comment, I need help please if possible

I have 3' multiome 10X from which I have analyzed scRNA-seq part (cancer and control)

> cancer
An object of class Seurat 
36601 features across 18338 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)
> control
An object of class Seurat 
36601 features across 8135 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)
> 

> pbmc<- merge(control, y = cancer)

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 20)

> pbmc
An object of class Seurat 
36601 features across 12812 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)
>

Rplot02

As I have cancer and control (although from the same lab and same kit), I integrated the data from SCtransform part like

ifnb.list <- SplitObject(pbmc, split.by = "Condition")
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform)
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)

immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT",
                                         anchor.features = features)
immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.method = "SCT")

immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)

> immune.combined.sct <- FindNeighbors(immune.combined.sct, dims = 1:30)
Computing nearest neighbor graph
Computing SNN
> immune.combined.sct <- FindClusters(immune.combined.sct, resolution = 0.7)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30)

> immune.combined.sct
An object of class Seurat 
51025 features across 12812 samples within 3 assays 
Active assay: integrated (3000 features, 3000 variable features)
 2 other assays present: RNA, SCT
 2 dimensional reductions calculated: pca, umap
>

Rplot03

My question please is: why my heatmap is this unclear with bad blocks? I don't know where I am doing wrong

> DefaultAssay(immune.combined.sct)
[1] "integrated"
> 

Rplot04

Please could inspect my approach to see how I can get a reasonable data because seemingly somewhere I am doing wrong but I can not detect where