Open nehawali21 opened 7 months ago
Hi,
Does the underlying raw, normalize, or scale data have any NAs present? Scaling occurs within gene across identity used. flip
parameter is simply visualization change.
Best, Sam
I've tried Seurat::GetAssayData
on the Seurat object to extract the counts
, data
, and scale.data
matrices for my genes of interest. However, I'm not getting it to subset to my features properly and I'm only getting the larger sparse matrices for all features. Nevertheless, I imagine NAs are being introduced in scaling for these genes in Clustered_DotPlot
, as I can plot these genes with Seurat::DotPlot
and scale = TRUE
. Is Clustered_DotPlot
not scaling the same way as DotPlot
perhaps?
Hence, I'm wondering how one could extract the underlying data used in Clustered_DotPlot
to see which genes are playing spoilsport and remove them from the query.
So the weird thing is that I do have an NA check in the function so not sure how this is slipping past this. Also for unsplit ClusteredDotPlots the data actually gets pulled directly from Seurat's DotPlot function.
Can you run the following as a test and post the output?
data <- FetchData(OBJ_NAME, vars = GENE_LIST)
colnames(data)[ apply(data, 2, anyNA) ]
Thanks! Sam
Also can you confirm what version of scCustomize you have installed?
Sure thing. I have scCustomize version 2.1.2 and Seurat version 5.0.3.
After putting in the suggested code:
data <- FetchData(seurat, vars = features)
colnames(data)[ apply(data, 2, anyNA) ]
this is the only output:
character(0)
Ok working through issues here.
did you scale the data yourself or was it scaled in the downloaded object? Does scaling it yourself fix issue?
Best, Sam
Thank you so much for kindly continuing to look into this.
The object had raw reads so scaling is coming from me with the Seurat workflow.
What happens if you run the FetchData code block but specify the scale data
data <- FetchData(seurat, vars = features, layer = "scale.data")
colnames(data)[ apply(data, 2, anyNA) ]
I get character(0)
at the end of this too.
I also get warning messages that genes can't be found when using scale.data
, probably because they aren't all within the 4000 HVGs I had specified for Seurat::FindVariableFeatures
. Still, Seurat::DotPlot
and scale = TRUE
can generate a plot, so I'm not sure what's going on.
Ok we'll figure it out lol.
Can you post the full code you ran after downloading the object? I'll try and recreate on my end.
Best, Sam
Sure. The things I read in were downloaded from these publicly available sources:
raw_reads.rds
and umap.rds
are from https://github.com/ScialdoneLab/human-gastrula-shiny
E-MTAB-9388.sdrf.txt
is from https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-9388
I've just put a generic [FILEPATH] here as it would need to change, and I've condensed the code for ease, including a smaller list of features which still yields issues.
# packages
if (!require("pacman", quietly = TRUE)) install.packages("pacman"); library(pacman)
pacman::p_load(Seurat, tidyverse, scCustomize, janitor)
pacman::p_load_gh("satijalab/seurat-wrappers")
# read in things
raw_reads_gastrula <- readRDS("[FILEPATH]\\raw_reads.rds") %>% t() %>% as.data.frame()
umap_metadata_gastrula <- readRDS("[FILEPATH]\\umap.rds")
arrayexpress_metadata_gastrula <- read.delim(file = "[FILEPATH]\\E-MTAB-9388.sdrf.txt") %>% clean_names() %>% dplyr::select(-c(scan_name, comment_submitted_file_name, comment_fastq_uri)) %>% distinct()
# merge metadata tables into one to use for seurat object, first clean up cell names in one to match
# ensure columns are equivalent so that either one can be used as cell name for merging
table(arrayexpress_metadata_gastrula$source_name == arrayexpress_metadata_gastrula$factor_value_single_cell_identifier) # should be all 1195 TRUE
# duplicate cell names and as new col and change naming to match cell naming in umap metadata
arrayexpress_metadata_gastrula <- arrayexpress_metadata_gastrula %>% mutate(cell_name_merge = factor_value_single_cell_identifier) %>% mutate(cell_name_merge = gsub("_", "\\.", cell_name_merge))
# merge arrayexpress and umap metadata into one df for seurat object
full_metadata_gastrula <- full_join(umap_metadata_gastrula, arrayexpress_metadata_gastrula, by = c("cell_name" = "cell_name_merge"))
# create seurat object for next analyses
seurat_gastrula <- CreateSeuratObject(counts = raw_reads_gastrula, meta.data = full_metadata_gastrula, project = "gastrula", min.cells = 0, min.features = 0)
# set cell identities to already annotated clusters info in metadata
Idents(seurat_gastrula) <- "cluster_id"
# don't need to subset based on QC because cells already processed before, just go to normalize
# QC - mt genes, etc.
seurat_gastrula[["percent.mt"]] <- PercentageFeatureSet(seurat_gastrula, pattern = "^MT\\.|^MTATP|^MTCO|^MTCYBP")
# seurat_gastrula <- subset(seurat_gastrula,
# subset = nFeature_RNA > 2000 &
# percent.mt < 2
# )
# normalize
set.seed(1212)
seurat_gastrula <- NormalizeData(seurat_gastrula, normalization.method = "LogNormalize", scale.factor = 10000)
# find variable features
set.seed(1212)
seurat_gastrula <- FindVariableFeatures(seurat_gastrula, selection.method = "vst", nfeatures = 4000)
# scale data
set.seed(1212)
seurat_gastrula <- ScaleData(seurat_gastrula)
# PCA
set.seed(1212)
seurat_gastrula <- RunPCA(seurat_gastrula, npcs = 50)
# cell neighbors and clusters
set.seed(1212)
seurat_gastrula <- FindNeighbors(seurat_gastrula, dims = 1:30, k.param = 50)
set.seed(1212)
seurat_gastrula <- FindClusters(seurat_gastrula, dims.use = 1:30, k.param = 50, resolution = 0.75, algorithm = 1)
# UMAP
set.seed(1212)
seurat_gastrula <- RunUMAP(seurat_gastrula, reduction = "pca", dims = 1:30)
# set cell identities to already annotated clusters info in metadata
Idents(seurat_gastrula) <- "cluster_id"
# create features list
features <- c("SCN2A" , "PIDD1" ,"RBM38" , "FDXR" , "DCP1B", "EPHA2", "AK3" , "RNASE7", "ZNF385A", "CPEB2")
And here is the Seurat::DotPlot
code which generates a scaled plot:
set.seed(1212)
DotPlot(seurat_gastrula,
cluster.idents = TRUE,
features = features) +
scale_colour_gradient2(low = "red",
mid = "lightgrey",
high = "blue") +
xlab("Gene") +
ylab("Cluster") +
scale_y_discrete(limits = rev) +
theme(axis.text = element_text(color = "black",
size = 14),
axis.text.x = element_text(hjust = 1,
angle = 45,
face = "italic"),
axis.title = element_text(color = "black",
face = "bold",
size = 14),
axis.title.x = element_text(margin = margin(10,0,0,0,"mm")),
axis.title.y = element_text(margin = margin(0,10,0,0,"mm")),
legend.title = element_text(color = "black",
face = "bold",
size = 14),
legend.text = element_text(color = "black",
size = 14))
The Clustered_DotPlot
code is the same as before (just with seurat_gastrula
as the Seurat object rather than generic seurat
).
Ok, so I figured out the issue and will work on scCustomize'd based solution but for now there is easy solution that you can do on your end.
The reason for the error is because one of the features you are trying to plot has no expression in the object ("RNASE7"). Because you created the object with min.cells = 0
the object contains all features including genes that have no expression. So for now you can just change that to min.cells = 1
which will only include genes that are found in at least one cell. Then plotting has no issues.
Best, Sam
I see, thank you so much for kindly figuring this out! I can make the Clustered_DotPlot
now.
Also, is it possible to move the legends up to the top right and make them 1 column, or remove the identity color annotations on the heatmap and in the legend, in Clustered_DotPlot
?
Thank you so much for creating this package; it's been revolutionary for spicing up the standard Seurat figures! As a novice bioinformatician, I'm afraid I've run into a small issue that I hope I could kindly get some help with.
I'm trying to create a clustered dotplot of about 200 genes' expression across identities after Seurat analysis of a small published scRNAseq dataset.
However, inputting my genes of interest yields the following error, which I believe is due to scaling introducing issues in the matrix used to make the dotplot:
Below are the parameters I'm using in the
Clustered_DotPlot
function:I'm not sure how to access the underlying matrix the
Clustered_DotPlot
function uses so that I could examine and exclude genes causing the issue. I'd be most grateful on what I can do to generate a graph.Also, can you kindly clarify which direction the scaling occurs in, i.e. by row (originally a gene's expression across clusters) or by column (originally identities)? That is, will the
flip
parameter retain the original scaling (just now transposed numbers) as it only flips axes at the end to generate the plot?