ktrns / scrnaseq

Workflow for single-cell RNA-seq analysis using Seurat
MIT License

Single sample input #69

Closed tglomb closed 3 years ago

tglomb commented 3 years ago

How can I generate a report for a single sample?
When I try to do so, I get an error in the 'cells_per_cluster' chunk, where 'tbl' is expected to be a matrix or data.frame with more than one dimension.

andpet0101 commented 3 years ago

Ah, thanks for spotting the bug. I guess it has been a while since we last had only one sample in the analysis. I hope this is the only bug.

This version of the chunk should work:

# Count cells per cluster per sample 
cell_samples = sc[[]] %>% dplyr::pull(orig.ident) %>% levels()
cell_clusters = sc[[]] %>% dplyr::pull(seurat_clusters) %>% levels()

tbl = dplyr::count(sc[[c("orig.ident", "seurat_clusters")]], orig.ident, seurat_clusters) %>%
  tidyr::pivot_wider(names_from="seurat_clusters", names_prefix="Cl_", values_from=n) %>%
  as.data.frame()
rownames(tbl) = paste0(tbl[,"orig.ident"],"_n")
tbl[,"orig.ident"] = NULL

# Add percentages
tbl_perc = round(t(tbl) / colSums(tbl) * 100, 2) %>% t()
rownames(tbl_perc) = gsub(rownames(tbl_perc), pattern="_n$", replacement="_perc", perl=TRUE)
tbl = rbind(tbl, tbl_perc)

# Add enrichment
if (length(cell_samples) > 1) tbl = rbind(tbl, cells_fisher(sc))

# Sort
tbl = tbl[order(rownames(tbl)),]

# Plot percentages
tbl_bar = tbl[paste0(cell_samples, "_perc"),] %>% 
  tibble::rownames_to_column(var="Sample") %>%
  tidyr::pivot_longer(tidyr::starts_with("Cl"), names_to="Cluster", values_to="Percentage")
tbl_bar$Cluster = tbl_bar$Cluster %>% gsub(pattern="^Cl_", replacement="", perl=TRUE) %>% factor(levels=cell_clusters)
tbl_bar$Sample = tbl_bar$Sample %>% gsub(pattern="_perc$", replacement="", perl=TRUE) %>% factor(levels=cell_samples)
tbl_bar$Percentage = as.numeric(tbl_bar$Percentage)
p = ggplot(tbl_bar, aes(x=Cluster, y=Percentage, fill=Sample)) + 
  geom_bar(stat="identity") +
  AddStyle(title="Percentage cells of samples in clusters",
           fill=param$col_samples,
           legend_title="Sample",
           legend_position="bottom")
p
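
For anyone who wants to check the counting and percentage logic without a Seurat object, here is a minimal base R sketch. It assumes a toy metadata data frame (`meta`, hypothetical data standing in for `sc[[]]`) with a single sample, and it uses `table()` and `sweep()` instead of the dplyr/tidyr calls above, so it illustrates the same idea rather than reproducing the workflow's code.

```r
# Toy stand-in for the metadata returned by sc[[]] (hypothetical data, one sample)
meta <- data.frame(
  orig.ident = factor(rep("sampleA", 6)),
  seurat_clusters = factor(c(0, 0, 0, 1, 1, 2))
)

# Cross-tabulate samples (rows) against clusters (columns)
counts <- table(meta$orig.ident, meta$seurat_clusters)

# Convert to a data.frame; with one sample this is 1 row x 3 columns
tbl <- as.data.frame.matrix(counts)
colnames(tbl) <- paste0("Cl_", colnames(tbl))
rownames(tbl) <- paste0(rownames(tbl), "_n")

# Percentage of each cluster's cells contributed by each sample:
# divide every column by that cluster's total
tbl_perc <- round(sweep(as.matrix(tbl), 2, colSums(tbl), "/") * 100, 2)
rownames(tbl_perc) <- sub("_n$", "_perc", rownames(tbl_perc))

# Stack counts and percentages, as in the chunk above
result <- rbind(tbl, tbl_perc)
print(result)
```

With only one sample, every cluster consists entirely of cells from that sample, so each value in the `_perc` row is 100.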

Andreas

andpet0101 commented 3 years ago

I should mention that this only works for the single-sample case. For multiple samples, the first three lines should be:

# Count cells per cluster per sample 
cell_samples = sc[[]] %>% dplyr::pull(orig.ident) %>% unique()
cell_clusters = sc[[]] %>% dplyr::pull(seurat_clusters) %>% levels()

We will fix this in the next commits.
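
As background on the two variants, `levels()` and `unique()` behave differently in base R. The sketch below uses a made-up factor (`ident`, hypothetical labels) and does not say which case applies to a given Seurat object, since that depends on how `orig.ident` is stored there.

```r
# Hypothetical sample labels; "s3" is a declared level with no cells
ident <- factor(c("s2", "s1", "s2"), levels = c("s1", "s2", "s3"))

# levels(): character vector of all declared levels, in level order
from_levels <- levels(ident)

# unique(): factor of the observed values, in order of first appearance
from_unique <- unique(ident)

# levels() on a character vector returns NULL, because it has no levels
from_chr <- levels(as.character(ident))

print(from_levels)
print(as.character(from_unique))
print(is.null(from_chr))
```

So `levels()` can include samples that have no cells and returns nothing for a plain character column, while `unique()` only reports values that actually occur.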

tglomb commented 3 years ago

Thx @andpet0101. I will give it a try. 😉

tglomb commented 3 years ago

Thank you @andpet0101. That did the job.