statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data
Other
238 stars 27 forks source link

Monocle3 as input to tradeSeq #247

Open jiangzh-coder opened 9 months ago

jiangzh-coder commented 9 months ago

Hi All,

I am trying to use Monocle3 as input to tradeSeq. But i met following error:

sce <- tradeSeq::fitGAM(counts = counts, pseudotime = pseudotime, cellWeights = cellWeights)

Error in .assignCells(cellWeights) :

Some cells have no positive cell weights.

Full codes were as following; -----------https://statomics.github.io/tradeSeq/articles/Monocle.html

#########Extracting the pseudotimes and cell weights for tradeSeq library(tradeSeq) library(RColorBrewer) library(SingleCellExperiment)

For reproducibility

palette(brewer.pal(8, "Dark2"))

counts <- as.matrix(countMatrix)

data(celltype, package = "tradeSeq")

library(magrittr)

note the colnames(counts) must match rownames(meta)

data <- GetAssayData(scRNA66, assay = 'RNA', slot = 'counts') cell_metadata <- scRNA66@meta.data gene_annotation <- data.frame(gene_short_name = rownames(data)) rownames(gene_annotation) <- rownames(data) cds <- new_cell_data_set(expression_data = as.matrix(data), cell_metadata = cell_metadata, gene_metadata = gene_annotation)

access the gene names

rownames(cds)[1:5]

access the meta data

head(colData(cds))

access the assay data

head(assay(cds)[,1:3])

preprocess_cds does both 1) normalization and 2) preliminary dimension reduction. By default 1) gene expression count for each cell is divided by the total counts of that cell, multiplied by a scale factor, and log-transformed. This is done to address differences in sequencing depths for different cells. 2) PCA is used to calculate a lower dimensional space that will be used as the input for downstream steps like UMAP visualization.

cds <- preprocess_cds(cds, num_dim = 30)

look at the percentage of variance explained by our principal components

plot_pc_variance_explained(cds)

Note here we are using the results of PCA dimension reduction in the last step and visualizing the results in 2 dimensions.

umap.fast_sgd=FALSE and cores = 1 are needed for reproducible results

cds <- reduce_dimension(cds, preprocess_method = 'PCA', umap.fast_sgd=FALSE, cores = 1)

let's take another look at our cell data set object

cds

let's examine our new reducedDimNames slot!

head(reducedDims(cds)$UMAP)

view the cell types in our UMAP plot

plot_cells(cds, color_cells_by = "celltype", show_trajectory_graph = FALSE, group_label_size = 3, cell_size = 0.5)

Grouping cells into clusters is an important step in identifying the cell types represented in your data. Monocle uses a technique called community detection to group cells into cluster and partitions.

cds <- cluster_cells(cds, k = 20, partition_qval = 0.05)

plot_cells(cds, color_cells_by = "partition", show_trajectory_graph = FALSE, group_label_size = 3, cell_size = 0.5)

additionally we will run learn_graph to calculate trajectories on our subset!

cds <- learn_graph(cds, use_partition = FALSE, close_loop = TRUE)

now that we have subset out data, re-run our monocle workflow, and calculated

trajectories, let's see where we should set our root node!

plot_cells(cds, color_cells_by = "celltype", cell_size = 0.5, labels_per_group = 0)

cluster.before.traj <-plot_cells(cds, color_cells_by = "celltype", label_groups_by_cluster = T, group_label_size = 5) + theme(legend.position = "right")

cluster.before.traj

head(colData(cds)) table(colData(cds)$celltype)

a helper function to identify the root principal points:

get_earliest_principal_node <- function(cds, celltype="C2_immNK1"){ cell_ids <- which(colData(cds)[, "celltype"] == "C2_immNK1")

closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex closest_vertex <- as.matrix(closest_vertex[colnames(cds), ]) root_pr_nodes <- igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names (which.max(table(closest_vertex[cell_ids,]))))]

root_pr_nodes }

cds = order_cells(cds, root_pr_nodes=get_earliest_principal_node(cds))

cds <- order_cells(cds, reduction_method = "UMAP",

root_cells = colnames(cds[, clusters(cds) == 5]))

plot_cells(cds, color_cells_by = "pseudotime", label_groups_by_cluster = T, label_branch_points = T, label_roots = F, label_leaves = F)

plot pseudo time after choosing root nodes

(defined as the bottom in the cycling progenitors cell type)

plot_cells(cds, show_trajectory_graph = T, color_cells_by = "pseudotime", graph_label_size=3, cell_size = 0.5) plot_cells(cds, show_trajectory_graph = T, color_cells_by = "celltype", graph_label_size=3, cell_size = 0.5) ##############

how are our wild-type and mutant cells distributed in this graph?

plot_cells(cds_2, color_cells_by = "treat", show_trajectory_graph = F, label_cell_groups = F, cell_size = 0.5)

cds@colData$pseudotime=pseudotime(cds)

We find all the cells that are close to the starting point

cell_ids <- colnames(cds)[cds@colData$celltype == "C2_immNK1"] closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex closest_vertex <- as.matrix(closest_vertex[colnames(cds), ]) closest_vertex <- closest_vertex[cell_ids, ] closest_vertex <- as.numeric(names(which.max(table(closest_vertex)))) mst <- principal_graph(cds)$UMAP root_pr_nodes <- igraph::V(mst)$name[closest_vertex]

We compute the trajectory

cds <- order_cells(cds, root_pr_nodes = root_pr_nodes) plot_cells(cds, color_cells_by = "pseudotime")

library(magrittr)

Get the closest vertice for every cell

y_to_cells <- principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex %>% as.data.frame() y_to_cells$cells <- rownames(y_to_cells) y_to_cells$Y <- y_to_cells$V1

Get the root vertices

It is the same node as above

root <- cds@principal_graph_aux$UMAP$root_pr_nodes

Get the other endpoints

endpoints <- names(which(igraph::degree(mst) == 1)) endpoints <- endpoints[!endpoints %in% root]

For each endpoint

cellWeights <- lapply(endpoints, function(endpoint) {

We find the path between the endpoint and the root

path <- igraph::shortest_paths(mst, root, endpoint)$vpath[[1]] path <- as.character(path)

We find the cells that map along that path

df <- y_to_cells[y_to_cells$Y %in% path, ] df <- data.frame(weights = as.numeric(colnames(cds) %in% df$cells)) colnames(df) <- endpoint return(df) }) %>% do.call(what = 'cbind', args = .) %>% as.matrix() rownames(cellWeights) <- colnames(cds) pseudotime <- matrix(pseudotime(cds), ncol = ncol(cellWeights), nrow = ncol(cds), byrow = FALSE)

counts <- as.matrix(data) counts <- as.matrix(Biobase::exprs(cds)) sce <- tradeSeq::fitGAM(counts = counts, pseudotime = pseudotime, cellWeights = cellWeights)

Error in .assignCells(cellWeights) :

Some cells have no positive cell weights.

i will be greatly appreciated if you can help me!

best jiang

koenvandenberge commented 8 months ago

Hi @jiangzh-coder

Could you please check how cellWeights looks like? The error message is suggesting that some cells have either all zero or negative weights. This does not allow us to assign cells to lineages. Perhaps check how often this occurs, and where these cells may reside within your trajectory can be helpful.

mahwish2 commented 5 months ago

Hi, sorry, If I should not enter this thread, but I am having same error following the mentioned vignette. I sure have zero weights in my dataset frequently. I am wondering how to sort his out. My data looks like this Y_1 Y_2 Y_71 Y_94 Y_96
Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
Median :0.00000 Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
Mean :0.06106 Mean :0.2091 Mean :0.1103 Mean :0.04645 Mean :0.2709
3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
Y_97 Y_112 Y_117 Y_193 Y_267
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.1771 Mean :0.1873 Mean :0.2127 Mean :0.1709 Mean :0.1776
3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000

this is summary of cell weights

koenvandenberge commented 3 months ago

Hi @mahwish2

It is fine to have weights of zero, but the sum of the weights across all lineages should not be zero. In that case, we cannot assign a cell to any lineage.