smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/
Other
316 stars 31 forks source link

Are you planning on upgradding hdWGCNA to work with Seurat V5 in the near future? #174

Closed ozambadam closed 2 months ago

ozambadam commented 6 months ago

I am currently working with singlecell data and I would like to build coexpression networks, however, my labmates and I use seurat v5 for our analyzes, it would be great to use this amazing tool with our data!

smorabit commented 6 months ago

We do plan to upgrade hdWGCNA to work with Seurat v5 in the first half of 2024, but I do not currently have the bandwidth to work on it at this time. Honestly I know very little about what has changed in Seurat v5 compared to v4, and I have not tried it myself. For now please use Seurat v4.

smorabit commented 5 months ago

FYI, it seems like a lot of people are having issues related to Seurat v5 despite our suggestion to use v4. So I am now working on an update to support v5 and it should be done in a few weeks.

ozambadam commented 5 months ago

Thanks a lot, Sam! I really like your tool and I am sure it is going to be very successful.

bbimber commented 3 months ago

@smorabit: I'm not sure how much time you've had to spend on this migration, but I recently migrated a handful of my lab's R packages to be compatible with Seurat 5. Do you have a branch going with any in-progress changes? I would be happy to try to contribute code as we encounter issues. The primary issue I found in my first look at using this with Seurat 5 was their layering within Assay5 objects, which can be a problem with GetAssayData.

smorabit commented 3 months ago

@bbimber I appreciate it but I am already working on it, so no point in spending your time on it!

bbimber commented 3 months ago

@smorabit: perfectly fine with me. good luck!

Sandman-1 commented 3 months ago

Hello! Sorry to bother you guys, but I actually created a function a couple of weeks ago to make metacells with Seurat v5 assays if anyone is interested. It can work with both v3 assays as well as v5 assays with BPCells matrices as input, and I also added some parameters to provide more flexibility when deciding how many cells to aggregate from each grouping. For example, if one sample has 10-20 times as many cells as another sample, then you can probably construct metacells with more aggregated cells and thus less sparsity in the former than in the latter. Some components are probably redundant, but the function does indeed work. If you use BPCells, just make sure you load the BPCells library first. All other steps in the hdWGCNA pipeline should be fine. Hope this helps!

ConstructMetacells <- function(
  seurat_obj, name='agg', ident.group='orig.ident', mat=seurat_obj@assays$RNA@counts, BPCells = T,
  reduction='integrated.rpca.full', dims=1:100, 
  cells.use = NULL,
  assay='RNA', slot='counts', mode = 'sum',
  meta=NULL, return_metacell=T,
  k=30, max_shared = 15,
  target_metacells=200,
  max_iter=1000,
  verbose=T,
  wgcna_name=NULL
){

  if(is.null(wgcna_name)){wgcna_name <- seurat_obj@misc$active_wgcna}

  # check reduction
  if(!(reduction %in% names(seurat_obj@reductions))){
    stop(paste0("Invalid reduction (", reduction, "). Reductions in Seurat object: ", paste(names(seurat_obj@reductions), collapse=', ')))
  }

  # check assay
  if(!(assay %in% names(seurat_obj@assays))){
    stop(paste0("Invalid assay (", assay, "). Assays in Seurat object: ", paste(names(seurat_obj@assays), collapse=', ')))
  }

  # check slot
  if(!(slot %in% c('counts', 'data', 'scale.data'))){
    stop(paste0("Invalid slot (", slot, "). Valid options for slot: counts, data, scale.data "))
  }

  # subset seurat object by selected cells:
  if(!is.null(cells.use)){
    seurat_full <- seurat_obj
    seurat_obj <- seurat_obj[,cells.use]
  }

  message(paste0("Aggregating cells from grouping ", name))

  reduced_coordinates <- as.data.frame(seurat_obj@reductions[[reduction]]@cell.embeddings)
  nn_map <- FNN::knn.index(reduced_coordinates, k = (k - 1))
  row.names(nn_map) <- row.names(reduced_coordinates)
  nn_map <- cbind(nn_map, seq_len(nrow(nn_map)))
  good_choices <- seq_len(nrow(nn_map))
  choice <- sample(seq_len(length(good_choices)), size = 1,
      replace = F)
  chosen <- good_choices[choice]
  good_choices <- good_choices[good_choices != good_choices[choice]]
  it <- 0
  k2 <- k * 2
  get_shared <- function(other, this_choice) {
      k2 - length(union(cell_sample[other, ], this_choice))
  }
  while (length(good_choices) > 0 & length(chosen) < target_metacells & it < max_iter) {
      it <- it + 1
      choice <- sample(seq_len(length(good_choices)), size = 1,
          replace = F)
      new_chosen <- c(chosen, good_choices[choice])
      good_choices <- good_choices[good_choices != good_choices[choice]]
      cell_sample <- nn_map[new_chosen, ]
      others <- seq_len(nrow(cell_sample) - 1)
      this_choice <- cell_sample[nrow(cell_sample), ]
      shared <- sapply(others, get_shared, this_choice = this_choice)

      if(max(shared) <= max_shared){
          chosen <- new_chosen
      }
  }

  shared_old <- shared
  cell_sample <- nn_map[chosen, , drop = F]

  # get a list of the cell barcodes that have been merged:
  cells_merged <- apply(cell_sample, 1, function(x){
    paste0(colnames(seurat_obj)[x], collapse=',')
  })

  combs <- tryCatch(
    {combn(nrow(cell_sample), 2)},
    error = function(cond){return(NA)}
  )
  if(any(is.na(combs))){
    warning('Metacell failed')
    return(NULL)
  }

  shared <- apply(combs, 2, function(x) {
      k2 - length(unique(as.vector(cell_sample[x, ])))
  })

  if(verbose){
    message(paste0("Overlap QC metrics:\nCells per bin: ",
        k, "\nMaximum shared cells bin-bin: ", max(shared),
        "\nMean shared cells bin-bin: ", mean(shared), "\nMedian shared cells bin-bin: ",
        median(shared)))
    if (mean(shared)/k > 0.1)
        warning("On average, more than 10% of cells are shared between paired bins.")
  }

  # get original expression matrix
  exprs_old <- mat[,colnames(seurat_obj)]

  # groups of cells to combine
  mask <- sapply(seq_len(nrow(cell_sample)), function(x) seq_len(ncol(exprs_old)) %in%
      cell_sample[x, , drop = F])
  # mask <- mask[,which(shared_old <= max_shared)]
  # cell_sample <- cell_sample[which(shared_old <= max_shared),]
  mask <- Matrix::Matrix(mask)
  mask <- as(mask, "dMatrix")
  if(BPCells == T) {
    mask <- write_matrix_dir(mask, tempfile("Mask"))
  }

  # average or sum expression?
  new_exprs <- (exprs_old %*% mask)
  if(mode == 'average'){
    new_exprs <- new_exprs / k
  }
  colnames(new_exprs) <- paste0(name, '_', 1:ncol(new_exprs))
  rownames(cell_sample) <- paste0(name, '_', 1:ncol(new_exprs))
  colnames(cell_sample) <- paste0('knn_', 1:ncol(cell_sample))

  if(BPCells == T){
    new_exprs <- write_matrix_dir(new_exprs, tempfile("new_exprs"))
  }

  message(paste0("Creating Seurat object with ", ncol(new_exprs), " metacells"))

  # make seurat obj:
  metacell_obj <- CreateSeuratObject(
    counts = new_exprs,
    assay = assay,
    min.cells = 0,
    min.features = 0
  )
  if(slot == 'scale.data'){
    metacell_obj <- SeuratObject::SetAssayData(
      metacell_obj,
      slot=slot,
      assay=assay,
      new.data=as.matrix(new_exprs)
    )
  }

  # add the cells merged info to the metacell obj
  metacell_obj$cells_merged <- as.character(cells_merged)

  # calculate stats:
  # shared <- shared[shared <= max_shared]
  max_shared <- max(shared)
  median_shared <- median(shared)
  mean_shared <- mean(shared)

  # calculate matrix density:
  new_exprs_mem <- write_matrix_memory(new_exprs) %>% as.matrix()
  new_exprs_mem[new_exprs_mem > 0] <- 1
  density <- sum(Matrix::colSums(new_exprs_mem) / (nrow(new_exprs_mem)*ncol(new_exprs_mem)))
  run_stats <- data.frame(
    name = name,
    max_shared = max_shared,
    mean_shared = mean_shared,
    median_shared = median_shared,
    density = density,
    n = ncol(new_exprs)
  )
  rm(new_exprs_mem)

  # add to metacell seurat obj
  metacell_obj@misc$run_stats <- run_stats

  # add meta-data:
  if(!is.null(meta)){
    meta_names <- names(meta)
    for(x in meta_names){
      metacell_obj@meta.data[[x]] <- meta[[x]]
    }
  } else(
    warning('meta not found')
  )

  # add seurat metacell object to the main seurat object:
  if(return_metacell){
    out <- metacell_obj; gc()
  } else{

    # revert to full seurat object if we subsetted earlier
    if(!is.null(cells.use)){
      seurat_obj <- seurat_full
    }

    # add seurat metacell object to the main seurat object:
    seurat_obj <- SetMetacellObject(seurat_obj, metacell_obj, wgcna_name)

    # add other info
    seurat_obj <- SetWGCNAParams(
      seurat_obj, params = list(
        'metacell_k' = k,
        'metacell_reduction' = reduction,
        'metacell_slot' = slot,
        'metacell_assay' = assay
      ),
      wgcna_name
    )
    out <- seurat_obj; gc()
  }
  out
}

MetacellsByGroups <- function(
  seurat_obj, group.by=c('Cell_Type', 'orig.ident'), ident.group='Cell_Type',
  mat=seurat_obj@assays$RNA@counts, BPCells = T, join = T, matrix_save_path = "BPCells/Metacells", conv_2_sparse = T, 
  reduction='integrated.rpca.full', dims=1:100,
  cells.use = NULL, min_cells=100,
  assay="RNA", slot='counts', mode = 'sum',
  k=30, k_max = 40, k_min = 20, max_shared_ratio = 0.25,
  target_metacells=200, max_iter=1000, verbose=T, wgcna_name=NULL
){

  if(is.null(wgcna_name)){wgcna_name <- seurat_obj@misc$active_wgcna}

  # check group.by for invalid characters:
  if(any(grepl('#', group.by))){
    stop('Invalid character # found in group.by, please re-name the group.')
  }

  # check ident.group
  if(!(ident.group %in% group.by)){
    stop('ident.group must be in group.by')
  }

  # check mode
  if(!(mode %in% c('sum', 'average'))){
    stop('Invalid choice for mode. Mode can be either sum or average.')
  }

  # check reduction
  if(!(reduction %in% names(seurat_obj@reductions))){
    stop(paste0("Invalid reduction (", reduction, "). Reductions in Seurat object: ", paste(names(seurat_obj@reductions), collapse=', ')))
  }

  # check assay:
  if(is.null(assay)){
    assay <- DefaultAssay(seurat_obj)
  } else if(!(assay %in% names(seurat_obj@assays))){
    stop(paste0('Assay ', assay, ' not found in seurat_obj. Select a valid assay: ', paste0(names(seurat_obj@assays), collapse = ', ')))
  }

  # check slot:
  if(!(slot %in% c('counts', 'data', 'scale.data'))){
    stop('Invalid input for slot. Valid choices are counts, data, scale.data.')
  } else{

    # check the shape of the slot
    slot_dim <- dim(GetAssayData(seurat_obj, assay=assay, slot=slot))
    if(any(slot_dim) == 0){
      stop(paste(c("Selected slot ", slot, " not found in this assay.")))
    }
  }

  # check that k > min_cells 
  if(min_cells < k ){
    warning("min_cells is smaller than k, this may result in downstream errors if very small groups are allowed.")
  }

  # subset seurat object by selected cells:
  if(!is.null(cells.use)){
    seurat_full <- seurat_obj
    seurat_obj <- seurat_obj[,cells.use]
  }

  # setup grouping variables
  if(length(group.by) > 1){
    seurat_meta <- seurat_obj@meta.data[,group.by]
    for(col in colnames(seurat_meta)){
      seurat_meta[[col]] <- as.character(seurat_meta[[col]])
    }
    seurat_obj$metacell_grouping <- apply(seurat_meta, 1, paste, collapse='#')
  } else {
    seurat_obj$metacell_grouping <- as.character(seurat_obj@meta.data[[group.by]])
  }
  groupings <- unique(seurat_obj$metacell_grouping)
  groupings <- groupings[order(groupings)]

  # remove groups that are too small:
  group_counts <- table(seurat_obj$metacell_grouping) < min_cells
  if(any(group_counts)){
    warning(paste0("Removing the following groups that did not meet min_cells: ", paste(names(group_counts)[group_counts], collapse=', ')))
  }
  groupings <- groupings[table(seurat_obj$metacell_grouping) >= min_cells]

  if(length(groupings) == 0 ){
    stop("No groups met the min_cells requirement.")
  }

  # unique meta-data for each group
  meta_df <- as.data.frame(do.call(rbind, strsplit(groupings, '#')))
  colnames(meta_df) <- group.by

  # list of meta-data to pass to each metacell seurat object
  meta_list <- lapply(1:nrow(meta_df), function(i){
    x <- list(as.character(meta_df[i,]))[[1]]
    names(x) <- colnames(meta_df)
    x
  })

  # split seurat obj by groupings
  seurat_list <- lapply(groupings, function(x){seurat_obj[,seurat_obj$metacell_grouping == x]})
  names(seurat_list) <- groupings

  # Adjust k and max_shared parameters for each sample
  k_list <- numeric()
  max_shared_list <- numeric()
  cells_per_sample <- numeric()
  mean_cells <- numeric()
  for(i in 1:length(seurat_list)) {
    cells_per_sample[i] <- ncol(seurat_list[[i]])
  }
  mean_cells <- mean(cells_per_sample)

  for(i in 1:length(cells_per_sample)) {
    k_list[i] <- ifelse(round(k*cells_per_sample[i]/mean_cells) > k_min,
                        round(k*cells_per_sample[i]/mean_cells), 
                        k_min)
    k_list[i] <- ifelse(k_list[i] > k_max, 
                        k_max, 
                        k_list[i])
    max_shared_list[i] <- round(k_list[i]*max_shared_ratio)
  }

  # Adjust target_metacell and max_iter parameters for each sample
  target_metacells_list <- numeric()
  max_iter_list <- numeric()
  for(i in 1:length(cells_per_sample)) {
    if(target_metacells < 1000) {
      target_metacells_list[i] <- max(target_metacells, round((target_metacells/100)*(cells_per_sample[i]/k_list[i])))
      max_iter_list[i] <- target_metacells_list[i]*5
    } else {
      target_metacells_list[i] <- target_metacells
      max_iter_list[i] <- max_iter
    }
  }

  # construct metacells
  metacell_list <- mapply(
    ConstructMetacells,
    seurat_obj = seurat_list,
    name = groupings,
    meta = meta_list,
    k=k_list,
    max_shared=max_shared_list,
    max_iter=max_iter_list, 
    target_metacells=target_metacells_list,
    MoreArgs = list(
      reduction=reduction, 
      dims=dims,
      assay=assay, 
      slot=slot, 
      mat=mat,
      BPCells=BPCells,
      return_metacell=T, 
      mode=mode,
      verbose=verbose, 
      wgcna_name=wgcna_name
    )
  )
  names(metacell_list) <- groupings

  rm(seurat_list); gc()

  # remove NULL
  remove <- which(sapply(metacell_list, is.null))
  if(length(remove) >= 1){
    metacell_list <- metacell_list[-remove]
  }

  # get the run stats:
  run_stats <- as.data.frame(do.call(rbind, lapply(metacell_list, function(x){x@misc$run_stats})))
  rownames(run_stats) <- 1:nrow(run_stats)
  for(i in 1:length(group.by)){
    run_stats[[group.by[i]]] <- do.call(rbind, strsplit(as.character(run_stats$name), '#'))[,i]
  }

  message(paste("Merging metacells"))

  # combine metacell objects
  if(length(metacell_list) > 1){
    metacell_obj <- merge(metacell_list[[1]], metacell_list[2:length(metacell_list)])
    gc()
  } else{
    metacell_obj <- metacell_list[[1]]
    gc()
  }

  if(join == T & BPCells == T){
    message(paste("Converting count matrices"))
    # get vector of merged cells
    cells_merged <- strsplit(as.character(metacell_obj$cells_merged), ",")
    mat_merge <- as(matrix(
      data = sapply(cells_merged, 
                    function(x, mat) rowSums(mat[,x]), mat=mat), 
      nrow = nrow(metacell_obj), 
      ncol = ncol(metacell_obj), 
      dimnames = list(rownames(metacell_obj), 
                      colnames(metacell_obj))), 
      "sparseMatrix")
    gc()
    message(paste0("Writing BPCells Matrix to ", matrix_save_path))
    write_matrix_dir(mat_merge, matrix_save_path)
    mat_merge <- open_matrix_dir(matrix_save_path)
    metacell_obj[[assay]] <- CreateAssay5Object(counts = mat_merge)
    metacell_obj <- UpdateSeuratObject(metacell_obj)
    rm(cells_merged, mat_merge); gc()
  }

  if(conv_2_sparse == T){
    message(paste("Converting assay to v3 and count matrix to sparse matrix"))
    mat <- as(open_matrix_dir(matrix_save_path), Class = "sparseMatrix")
    # Make sure dimnames are okay
    rownames(mat) <- rownames(metacell_obj); colnames(mat) <- colnames(metacell_obj)
    metacell_obj[[assay]] <- CreateAssayObject(counts = mat)
    metacell_obj <- UpdateSeuratObject(metacell_obj)
  }

  # set idents for metacell object:
  Idents(metacell_obj) <- metacell_obj@meta.data[[ident.group]]

  # revert to full seurat object if we subsetted earlier
  if(!is.null(cells.use)){
    seurat_obj <- seurat_full
  }

  # add seurat metacell object to the main seurat object:
  seurat_obj <- SetMetacellObject(seurat_obj, metacell_obj, wgcna_name)
  gc()

  # add other info
  seurat_obj <- SetWGCNAParams(
    seurat_obj, params = list(
      'metacell_k' = k,
      'metacell_reduction' = reduction,
      'metacell_slot' = slot,
      'metacell_assay' = assay,
      'metacell_stats' = run_stats
    ),
    wgcna_name
  )
  seurat_obj
}

`

smorabit commented 2 months ago

The latest version of hdWGCNA now has Seurat v5 support!