mojaveazure / seurat-disk

Interfaces for HDF5-based Single Cell File Formats
https://mojaveazure.github.io/seurat-disk
GNU General Public License v3.0
156 stars 50 forks source link

Efficiently import gene expression values for a selected gene. #180

Open le-raman opened 8 months ago

le-raman commented 8 months ago

Hi devs,

Thanks for this tool, it's been helping me a lot!

I'm building an interactive single-cell data browser and I want to extract the expression values of the RNA assay of a particular gene. I can't transform to a Seurat object, as is this slows down the tool too much, so want to use the connection established by SeuratDisk::Connect.

It's quite easy to extract all expression values of a particular cell, using something like:

get_cell_expr_h5fd <- function(reactive, cell) {
  # indptr matches length cells + 1, so end_i always + 1 possible :)

  indptr <- reactive$seu[["assays"]][["RNA"]][["data"]][["indptr"]]
  indices <- reactive$seu[["assays"]][["RNA"]][["data"]][["indices"]]
  data <- reactive$seu[["assays"]][["RNA"]][["data"]][["data"]]

  cell_index <- which(reactive$seu_cell == cell)

  # plus one to convert to R-style indexing
  var_indices <- indices$read(list(start_i:end_i)) + 1

  # plus one to convert to R-style indexing
  start_i <- indptr$read(cell_index) + 1

  # not plus one to convert to R-style indexing (R is open interval at end)
  end_i <- indptr$read(cell_index + 1)

  # plus one to convert to R-style indexing
  var_indices <- indices$read(list(start_i:end_i)) + 1

  expr_values <- rep(0, length(reactive$seu_vars))
  names(expr_values) <- reactive$seu_vars
  expr_values[var_indices] <- data$read(list(start_i:end_i))

  return(expr_values)
}

This is super fast, almost irrespective of number of cells.

But I can't seem to understand how I would do the same for a particular gene. Is there a native function for this that I missed?

Thanks!

Best