satijalab / seurat-wrappers

Community-provided extensions to Seurat
GNU General Public License v3.0
294 stars 127 forks source link

Integrated Seurat object + separate loom -> Velocyto #9

Closed graddi closed 5 years ago

graddi commented 5 years ago

Hello,

I am confused about how to best prepare a Seurat object for Velocyto after integration. I have been able to run both velocyto and scvelo by taking the loom file created via standard velocyto pipeline and then filter cells / map onto the UMAP that was calculated on the integrated Seurat analysis.

However, I wonder whether merging the original integrated Seurat object with the loom file containing spliced / unspliced info would be better. But I'm not able to add that data before running Velocyto (Seurat wrapper). Could you please advice? The vignette is not informative for this specific situation.

Thank you very much for your hard word and best wishes.

mojaveazure commented 5 years ago

Hello,

The wrapper was designed to read in a velocyto-produced loom file into a Seurat object and run the velocity estimation pipeline (gene.relative.velocity.estimates) without needing to keep track of which matrix is which. We aren't able to provide much more help than technical expertise; conceptual questions should be directed to the velocyto team.

Are you having difficulty merging an integrated Seurat object with a velocity Seurat object?

graddi commented 5 years ago

Dear @mojaveazure thanks for the reply. Yes indeed I'm having issues using the wrapper with an integrated Seurat object. I took the code from this issue (https://github.com/velocyto-team/velocyto.R/issues/16#issuecomment-461067873), by @jebard

1) How can I filter the results from the Velocyto loom file to get only the cells that were in the final integrated Seurat dataset? See code below, as I get this error when I run the gene.relative.velocity.estimates function:

Warning message: In labels(cell.dist) == colnames(emat) : longer object length is not a multiple of shorter object length

2) Any alternatives you could recommended to change : to in the loom cell names (see again code below). I am using: colnames(emat) <- paste(substring(colnames(emat),11,11) <- ""), which does change the cell names but also returns the following error:

Error in dimnamesGets(x, value) : invalid dimnames given for “dgCMatrix” object

Load loom file with velocities

ldat <- read.loom.matrices("/Users/gr11/Documents/Results/Latest/Seurat/CCA_no_bf24_sug48/V3Seurat/Velocyto/merged.loom")

Get spliced and unspliced estimates

emat <- ldat$spliced nmat <- ldat$unspliced

Take UMAP embedding from the Seurat data object

emb <- FetchData(object = gr_all, vars = c('UMAP_1','UMAP_2'))

Estimate the cell-cell distances

cell.dist <- as.dist(1-armaCor(t(FetchData(object = gr_all, vars = c('UMAP_1','UMAP_2')))))

Modify cell names to make them the same as in Seurat object (this returns as error in the first substring function, but still seems to modify names anyway)

colnames(emat) <- paste(substring(colnames(emat),11,11) <- "") colnames(emat) <- paste(substring(colnames(emat),1,27),sep="") head(colnames(emat)) colnames(nmat) <- paste(substr(colnames(nmat),11,11) <- "") colnames(nmat) <- paste(substring(colnames(nmat),1,27),sep="") head(colnames(nmat))

Perform gamma fit on a top/bottom quantiles of expression magnitudes

fit.quantile <- 0.02

Main velocity estimation

rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2, kCells=10, cell.dist=cell.dist, fit.quantile=fit.quantile, n.cores=4)

mojaveazure commented 5 years ago

Sorry for the delay, what happens if you do your analysis using our wrapper code rather than trying to get data in and out of Seurat objects? See the Velocity vignette for more details.

graddi commented 5 years ago

Hello, thanks for your reply. Yes I had seen the vignette but it's only with sctransform and commands don't work with standard processing!

mojaveazure commented 5 years ago

You can use the standard preprocessing pipeline (NormalizeData, FindVariableFeatures, ScaleData) with our velocity wrapper; it was developed and tested using the standard pipeline. What commands are you running and what error are you getting?

graddi commented 5 years ago

Apologies I was not clear, I should clarify, the pipeline does work on its own. I meant that I am not able to add the spliced / unspliced / ambigous data on an already made / filtered / clustered / etc. Seurat object. Is there a way to do so?

e.g. If object gr_all is already a processed Seurat object, is there to merge the loom data processed with velocyto to the Seurat object rather than overwriting it? That would be a very useful feature if not.

ldat <- ReadVelocity(file = "/Users/gr11/Documents/Results/Latest/Seurat/CCA_no_bf24_sug48/V3Seurat/Velocyto/merged.loom") reading loom file via hdf5r... gr_all <- as.Seurat(x = ldat)

mojaveazure commented 5 years ago

You should be able to merge your Seurat objects together.

# Assume gr_all is a preexisting Seurat object
ldat <- ReadVelocity(file = '/path/to/loomfile.loom')
gr_all <- merge(gr_all, as.Seurat(ldat))

To ensure you're only working with filtered cells, you can change that last line to

gr_all <- merge(gr_all, as.Seurat(ldat)[, colnames(gr_all)])
esfandyari commented 5 years ago

@mojaveazure my question is maybe a naive reformulation of the question above:

we have the loom file and running the Seurat wrapper/velocyto.R vignette, we'll have a "splicing-based-Seurat" object, a "splicing-based-UMAP" embedding and then we calculate and project the "RNA velocity arrows" on the "splicing-based-UMAP". On the other hand, running Seurat SCT workflow on e.g. 10x genomics filtered_gene_bc_matrices, we'll have a "gene-based-Seurat" object and a "gene-based-UMAP" embedding. Is there a possibility/script to project the "RNA velocity arrows" on the "gene-based-UMAP" and not the "splicing-based-Seurat"? The question holds true for either a single object or an integrated object. Is merge a solution to this question? If so, at which step of the wrapper/velocyto.R vignette should one do the merging?

Thanks, Dena

CoderMatthias commented 5 years ago

Hi @mojaveazure,

I have the exact same issue as Dena. I tried using the merge option that you suggested, but I needed to rename the columns in the loom file to match the convention in the integrated analysis. When I merge I end up duplicating all cell names. Below is the code I used:

# Load loom file and integrated Seurat object
ldat <- ReadVelocity(file = "~/Path/To/file.loom")
serobj <- readRDS('~/Path/To/IntegratedSeruatObject.rds')

# Convert loom to Seurat object
ldat_ser <- as.Seurat(ldat)

# Rename loom object to same convention as integrated object
new_names <- gsub('Part To Remove', '', colnames(ldat_ser))
ldat_ser <- RenameCells(ldat_ser, new.names = new_names, for.merge = T)

# To confirm that we have the same cell names in our intergrated object and our loom object
intersect(colnames(serobj), colnames(ldat_ser[, colnames(serobj)])) # This works

# Merge objects, creates duplicates of all cells
merge_ser <- merge(serobj, ldat_ser[, colnames(serobj)])

Error from merge:

Warning message:
In CheckDuplicateCellNames(object.list = objects) :
  Some cell names are duplicated across objects provided. Renaming to enforce unique cell names.

The merged object has exactly twice the number of cells than the integrated object alone. Any ideas where the issue in my merge is coming from?

Thanks for all your work implementing these packages with Seurat, Matt

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.4.0        stringr_1.4.0        dplyr_0.8.3          purrr_0.3.2          readr_1.3.1         
 [6] tidyr_0.8.3          tibble_2.1.3         ggplot2_3.2.1        tidyverse_1.2.1      SeuratWrappers_0.1.0
[11] Seurat_3.1.0.9003    velocyto.R_0.6       Matrix_1.2-17       

loaded via a namespace (and not attached):
  [1] Rtsne_0.15          colorspace_1.4-1    ggridges_0.5.1      rstudioapi_0.10     leiden_0.3.1        listenv_0.7.0      
  [7] npsurv_0.4-0        remotes_2.1.0       bit64_0.9-7         ggrepel_0.8.1       RSpectra_0.15-0     lubridate_1.7.4    
 [13] xml2_1.2.2          codetools_0.2-16    splines_3.5.3       R.methodsS3_1.7.1   lsei_1.2-0          zeallot_0.1.0      
 [19] jsonlite_1.6        broom_0.5.2         ica_1.0-2           cluster_2.1.0       png_0.1-7           R.oo_1.22.0        
 [25] uwot_0.1.3          sctransform_0.2.0   BiocManager_1.30.4  compiler_3.5.3      httr_1.4.1          backports_1.1.4    
 [31] assertthat_0.2.1    lazyeval_0.2.2      cli_1.1.0           htmltools_0.3.6     tools_3.5.3         rsvd_1.0.2         
 [37] igraph_1.2.4.1      gtable_0.3.0        glue_1.3.1          RANN_2.6.1          reshape2_1.4.3      Rcpp_1.0.2         
 [43] Biobase_2.42.0      cellranger_1.1.0    vctrs_0.2.0         gdata_2.18.0        ape_5.3             nlme_3.1-141       
 [49] gbRd_0.4-11         lmtest_0.9-37       globals_0.12.4      rvest_0.3.4         irlba_2.3.3         gtools_3.8.1       
 [55] future_1.14.0       MASS_7.3-51.4       zoo_1.8-6           scales_1.0.0        pcaMethods_1.74.0   hms_0.5.1          
 [61] parallel_3.5.3      RColorBrewer_1.1-2  reticulate_1.13     pbapply_1.4-1       gridExtra_2.3       stringi_1.4.3      
 [67] caTools_1.17.1.2    BiocGenerics_0.28.0 bibtex_0.4.2        Rdpack_0.11-0       SDMTools_1.1-221.1  rlang_0.4.0        
 [73] pkgconfig_2.0.2     bitops_1.0-6        lattice_0.20-38     ROCR_1.0-7          htmlwidgets_1.3     bit_1.1-14         
 [79] cowplot_1.0.0       tidyselect_0.2.5    RcppAnnoy_0.0.12    plyr_1.8.4          magrittr_1.5        R6_2.4.0           
 [85] gplots_3.0.1.1      generics_0.0.2      withr_2.1.2         pillar_1.4.2        haven_2.1.1         mgcv_1.8-28        
 [91] fitdistrplus_1.0-14 survival_2.44-1.1   future.apply_1.3.0  tsne_0.1-3          hdf5r_1.2.0         modelr_0.1.5       
 [97] crayon_1.3.4        KernSmooth_2.23-15  plotly_4.9.0        grid_3.5.3          readxl_1.3.1        data.table_1.12.2  
[103] metap_1.1           digest_0.6.20       R.utils_2.9.0       RcppParallel_4.4.3  munsell_0.5.0       viridisLite_0.3.0  
graddi commented 5 years ago
gr_all <- merge(gr_all, as.Seurat(ldat)[, colnames(gr_all)])

Hello @mojaveazure,

Thank you for your reply. I did try what you recommend however with your command I get the following error:

> gr_all <- merge(gr_all, as.Seurat(ldat)[, colnames(gr_all)])
>   |                                                                                        |                                                                                |   0%  |                                                                                        |===========================                                                     |  33%  |                                                                                        |=====================================================                           |  67%  |                                                                                        |================================================================================| 100%
> Error in CellsByIdentities(object = object, cells = cells) : 
>   Cannot find cells provided

I assumed that was because of the difference in how cells were named, so I used the following code to fix it

> colnames(ldat$spliced) <- paste(substring(colnames(ldat$spliced),11,11) <- "_")
> colnames(ldat$spliced) <- paste(substring(colnames(ldat$spliced),1,27),sep="")
> head(colnames(ldat$spliced))
> colnames(ldat$unspliced) <- paste(substr(colnames(ldat$unspliced),11,11) <- "_")
> colnames(ldat$unspliced) <- paste(substring(colnames(ldat$unspliced),1,27),sep="")
> head(colnames(ldat$unspliced))

But it does not seem to work fully because I get this error while running the first command of each (substituting : for _). However when I do a head of the output the cell names seem correct (same as those in the gr_all Seurat object)

> colnames(ldat) <- paste(substring(colnames(ldat),11,11) <- "_")
> Error in `substring<-`(`*tmp*`, 11, 11, value = "_") : 
>   replacing substrings in a non-character object

After running the merge command again I get a different error:

> gr_all <- merge(gr_all, as.Seurat(ldat)[, colnames(gr_all)])
>   |                                                                                        |                                                                                |   0%  |                                                                                        |===========================                                                     |  33%  |                                                                                        |=====================================================                           |  67%Error: All cells in the object being added must match the cells in this object

Would you please be able to help? Thank you for your time.

mojaveazure commented 5 years ago

@graddi @esfandyari @CoderMatthias,

Whoops, my bad. I forgot how merge works on Seurat objects. For this, you're going to have to take the assays (and reductions) from one object and move them to another.

I would also highly recommend not creating multiple Seurat objects for the same set of samples, instead adding each new matrix as an assay to a preexisting Seurat object

Generic example:

loom.matrices <- ReadVelocity(file = '/path/to/loomfile.loom')
for (i in names(x = loom.matrices)) {
  seurat.object[[i]] <- CreateAssayObject(counts = loom.matrices[[i]])
}

Before doing this, you need to be sure that all expression matrices have the same cell names.

graddi commented 5 years ago

Hello @mojaveazure,

Thanks for the clarification. Any other processing required to add those assays with the filtered cells? I get an error:

for (i in names(x = ldat)) {

  • gr_all[[i]] <- CreateAssayObject(counts = ldat[[i]])
  • } Error: Cannot add a different number of cells than already present

For example I get this error

ldat_filt <- as.Seurat(ldat)[,colnames(gr_all)] | | | 0% | |=========================== | 33% | |===================================================== | 67%Error: All cells in the object being added must match the cells in this object

gr_all is the filtered and processed Seurat object. ldat the loom file with velocities

mojaveazure commented 5 years ago

Before doing this, you need to be sure that all expression matrices have the same cell names.

Each new assay being added must have the same cells (name and number)

CoderMatthias commented 5 years ago

@graddi

Here is what I did with a filtered Seurat object to add the velocyto assays:

### Load filtered Seurat object and loom file
seurat_object <- readRDS('/Location/of/seurat_object.rds')
ldat <- ReadVelocity(file = "/Location/of/loom_file.loom")

for (i in names(x = ldat)) {
  ### Store assay in a new variable
  assay <- ldat[[i]]

  ### Rename cell names in loom file to match cell names in Seurat object
  colnames(assay) <- gsub('Part of cell name to change', 'Changed part', colnames(assay))

  ### Subset to filtered cells in Seurat object
  assay <- assay[,colnames(seurat_object)]

  ### Add assay to Seurat object
  seurat_object[[i]] <- CreateAssayObject(counts = assay)
}

I can confirm it added the velocyto assays to my Seurat object, but haven't had a chance to go through the rest of the code yet.

graddi commented 5 years ago

Thank you very much @mojaveazure and @CoderMatthias, with the latest code from Matthias everything worked out to the end! I'll close the issue.

yaolutian commented 4 years ago

You can use the standard preprocessing pipeline (NormalizeData, FindVariableFeatures, ScaleData) with our velocity wrapper; it was developed and tested using the standard pipeline. What commands are you running and what error are you getting? @mojaveazure Hi, I tried the standard pipeline, while after Scaledata step, the default assay changed into RNA, and the spliced, unspliced assay seemed disappeared. can you check this, thanks!

GhobrialMoheb commented 4 years ago

@mojaveazure , thanks alot for your reply. @esfandyari I have the same question as you regarding the integrated objects.

Is there a possibility to apply what you mentioned to integrated objects ?

Lets say I have an integrated object that is composed of 5 original identities.

Is it possible, that after generating .loom files for each of the 5 from velocyto, to add the individual loom data to the integrated seurat object ?

vitamindlab commented 3 years ago

I would love a reply to @Mohebg1234 's questions

MikaQiao commented 2 years ago

@Mohebg1234 Hi,

Here is what I did:

  1. using ReadVelocity to read 5 .loom file from velocto;
  2. using as.Seurat to convert it into 5 "splicing-based-Seurat" object;
  3. using merge to merge them into an integrated "splicing-based-Seurat" object;
  4. following @CoderMatthias 's instruction, take the assays from the integrated "splicing-based-Seurat" to the "gene-based-Seurat" integrated objects.

Hope this helps you.

millerkaplanlab commented 9 months ago

@CoderMatthias I'm trying to use your code to perform this analysis. I'm a bit stuck on what to change for colnames. Where would I find this information in the loom ojbect?