Seurat2 with Veloctyo.R

Jemkon commented 6 years ago

Hello,

I have tested the velocyto.R package with Pagoda2 as described in tutorials. But I would really like to test it with Seurat2. How do I create cell embedding, cell clustering, and accurate cell-cell distance matrix using Seurat2? Many thanks.

pkharchenko commented 6 years ago

This is more of a question for the Seurat list, but we'll try to add an example when we update tutorials.

Puriney commented 6 years ago

I happened to be a fan of both packages. You would find the answers at the Seurat2's pbmc3K tutorial.

For cell clustering, see the section "Cluster the cells". The function is FindClusters.

For cell embedding, always run RunPCA first and then RunTSNE as the RunTSNE internally relies on the existence of PCA slot. The actual coordinates of the embedding are fetched by seurat_obj@dr$pca@cell.embeddings and seurat_obj@dr$tsne@cell.embeddings for PCA and tSNE, respectively. With the embeddings, you can easily follow velocyto's tutorial to mask the velocity.

For the better cell distance, in principle, you could adapt the below code from velocyto tutorial to Seurat2 object by replacing the r$reductions$PCA with the Seurat2's PCA coordinates. (I did not actually test this suggestion)

cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))

ramonvidal commented 6 years ago

It worked very well for me using Seurat object in velocyto functions, I just needed to rename the emat and nmat colnames in order to fit with the ones in Seurat Object:

emat <- emat[,rownames(pbmc@meta.data)]; nmat <- nmat[,rownames(pbmc@meta.data)]; 
cluster.label <- pbmc@ident
cell.colors <- pagoda2:::fac2col(cluster.label)
emb <- pbmc@dr$tsne@cell.embeddings
cell.dist <- as.dist(1-armaCor(t(pbmc@dr$cca@cell.embeddings)))

ramonvidal commented 6 years ago

Hi @jemkon,

For Seurat you need to use a single matrix, as far as I understand the dat object contains 3 different matrices (spliced, ambiguous, unspliced). The function to create a seurat object is very simple:

CreateSeuratObject(raw.data, project = "SeuratProject", min.cells = 0, min.genes = 0, normalization.method = NULL, scale.factor = 10000, do.scale = FALSE, do.center = FALSE, names.field = 1, names.delim = "_", meta.data = NULL, display.progress = TRUE, ...)

But you need a raw.data in a single matrix format with cells in columns and gene in rows.

I dont know what you are trying to do, but I was not happy with the combination of my seurat results with velocyto. I ended up using pagoda + velocyto pipeline and matching the t-SNE clusters later by looking at the gene markers.

Best, Ramon

On Wed, 28 Mar 2018 at 11:06 Jemkon notifications@github.com wrote:

Hi @ramonvidal https://github.com/ramonvidal,

Thanks for sharing your code for seurat. I was wondering how do I use the "cell.counts.matrices.rds" file (created by dropEST) to create seurat object?

I tried following code but failed....
  dat <- readRDS("/home/10X_CellRnager/velocyto/cell.counts.matrices.rds")
  pbmc <- NormalizeData(object = dat, normalization.method = "LogNormalize",
                  scale.factor = 10000)
Could you please share more info on creating seurat object? Many thanks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/velocyto-team/velocyto.R/issues/16#issuecomment-376815655, or mute the thread https://github.com/notifications/unsubscribe-auth/AAG_gt5KGaD2OtnGbkdpjRHSrc1zWMllks5ti1KvgaJpZM4SD2VK .

-- Ramon Vidal +4917621123210 <+49%20176%2021123210>

Jemkon commented 6 years ago

Thanks @ramonvidal.

I was trying to compare and see how pagoda and seurat performs with velocyto. I found the same. The Pagoda2 + velocyto works best compared to Seurat2 + velocyto.

Many thanks for your suggestions.

charles-bernard commented 6 years ago

I would rahter vote for Seurat2 + velocyto simply because Seurat enables "Diffusion Map" ( See RunDiffusion function ) as a method of dimensionality reduction while Pagoda2 doesn't . I find diffusion maps better at capturing the dynamics of cell trajectories than t-SNE. Therefore, I would most of the time prefer to use a diffusion map embedding ( GetCellEmbeddings(object, reduction.type = "dm") ).

LineWulff commented 6 years ago

Could someone please explain to me how/where I get the emat (spliced count matrix) and nmat (unspliced count matrix) from if I am using a seurat object? I am trying to use velocyto.R to confirm some of my results.

jebard commented 6 years ago

@LineWulff - You will need to run the Velocyto command line tools on the dataset first to get the splicing information. That will create the loom file which you can get the emat and nmat. Then you can use the tSNE embeddings from Seurat, with the emat and nmat from the loom file to make your plot... it's a bit tricky.

x811zou commented 5 years ago

@ramonvidal How did you make your pbmcY1.rds?

ramonvidal commented 5 years ago

@ramonvidal How did you make your pbmcY1.rds?

Hi @x811zou , that is a typo, pbmcY1 == pbmc

chinalex9527 commented 5 years ago

@ramonvidal , I was able to create cell embedding and clustering using seurat, but when I rename the emat and nmat colnames by running emat <- emat[,rownames(Mouse1stArch12@meta.data)]; nmat <- nmat[,rownames(Mouse1stArch12@meta.data)]; It gave me this :
Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : invalid character indexing Do you have any suggestions? Thank you in advance!

chlee-tabin commented 5 years ago

I may not be following, but "renaming the emat and nmat colnames" should be

colnames(emat) <- rownames(Mouse1stArch12@meta.data)
colnames(nmat) <- rownames(Mouse1stArch12@meta.data)

chinalex9527 commented 5 years ago

@chlee-tabin , sorry I'm completely new to R scripts, I guess my question is in velocity estimation, they used Pagoda2 object for the following command: emat <- emat[,rownames(r$counts)]; nmat <- nmat[,rownames(r$counts)] How can I modify it with the Seurat object I created? Thank you!

LineWulff commented 5 years ago

@chinalex9527 , with a Seurat object I did: emat <- emat[,rownames(seuratobject@meta.data)]; nmat <- nmat[,rownames(seuratobject@meta.data)] This step will filter out cells from emat and nmat, which are not in the seurat/pagoda object. I had to rename my emat and nmat columns first though. You have to inspect how the cells are named in both your seurat object (e.g. rownames(seuratobject@meta.data)) and the nmat/emat format.

jebard commented 5 years ago

Here is my full code for making this work -- hope this helps you all

#library(devtools)
#install_github("velocyto-team/velocyto.R")
library(velocyto.R)
library(pagoda2)
library(Seurat)

# This is generated from the Velocyto python command line tool.
# You need a loom file before you can proceed
ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom")

# Gather the spliced and unspliced estimates
emat <- ldat$spliced
nmat <- ldat$unspliced

# take embedding from the Seurat data object
# NOTE: This assumes you have a seurat data object loaded
# into R memory prior to using this script. STOP and rerun seurat
# pipeline if you do not have this loaded. In my case, my seurat object is simply myData
emb <- myData@dr$tsne@cell.embeddings

# Estimate the cell-cell distances 
cell.dist <- as.dist(1-armaCor(t(myData@dr$tsne@cell.embeddings)))

# This is a little bit of foo-magic that needs to be adjusted on a per-sample
# basis depending on the cell names and how you ran the pipeline. Each cell
# stored in the loom object and seurat have an ID, make sure these are the same.
# in this case -- i need to trim 28 characters off the front to match seurat object.
colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="")

# What this step does is essentially this:
# > head(colnames(emat))
# [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx"
# [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx"
# [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx"

# > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
# > head(colnames(emat))
# [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG" 

# Now the names in emat and nmat will match up to the cell names used in my seurat object

# I'm not sure what this parameter does to be honest. 0.02 default
# perform gamma fit on a top/bottom quantiles of expression magnitudes
fit.quantile <- 0.02

# Main velocity estimation
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
                                            kCells=10,
                                            cell.dist=cell.dist,
                                            fit.quantile=fit.quantile,
                                            n.cores=24)

# This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme.
gg <- TSNEPlot(myData)
ggplot_build(gg)$data
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)

 p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
                                      cell.colors=ac(colors,alpha=0.5),
                                      cex=0.8,arrow.scale=2,show.grid.flow=T,
                                      min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
                                      do.par=F,cell.border.alpha = 0.1,
                                      n.cores=24,main="Cell Velocity")#,cc=p1$cc)

chinalex9527 commented 5 years ago

@jebard , thank you so much! The code worked beautifully!

CodeInTheSkies commented 5 years ago

@jebard , Many thanks for your full code!! Very helpful, since applying RNA velocity does not seem to be straightforward, and a clear pipeline to use it in conjunction with Seurat is missing from the creators, as far as I know.

It will be great if you could please clarify two things:

1) Towards the end of your code when you try to get the color out of the Seurat TSNE object, the code you gave above doesn't seem to work anymore. I'm using Seurat v2.3.4, R v3.5.0 and ggplot v2_3.1.0. Maybe the slots have changed? I spent some time to explore the slots to find out where the color info is stored, but I haven't had luck so far. Can you please help if you know?

2) Once we successfully run show.velocity.on.embedding.cor, how exactly should we make the plot? Is the output p1 a ggplot object? Can we use print() pdf() or cowplot::save_plot with it?

Many thanks again, in advance!

CodeInTheSkies commented 5 years ago

Further, the output of show.velocity.on.embedding.cor is a list. So, basically, I would like to know how best to convert this output to a ggplot, gtable, or a grob.

Thank you!

jebard commented 5 years ago

@CodeInTheSkies

Went back to the code and was able to get the colors out using this:

gg <- TSNEPlot(myData) colors <- as.list(ggplot_build(gg)$data[[1]]$colour) names(colors) <- rownames(emb)

I'm not sure why originally i had it as ggplot$Lpo -- i'll update my code above accordingly.

The output of the command show.velocity.on.embedding.cor should result in a plot being generated. I use Rstudio so it automatically pops up a plot. How I typically export to a file is by:

png("~/my-exported-graph.png") show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt', cell.colors=ac(colors,alpha=0.5), cex=0.8,arrow.scale=2,show.grid.flow=T, min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1, do.par=F,cell.border.alpha = 0.1, n.cores=24,main="Cell Velocity") dev.off()

The reason I have been saving it into the p1 object, is that it can help speed up replotting the figure should you need to make any graphical changes (like arrow size / density) -- to do this, you pass the parameter cc=p1$cc

R version 3.4.2 Seurat_2.3.4 cowplot_0.9.2 ggplot2_3.0.0 velocyto.R_0.6

MichaelPeibo commented 5 years ago

Hi @jebard

Thanks for very detailed demonstration! Just one question related to this. Do I need filter.genes.by.cluster.expression before gene.relative.velocity.estimates? I saw this step in this tutorial

pkharchenko commented 5 years ago

It’s not strictly necessary to select “cluster-distinguishing genes - it’s a recommended step. If different clusters in your dataset represent different points in the dynamic process, this will likely result in an informative gene set.

On Apr 23, 2019, at 3:36 AM, MichaelPeibo notifications@github.com wrote:

Hi @jebard https://github.com/jebard Thanks for very detailed demonstration! Just one question related to this. Do I need filter.genes.by.cluster.expression before gene.relative.velocity.estimates? I saw this step in this tutorial http://pklab.med.harvard.edu/velocyto/notebooks/R/DG1.nb.html — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/velocyto-team/velocyto.R/issues/16#issuecomment-485679760, or mute the thread https://github.com/notifications/unsubscribe-auth/AC2PX4XJZ5W6FDKBVR7ISGTPR237VANCNFSM4EQPMVFA.

MichaelPeibo commented 5 years ago

Hi @pkharchenko Thanks for response! I will try to do filter and without to to if there is a difference.

Fu-Luo commented 5 years ago

Hi @jebard

Thanks for your code. I was not sure about this command: emb <- myData@dr$tsne@cell.embeddings

What exactly is dr?

When I was trying to run this command:

emb <- lrc8@dr$tsne@cell.embeddings Error: no slot of name "dr" for this object of class "Seurat"

MichaelPeibo commented 5 years ago

Hi @MazzzzzeLuo This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.

Also make sure you are using Seuratv2 rather than v3.

GBeattie commented 5 years ago

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.

Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.

> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

ArcusGears commented 5 years ago

I'm having some trouble with transferring some of the commands over.
When I try to change the cluster label from the pagoda2 imput to a seurat one I keep throwing an error.

cluster.label <- object@ident
    Error: no slot of name "ident" for this object of class "Seurat"

Shouldn't ident be automatically created when the cluster neighbors are generated? When I ran a DimPlot for UMAP I was able to group it by "ident" as well, so I'm not sure what the problem is.

Also like @GBeattie I am trying to use Seurat v3 rather than v2

XYZuo commented 4 years ago

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot. Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.
> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

hI， Have you solven this problem? I met the same error using Seurat 3. Thanks!

Bioinformatics-rookie commented 4 years ago

Here is my full code for making this work -- hope this helps you all

#library(devtools)
#install_github("velocyto-team/velocyto.R")
library(velocyto.R)
library(pagoda2)
library(Seurat)

# This is generated from the Velocyto python command line tool.
# You need a loom file before you can proceed
ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom")

# Gather the spliced and unspliced estimates
emat <- ldat$spliced
nmat <- ldat$unspliced

# take embedding from the Seurat data object
# NOTE: This assumes you have a seurat data object loaded
# into R memory prior to using this script. STOP and rerun seurat
# pipeline if you do not have this loaded. In my case, my seurat object is simply myData
emb <- myData@dr$tsne@cell.embeddings

# Estimate the cell-cell distances 
cell.dist <- as.dist(1-armaCor(t(myData@dr$tsne@cell.embeddings)))

# This is a little bit of foo-magic that needs to be adjusted on a per-sample
# basis depending on the cell names and how you ran the pipeline. Each cell
# stored in the loom object and seurat have an ID, make sure these are the same.
# in this case -- i need to trim 28 characters off the front to match seurat object.
colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="")

# What this step does is essentially this:
# > head(colnames(emat))
# [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx"
# [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx"
# [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx"

# > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
# > head(colnames(emat))
# [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG" 

# Now the names in emat and nmat will match up to the cell names used in my seurat object

# I'm not sure what this parameter does to be honest. 0.02 default
# perform gamma fit on a top/bottom quantiles of expression magnitudes
fit.quantile <- 0.02

# Main velocity estimation
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
                                            kCells=10,
                                            cell.dist=cell.dist,
                                            fit.quantile=fit.quantile,
                                            n.cores=24)

# This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme.
gg <- TSNEPlot(myData)
ggplot_build(gg)$data
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)

 p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
                                      cell.colors=ac(colors,alpha=0.5),
                                      cex=0.8,arrow.scale=2,show.grid.flow=T,
                                      min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
                                      do.par=F,cell.border.alpha = 0.1,
                                      n.cores=24,main="Cell Velocity")#,cc=p1$cc)

I ran follow your code, and the last step went wrong, Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option how can I slove it?

XYZuo commented 4 years ago

Hi @Bioinformatics-rookie , I met the same problem before and solved it by installing libopenblas. I use anaconda so I ran conda install -c conda-forge libopenblas Hope it can help you.

Battamama commented 3 years ago

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot. Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.
> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length
I have the same problem too. Have you guys solved it?

yanpinlu commented 2 years ago

Hello, I have encountered this problem, how can I solve it, I still do not understand sheild.seurat <- readRDS("/home/sheild.nodoublet7.16.rds") sheild$spliced<-sheild$spliced[,rownames(sheild.seurat@meta.data)] Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : invalid character indexing @ramonvidal

When you run three more samples separately, one of them will have such a problem at this step, and the other two will work correctly. I found that the number of rows of the seurat object with the error sample did not match the number of row with spliced, while the other two samples matched

velocyto-team / velocyto.R

Seurat2 with Veloctyo.R #16