Open Jemkon opened 6 years ago
This is more of a question for the Seurat list, but we'll try to add an example when we update tutorials.
I happened to be a fan of both packages. You would find the answers at the Seurat2's pbmc3K tutorial.
For cell clustering, see the section "Cluster the cells". The function is FindClusters
.
For cell embedding, always run RunPCA
first and then RunTSNE
as the RunTSNE
internally relies on the existence of PCA slot. The actual coordinates of the embedding are fetched by seurat_obj@dr$pca@cell.embeddings
and seurat_obj@dr$tsne@cell.embeddings
for PCA and tSNE, respectively. With the embeddings, you can easily follow velocyto
's tutorial to mask the velocity.
For the better cell distance, in principle, you could adapt the below code from velocyto
tutorial to Seurat2 object by replacing the r$reductions$PCA
with the Seurat2's PCA coordinates. (I did not actually test this suggestion)
cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))
It worked very well for me using Seurat object in velocyto functions, I just needed to rename the emat and nmat colnames in order to fit with the ones in Seurat Object:
emat <- emat[,rownames(pbmc@meta.data)]; nmat <- nmat[,rownames(pbmc@meta.data)];
cluster.label <- pbmc@ident
cell.colors <- pagoda2:::fac2col(cluster.label)
emb <- pbmc@dr$tsne@cell.embeddings
cell.dist <- as.dist(1-armaCor(t(pbmc@dr$cca@cell.embeddings)))
Hi @jemkon,
For Seurat you need to use a single matrix, as far as I understand the dat object contains 3 different matrices (spliced, ambiguous, unspliced). The function to create a seurat object is very simple:
CreateSeuratObject(raw.data, project = "SeuratProject", min.cells = 0, min.genes = 0, normalization.method = NULL, scale.factor = 10000, do.scale = FALSE, do.center = FALSE, names.field = 1, names.delim = "_", meta.data = NULL, display.progress = TRUE, ...)
But you need a raw.data in a single matrix format with cells in columns and gene in rows.
I dont know what you are trying to do, but I was not happy with the combination of my seurat results with velocyto. I ended up using pagoda + velocyto pipeline and matching the t-SNE clusters later by looking at the gene markers.
Best, Ramon
On Wed, 28 Mar 2018 at 11:06 Jemkon notifications@github.com wrote:
Hi @ramonvidal https://github.com/ramonvidal,
Thanks for sharing your code for seurat. I was wondering how do I use the "cell.counts.matrices.rds" file (created by dropEST) to create seurat object?
I tried following code but failed....
dat <- readRDS("/home/10X_CellRnager/velocyto/cell.counts.matrices.rds") pbmc <- NormalizeData(object = dat, normalization.method = "LogNormalize", scale.factor = 10000)
Could you please share more info on creating seurat object? Many thanks.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/velocyto-team/velocyto.R/issues/16#issuecomment-376815655, or mute the thread https://github.com/notifications/unsubscribe-auth/AAG_gt5KGaD2OtnGbkdpjRHSrc1zWMllks5ti1KvgaJpZM4SD2VK .
-- Ramon Vidal +4917621123210 <+49%20176%2021123210>
Thanks @ramonvidal.
I was trying to compare and see how pagoda and seurat performs with velocyto. I found the same. The Pagoda2 + velocyto works best compared to Seurat2 + velocyto.
Many thanks for your suggestions.
I would rahter vote for Seurat2 + velocyto simply because Seurat enables "Diffusion Map" ( See RunDiffusion
function ) as a method of dimensionality reduction while Pagoda2 doesn't . I find diffusion maps better at capturing the dynamics of cell trajectories than t-SNE. Therefore, I would most of the time prefer to use a diffusion map embedding ( GetCellEmbeddings(object, reduction.type = "dm"
) ).
Could someone please explain to me how/where I get the emat (spliced count matrix) and nmat (unspliced count matrix) from if I am using a seurat object? I am trying to use velocyto.R to confirm some of my results.
@LineWulff - You will need to run the Velocyto command line tools on the dataset first to get the splicing information. That will create the loom file which you can get the emat and nmat. Then you can use the tSNE embeddings from Seurat, with the emat and nmat from the loom file to make your plot... it's a bit tricky.
@ramonvidal How did you make your pbmcY1.rds?
@ramonvidal How did you make your pbmcY1.rds?
Hi @x811zou , that is a typo, pbmcY1 == pbmc
@ramonvidal , I was able to create cell embedding and clustering using seurat, but when I rename the emat and nmat colnames by running
emat <- emat[,rownames(Mouse1stArch12@meta.data)]; nmat <- nmat[,rownames(Mouse1stArch12@meta.data)];
It gave me this :
Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
invalid character indexing
Do you have any suggestions? Thank you in advance!
I may not be following, but "renaming the emat and nmat colnames" should be
colnames(emat) <- rownames(Mouse1stArch12@meta.data)
colnames(nmat) <- rownames(Mouse1stArch12@meta.data)
@chlee-tabin , sorry I'm completely new to R scripts, I guess my question is in velocity estimation, they used Pagoda2 object for the following command:
emat <- emat[,rownames(r$counts)]; nmat <- nmat[,rownames(r$counts)]
How can I modify it with the Seurat object I created?
Thank you!
@chinalex9527 , with a Seurat object I did:
emat <- emat[,rownames(seuratobject@meta.data)]; nmat <- nmat[,rownames(seuratobject@meta.data)]
This step will filter out cells from emat and nmat, which are not in the seurat/pagoda object.
I had to rename my emat and nmat columns first though. You have to inspect how the cells are named in both your seurat object (e.g. rownames(seuratobject@meta.data)
) and the nmat/emat format.
Here is my full code for making this work -- hope this helps you all
#library(devtools)
#install_github("velocyto-team/velocyto.R")
library(velocyto.R)
library(pagoda2)
library(Seurat)
# This is generated from the Velocyto python command line tool.
# You need a loom file before you can proceed
ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom")
# Gather the spliced and unspliced estimates
emat <- ldat$spliced
nmat <- ldat$unspliced
# take embedding from the Seurat data object
# NOTE: This assumes you have a seurat data object loaded
# into R memory prior to using this script. STOP and rerun seurat
# pipeline if you do not have this loaded. In my case, my seurat object is simply myData
emb <- myData@dr$tsne@cell.embeddings
# Estimate the cell-cell distances
cell.dist <- as.dist(1-armaCor(t(myData@dr$tsne@cell.embeddings)))
# This is a little bit of foo-magic that needs to be adjusted on a per-sample
# basis depending on the cell names and how you ran the pipeline. Each cell
# stored in the loom object and seurat have an ID, make sure these are the same.
# in this case -- i need to trim 28 characters off the front to match seurat object.
colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="")
# What this step does is essentially this:
# > head(colnames(emat))
# [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx"
# [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx"
# [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx"
# > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
# > head(colnames(emat))
# [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG"
# Now the names in emat and nmat will match up to the cell names used in my seurat object
# I'm not sure what this parameter does to be honest. 0.02 default
# perform gamma fit on a top/bottom quantiles of expression magnitudes
fit.quantile <- 0.02
# Main velocity estimation
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
kCells=10,
cell.dist=cell.dist,
fit.quantile=fit.quantile,
n.cores=24)
# This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme.
gg <- TSNEPlot(myData)
ggplot_build(gg)$data
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)
p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
cell.colors=ac(colors,alpha=0.5),
cex=0.8,arrow.scale=2,show.grid.flow=T,
min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
do.par=F,cell.border.alpha = 0.1,
n.cores=24,main="Cell Velocity")#,cc=p1$cc)
@jebard , thank you so much! The code worked beautifully!
@jebard , Many thanks for your full code!! Very helpful, since applying RNA velocity does not seem to be straightforward, and a clear pipeline to use it in conjunction with Seurat is missing from the creators, as far as I know.
It will be great if you could please clarify two things:
1) Towards the end of your code when you try to get the color out of the Seurat TSNE object, the code you gave above doesn't seem to work anymore. I'm using Seurat v2.3.4
, R v3.5.0
and ggplot v2_3.1.0
. Maybe the slots have changed? I spent some time to explore the slots to find out where the color info is stored, but I haven't had luck so far. Can you please help if you know?
2) Once we successfully run show.velocity.on.embedding.cor
, how exactly should we make the plot? Is the output p1
a ggplot object? Can we use print()
pdf()
or cowplot::save_plot
with it?
Many thanks again, in advance!
Further, the output of show.velocity.on.embedding.cor
is a list. So, basically, I would like to know how best to convert this output to a ggplot
, gtable
, or a grob
.
Thank you!
@CodeInTheSkies
Went back to the code and was able to get the colors out using this:
gg <- TSNEPlot(myData) colors <- as.list(ggplot_build(gg)$data[[1]]$colour) names(colors) <- rownames(emb)
I'm not sure why originally i had it as ggplot$Lpo -- i'll update my code above accordingly.
The output of the command show.velocity.on.embedding.cor should result in a plot being generated. I use Rstudio so it automatically pops up a plot. How I typically export to a file is by:
png("~/my-exported-graph.png") show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt', cell.colors=ac(colors,alpha=0.5), cex=0.8,arrow.scale=2,show.grid.flow=T, min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1, do.par=F,cell.border.alpha = 0.1, n.cores=24,main="Cell Velocity") dev.off()
The reason I have been saving it into the p1 object, is that it can help speed up replotting the figure should you need to make any graphical changes (like arrow size / density) -- to do this, you pass the parameter cc=p1$cc
R version 3.4.2 Seurat_2.3.4 cowplot_0.9.2 ggplot2_3.0.0 velocyto.R_0.6
Hi @jebard
Thanks for very detailed demonstration!
Just one question related to this. Do I need filter.genes.by.cluster.expression
before gene.relative.velocity.estimates
?
I saw this step in this tutorial
It’s not strictly necessary to select “cluster-distinguishing genes - it’s a recommended step. If different clusters in your dataset represent different points in the dynamic process, this will likely result in an informative gene set.
On Apr 23, 2019, at 3:36 AM, MichaelPeibo notifications@github.com wrote:
Hi @jebard https://github.com/jebard Thanks for very detailed demonstration! Just one question related to this. Do I need filter.genes.by.cluster.expression before gene.relative.velocity.estimates? I saw this step in this tutorial http://pklab.med.harvard.edu/velocyto/notebooks/R/DG1.nb.html — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/velocyto-team/velocyto.R/issues/16#issuecomment-485679760, or mute the thread https://github.com/notifications/unsubscribe-auth/AC2PX4XJZ5W6FDKBVR7ISGTPR237VANCNFSM4EQPMVFA.
Hi @pkharchenko Thanks for response! I will try to do filter and without to to if there is a difference.
Hi @jebard
Thanks for your code. I was not sure about this command: emb <- myData@dr$tsne@cell.embeddings
What exactly is dr?
When I was trying to run this command:
emb <- lrc8@dr$tsne@cell.embeddings Error: no slot of name "dr" for this object of class "Seurat"
Hi @MazzzzzeLuo
This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE
to get this slot.
Also make sure you are using Seuratv2 rather than v3.
This code is for exacting cell embeddings such as tsne or umap, you have to
RunTSNE
to get this slot.Also make sure you are using Seuratv2 rather than v3.
@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.
> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+ kCells=10,
+ cell.dist=cell.dist,
+ fit.quantile=fit.quantile,
+ n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) :
no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
longer object length is not a multiple of shorter object length
I'm having some trouble with transferring some of the commands over.
When I try to change the cluster label from the pagoda2 imput to a seurat one I keep throwing an error.
cluster.label <- object@ident
Error: no slot of name "ident" for this object of class "Seurat"
Shouldn't ident be automatically created when the cluster neighbors are generated? When I ran a DimPlot for UMAP I was able to group it by "ident" as well, so I'm not sure what the problem is.
Also like @GBeattie I am trying to use Seurat v3 rather than v2
This code is for exacting cell embeddings such as tsne or umap, you have to
RunTSNE
to get this slot. Also make sure you are using Seuratv2 rather than v3.@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.
> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2, + kCells=10, + cell.dist=cell.dist, + fit.quantile=fit.quantile, + n.cores=2) matching cells between cell.dist and emat/nmat ... done calculating cell knn ... done calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : no 'dimnames[[.]]': cannot use character indexing In addition: Warning message: In labels(cell.dist) == colnames(emat) : longer object length is not a multiple of shorter object length
hI, Have you solven this problem? I met the same error using Seurat 3. Thanks!
Here is my full code for making this work -- hope this helps you all
#library(devtools) #install_github("velocyto-team/velocyto.R") library(velocyto.R) library(pagoda2) library(Seurat) # This is generated from the Velocyto python command line tool. # You need a loom file before you can proceed ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom") # Gather the spliced and unspliced estimates emat <- ldat$spliced nmat <- ldat$unspliced # take embedding from the Seurat data object # NOTE: This assumes you have a seurat data object loaded # into R memory prior to using this script. STOP and rerun seurat # pipeline if you do not have this loaded. In my case, my seurat object is simply myData emb <- myData@dr$tsne@cell.embeddings # Estimate the cell-cell distances cell.dist <- as.dist(1-armaCor(t(myData@dr$tsne@cell.embeddings))) # This is a little bit of foo-magic that needs to be adjusted on a per-sample # basis depending on the cell names and how you ran the pipeline. Each cell # stored in the loom object and seurat have an ID, make sure these are the same. # in this case -- i need to trim 28 characters off the front to match seurat object. colnames(emat) <- paste(substring(colnames(emat),28,43),sep="") colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="") # What this step does is essentially this: # > head(colnames(emat)) # [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx" # [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx" # [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx" # > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="") # > head(colnames(emat)) # [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG" # Now the names in emat and nmat will match up to the cell names used in my seurat object # I'm not sure what this parameter does to be honest. 0.02 default # perform gamma fit on a top/bottom quantiles of expression magnitudes fit.quantile <- 0.02 # Main velocity estimation rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2, kCells=10, cell.dist=cell.dist, fit.quantile=fit.quantile, n.cores=24) # This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme. gg <- TSNEPlot(myData) ggplot_build(gg)$data colors <- as.list(ggplot_build(gg)$data[[1]]$colour) names(colors) <- rownames(emb) p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt', cell.colors=ac(colors,alpha=0.5), cex=0.8,arrow.scale=2,show.grid.flow=T, min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1, do.par=F,cell.border.alpha = 0.1, n.cores=24,main="Cell Velocity")#,cc=p1$cc)
I ran follow your code, and the last step went wrong, Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option how can I slove it?
Hi @Bioinformatics-rookie , I met the same problem before and solved it by installing libopenblas. I use anaconda so I ran conda install -c conda-forge libopenblas Hope it can help you.
This code is for exacting cell embeddings such as tsne or umap, you have to
RunTSNE
to get this slot. Also make sure you are using Seuratv2 rather than v3.@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.
> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2, + kCells=10, + cell.dist=cell.dist, + fit.quantile=fit.quantile, + n.cores=2) matching cells between cell.dist and emat/nmat ... done calculating cell knn ... done calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : no 'dimnames[[.]]': cannot use character indexing In addition: Warning message: In labels(cell.dist) == colnames(emat) : longer object length is not a multiple of shorter object length
I have the same problem too. Have you guys solved it?
Hello, I have encountered this problem, how can I solve it, I still do not understand sheild.seurat <- readRDS("/home/sheild.nodoublet7.16.rds") sheild$spliced<-sheild$spliced[,rownames(sheild.seurat@meta.data)] Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : invalid character indexing @ramonvidal
When you run three more samples separately, one of them will have such a problem at this step, and the other two will work correctly. I found that the number of rows of the seurat object with the error sample did not match the number of row with spliced, while the other two samples matched
Hello,
I have tested the velocyto.R package with Pagoda2 as described in tutorials. But I would really like to test it with Seurat2. How do I create cell embedding, cell clustering, and accurate cell-cell distance matrix using Seurat2? Many thanks.