Closed bwang1324 closed 2 years ago
Did you try looking at the documentation using ?make.cell.type
?
Thank you for your response. I did look at the documentation and formatted tpm and X as matrix, metadata as data.frame, and samples and cellQ as vectors. But I am still unable to run DIALOGUE.run.
From: Livnat Jerby @.> Reply-To: livnatje/DIALOGUE @.> Date: Saturday, May 14, 2022 at 12:14 AM To: livnatje/DIALOGUE @.> Cc: bwang1324 @.>, Author @.***> Subject: Re: [livnatje/DIALOGUE] Input data format (Issue #12)
Did you try looking at the documentation using ?make.cell.type?
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>
Did you try running the test dataset?
If you could provide more information about the input you provide it will be easier for me to help you spot the problem.
I had the same error.
## subset the Seurat objects for each subtype
make_cell_type_from_seurat<- function(obj, cell_type, cell_type_name){
sub_obj<- obj[, obj$predicted.celltype.l1 %in% cell_type]
sub_obj<- sub_obj %>%
NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000) %>%
FindVariableFeatures( selection.method = "vst", nfeatures = 2000) %>%
ScaleData() %>%
RunPCA(npcs = 30)
tpm<- as.matrix(sub_obj@assays$RNA@counts)
X<- sub_obj@reductions$pca@cell.embeddings
n_cells<- ncol(tpm)
samples<- sub_obj$patient
n_count<- scale(log(sub_obj@meta.data$nCount_RNA)) %>% as.vector()
metadata<- data.frame(patient = rep("RTD164", n_cells))
make.cell.type(name = cell_type_name ,tpm,samples,X,metadata, cellQ = n_count)
}
NK_cells<- make_cell_type_from_seurat(RTD164, cell_type = "NK", cell_type_name = "NK" )
T_cells<- make_cell_type_from_seurat(RTD164, cell_type = c("CD4 T", "CD8 T"), cell_type_name = "T" )
DCs<- make_cell_type_from_seurat(RTD164, cell_type = "DC", cell_type_name = "DC" )
rA<- list(NK_cells = NK_cells, T_cells = T_cells, DCs = DCs )
> res<- DIALOGUE.run(rA = rA, # list of cell.type objects
+ main = "My.data",
+ k = 5, # number of MCPs to identify
+ results.dir = "DIALOGUE.results/",
+ conf = "cellQ")# any potential confounders DIALOGUE needs to account for; default is "cellQ"
[1] "#************DIALOGUE Step I: PMD ************#"
[1] "NK: Removing 14 of 30 features."
[1] "T: Removing 8 of 30 features."
[1] "DC: Removing 24 of 30 features."
Error: $ operator is invalid for atomic vectors
Thanks for looking into it.
@crazyhottommy, would you mind sharing rA or a subset of it? Thanks
The object is too big to share. If you can make a tutorial using the seurat object from https://satijalab.org/seurat/articles/pbmc3k_tutorial.html step by step, that will be great.
I can also use this dummy seurat object to create rA
and see if it works.
Hi, here is a fully reproducible script
#devtools::install_github(repo = "https://github.com/livnatje/DIALOGUE")
#devtools::install_github('satijalab/seurat-data')
library(here)
library(DIALOGUE)
library(SeuratData)
# AvailableData()
InstallData("pbmc3k")
data("pbmc3k")
pbmc3k
table(pbmc3k$seurat_annotations)
make_cell_type_from_seurat<- function(obj, cell_type, cell_type_name){
sub_obj<- obj[, obj$seurat_annotations %in% cell_type]
sub_obj<- sub_obj %>%
NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000) %>%
FindVariableFeatures( selection.method = "vst", nfeatures = 2000) %>%
ScaleData() %>%
RunPCA(npcs = 30)
tpm<- as.matrix(sub_obj@assays$RNA@counts)
X<- sub_obj@reductions$pca@cell.embeddings
n_cells<- ncol(tpm)
samples<- sub_obj$sample_id
n_count<- scale(log(sub_obj@meta.data$nCount_RNA)) %>% as.vector()
metadata<- data.frame(sample_type = rep("pbmc", n_cells))
make.cell.type(name = cell_type_name ,tpm,samples,X,metadata, cellQ = n_count)
}
If you only have one sample/orig.ident in the metadata, Dialogue
will give you error:
Error in rownames<-
(*tmp*
, value = ids.u) :
attempt to set 'rownames' on an object with no dimensions
Let's make some fake orig.ident pretending the pbmcs are from different samples.
set.seed(123)
sample_id<- sample(c("sample1", "sample2"), size = ncol(pbmc3k), replace =TRUE)
pbmc3k$sample_id<- sample_id
NK_cells<- make_cell_type_from_seurat(pbmc3k, cell_type = "NK", cell_type_name = "NK" )
T_cells<- make_cell_type_from_seurat(pbmc3k, cell_type = c("Memory CD4 T", "CD8 T"), cell_type_name = "T" )
rA<- list(NK_cells = NK_cells, T_cells = T_cells)
res<- DIALOGUE.run(rA = rA, # list of cell.type objects
main = "My.data",
k = 5, # number of MCPs to identify
results.dir = "DIALOGUE.results2/",
conf = "cellQ")# any potential confounders DIALOGUE needs to account for; default is "cellQ"
[1] "#************DIALOGUE Step I: PMD ************#"
[1] "NK: Removing 30 of 30 features."
[1] "T: Removing 30 of 30 features."
Error: $ operator is invalid for atomic vectors
saveRDS(rA, file = here::here("data/test_NK_T.rds"))
Thanks a lot for looking into it.
Thanks @crazyhottommy for the example. I think the example is again highlights the need for a clearer vignette from the authors for 10X data:
For tpm here you use the raw counts, some people are using log counts in other issues. @livnatje kindly explained that transcripts per million is the advised input. But to me, I am not sure this makes sense for UMI based data.
@livnatje, is it possible to draw a line under this and share an example script you have used to analyse 10x data? I see that you used the colitis dataset which is 10X genomics 3 prime data.
Can you provide the reproducibility script as an example? I noted that the code used in the manuscript was not part of the code availability statement, but in this case, it would be incredibly helpful.
Many thanks
@crazyhottommy thanks for providing the code. The code is perfectly fine.
The reason you are getting an error is because the data is too small (includes too few samples), and therefore there are no features that pass the original filtering step. This is also noted in the text messages:
"NK: Removing 30 of 30 features."
"T: Removing 30 of 30 features."
So there is no data left to work with. This is an ANOVA based filtering, to ensure that the features show substantial inter-sample variation. In this case it's because there aren't enough samples.
I added another error message to reflect that. You need at least 10 samples (or user-defined spatial niches in spatial data) to identify MCPs.
Hope this helps @crazyhottommy and @Nusob888.
A more elaborated tutorial is on the way
Thanks for looking into it. yes, I realized that. But in my real-world example at https://github.com/livnatje/DIALOGUE/issues/12#issuecomment-1203175209
[1] "#****DIALOGUE Step I: PMD ****#" [1] "NK: Removing 14 of 30 features." [1] "T: Removing 8 of 30 features." [1] "DC: Removing 24 of 30 features." Error: $ operator is invalid for atomic vectors
It did not remove all of them, but I got the same error.
So one needs at least 10 samples to use DIALOGUE?
Let me see if I can spot the source of the error you get.
As for sample size, 5 samples is the lower bound for sure. Recommended to work with 10 samples or more. You can try with less than 10. Anyhow DIALOGUE will avoid overfitting, using empirical p-values etc.
Any update on the errors? Or if you can provide a working example from Seurat objects, that will be helpful too. Thanks!
Hi @Nusob888 and @crazyhottommy,
A Seurat wrapper and a toy example tutorial showing how to un DIALOGUE with a Seurat object are now provided
I hope this helps.
Hello, thank you for your work, I am trying to apply this to our lab's 10X data. I am able to get make.cell.type() to generate the objects but when running DIALOGUE.run(), I get the following error after Step I: PMD "Error: $ operator is invalid for atomic vectors"
I am wondering if this is due to formatting of 10X data in make.cell.type(). Could you please specify formatting (matrix, vector etc) and provide examples of each input for make.cell.type()? For example, make.cell.type() requires to specify cellQ even if it included as column in metadata.