livnatje / DIALOGUE

DIALOGUE is a dimensionality reduction method that uses cross-cell-type associations to identify multicellular programs (MCPs) and map the cell transcriptome as a function of its environment.
Other
106 stars 16 forks source link

Input data format #12

Closed bwang1324 closed 2 years ago

bwang1324 commented 2 years ago

Hello, thank you for your work, I am trying to apply this to our lab's 10X data. I am able to get make.cell.type() to generate the objects but when running DIALOGUE.run(), I get the following error after Step I: PMD "Error: $ operator is invalid for atomic vectors"

I am wondering if this is due to formatting of 10X data in make.cell.type(). Could you please specify formatting (matrix, vector etc) and provide examples of each input for make.cell.type()? For example, make.cell.type() requires to specify cellQ even if it included as column in metadata.

livnatje commented 2 years ago

Did you try looking at the documentation using ?make.cell.type?

bwang1324 commented 2 years ago

Thank you for your response. I did look at the documentation and formatted tpm and X as matrix, metadata as data.frame, and samples and cellQ as vectors. But I am still unable to run DIALOGUE.run.

 

From: Livnat Jerby @.> Reply-To: livnatje/DIALOGUE @.> Date: Saturday, May 14, 2022 at 12:14 AM To: livnatje/DIALOGUE @.> Cc: bwang1324 @.>, Author @.***> Subject: Re: [livnatje/DIALOGUE] Input data format (Issue #12)

Did you try looking at the documentation using ?make.cell.type?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

livnatje commented 2 years ago

Did you try running the test dataset?

If you could provide more information about the input you provide it will be easier for me to help you spot the problem.

crazyhottommy commented 2 years ago

I had the same error.

## subset the Seurat objects for each subtype
make_cell_type_from_seurat<- function(obj, cell_type, cell_type_name){
  sub_obj<- obj[, obj$predicted.celltype.l1 %in% cell_type]
  sub_obj<- sub_obj %>%
    NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000) %>%
    FindVariableFeatures( selection.method = "vst", nfeatures = 2000) %>%
    ScaleData() %>%
    RunPCA(npcs = 30) 

  tpm<- as.matrix(sub_obj@assays$RNA@counts)

  X<- sub_obj@reductions$pca@cell.embeddings
  n_cells<- ncol(tpm)
  samples<- sub_obj$patient
  n_count<- scale(log(sub_obj@meta.data$nCount_RNA)) %>% as.vector()
  metadata<- data.frame(patient = rep("RTD164", n_cells))

  make.cell.type(name = cell_type_name ,tpm,samples,X,metadata, cellQ = n_count)
}

NK_cells<- make_cell_type_from_seurat(RTD164, cell_type = "NK", cell_type_name = "NK" )
T_cells<- make_cell_type_from_seurat(RTD164, cell_type = c("CD4 T", "CD8 T"), cell_type_name = "T" )
DCs<- make_cell_type_from_seurat(RTD164, cell_type = "DC", cell_type_name = "DC" )

rA<- list(NK_cells = NK_cells, T_cells = T_cells, DCs = DCs )

> res<- DIALOGUE.run(rA = rA, # list of cell.type objects
+                 main = "My.data",
+                 k = 5, # number of MCPs to identify
+                 results.dir = "DIALOGUE.results/", 
+                 conf = "cellQ")# any potential confounders DIALOGUE needs to account for; default is "cellQ" 
[1] "#************DIALOGUE Step I: PMD ************#"
[1] "NK: Removing 14 of 30 features."
[1] "T: Removing 8 of 30 features."
[1] "DC: Removing 24 of 30 features."
Error: $ operator is invalid for atomic vectors

Thanks for looking into it.

livnatje commented 2 years ago

@crazyhottommy, would you mind sharing rA or a subset of it? Thanks

crazyhottommy commented 2 years ago

The object is too big to share. If you can make a tutorial using the seurat object from https://satijalab.org/seurat/articles/pbmc3k_tutorial.html step by step, that will be great.

I can also use this dummy seurat object to create rA and see if it works.

crazyhottommy commented 2 years ago

Hi, here is a fully reproducible script


#devtools::install_github(repo = "https://github.com/livnatje/DIALOGUE") 
#devtools::install_github('satijalab/seurat-data')
library(here)
library(DIALOGUE)
library(SeuratData)

# AvailableData()
InstallData("pbmc3k")
data("pbmc3k")
pbmc3k

table(pbmc3k$seurat_annotations)

make_cell_type_from_seurat<- function(obj, cell_type, cell_type_name){
  sub_obj<- obj[, obj$seurat_annotations %in% cell_type]
  sub_obj<- sub_obj %>%
    NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000) %>%
    FindVariableFeatures( selection.method = "vst", nfeatures = 2000) %>%
    ScaleData() %>%
    RunPCA(npcs = 30) 

  tpm<- as.matrix(sub_obj@assays$RNA@counts)

  X<- sub_obj@reductions$pca@cell.embeddings
  n_cells<- ncol(tpm)
  samples<- sub_obj$sample_id
  n_count<- scale(log(sub_obj@meta.data$nCount_RNA)) %>% as.vector()
  metadata<- data.frame(sample_type = rep("pbmc", n_cells))

  make.cell.type(name = cell_type_name ,tpm,samples,X,metadata, cellQ = n_count)
}

If you only have one sample/orig.ident in the metadata, Dialogue will give you error: Error in rownames<-(*tmp*, value = ids.u) : attempt to set 'rownames' on an object with no dimensions

Let's make some fake orig.ident pretending the pbmcs are from different samples.


set.seed(123)

sample_id<- sample(c("sample1", "sample2"), size = ncol(pbmc3k), replace =TRUE)
pbmc3k$sample_id<- sample_id

NK_cells<- make_cell_type_from_seurat(pbmc3k, cell_type = "NK", cell_type_name = "NK" )
T_cells<- make_cell_type_from_seurat(pbmc3k, cell_type = c("Memory CD4 T", "CD8 T"), cell_type_name = "T" )

rA<- list(NK_cells = NK_cells, T_cells = T_cells)

res<- DIALOGUE.run(rA = rA, # list of cell.type objects
                main = "My.data",
                k = 5, # number of MCPs to identify
                results.dir = "DIALOGUE.results2/", 
                conf = "cellQ")# any potential confounders DIALOGUE needs to account for; default is "cellQ" 

[1] "#************DIALOGUE Step I: PMD ************#"
[1] "NK: Removing 30 of 30 features."
[1] "T: Removing 30 of 30 features."

Error: $ operator is invalid for atomic vectors

saveRDS(rA, file = here::here("data/test_NK_T.rds"))

Thanks a lot for looking into it.

Nusob888 commented 2 years ago

Thanks @crazyhottommy for the example. I think the example is again highlights the need for a clearer vignette from the authors for 10X data:

For tpm here you use the raw counts, some people are using log counts in other issues. @livnatje kindly explained that transcripts per million is the advised input. But to me, I am not sure this makes sense for UMI based data.

@livnatje, is it possible to draw a line under this and share an example script you have used to analyse 10x data? I see that you used the colitis dataset which is 10X genomics 3 prime data.

Can you provide the reproducibility script as an example? I noted that the code used in the manuscript was not part of the code availability statement, but in this case, it would be incredibly helpful.

Many thanks

livnatje commented 2 years ago

@crazyhottommy thanks for providing the code. The code is perfectly fine.

The reason you are getting an error is because the data is too small (includes too few samples), and therefore there are no features that pass the original filtering step. This is also noted in the text messages:

"NK: Removing 30 of 30 features."
"T: Removing 30 of 30 features."

So there is no data left to work with. This is an ANOVA based filtering, to ensure that the features show substantial inter-sample variation. In this case it's because there aren't enough samples.

I added another error message to reflect that. You need at least 10 samples (or user-defined spatial niches in spatial data) to identify MCPs.

Hope this helps @crazyhottommy and @Nusob888.

A more elaborated tutorial is on the way

crazyhottommy commented 2 years ago

Thanks for looking into it. yes, I realized that. But in my real-world example at https://github.com/livnatje/DIALOGUE/issues/12#issuecomment-1203175209

[1] "#****DIALOGUE Step I: PMD ****#" [1] "NK: Removing 14 of 30 features." [1] "T: Removing 8 of 30 features." [1] "DC: Removing 24 of 30 features." Error: $ operator is invalid for atomic vectors

It did not remove all of them, but I got the same error.

So one needs at least 10 samples to use DIALOGUE?

livnatje commented 2 years ago

Let me see if I can spot the source of the error you get.

As for sample size, 5 samples is the lower bound for sure. Recommended to work with 10 samples or more. You can try with less than 10. Anyhow DIALOGUE will avoid overfitting, using empirical p-values etc.

crazyhottommy commented 2 years ago

Any update on the errors? Or if you can provide a working example from Seurat objects, that will be helpful too. Thanks!

livnatje commented 2 years ago

Hi @Nusob888 and @crazyhottommy,

A Seurat wrapper and a toy example tutorial showing how to un DIALOGUE with a Seurat object are now provided

I hope this helps.