neurorestore / Libra

edgeR analysis not preserving the level and order information of the Cell_type #21

Open RobertWhitener opened 2 years ago

RobertWhitener commented 2 years ago

Hi,

I'm interested in comparing DE results between methods, for example edgeR and MAST. Running edgeR on a Seurat object works well, but the output loses the cell_type level information, while the MAST run keeps it.

SEURAT$cell_type is a factor, whereas after running edgeR, levels(edgeR_output$cell_type) returns NULL. After MAST, levels(MAST_output$cell_type) reports the factor levels correctly.

Is that an issue with the underlying algorithms?

It would be really nice if the edgeR calculation could maintain both the levels AND the order of the input, so that it isn't necessary to restore them manually for each of the subclustering analyses that we need to do.

Thanks! Robert

jordansquair commented 2 years ago

I don't think this is a bug? You can just reset the factor levels in your DE object after running it?
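As a minimal sketch of what resetting the levels afterwards could look like, the base R snippet below uses toy data rather than real Libra output. The names `cell_type_in` and `de_out` are hypothetical stand-ins for the input annotation and the DE result; the only assumption is that the result's `cell_type` column comes back as a character vector, as described above.

```r
# Toy stand-in for the input annotation: a factor with a deliberately chosen order.
cell_type_in <- factor(c("T", "B", "NK", "B"), levels = c("T", "NK", "B"))

# Hypothetical stand-in for a DE result whose cell_type column is character.
de_out <- data.frame(
  cell_type = c("B", "NK", "T"),
  gene = c("g1", "g1", "g1"),
  stringsAsFactors = FALSE
)

# A character vector has no levels, so this returns NULL.
levels(de_out$cell_type)

# Re-apply the input's levels, and their order, to the output column.
de_out$cell_type <- factor(de_out$cell_type, levels = levels(cell_type_in))
levels(de_out$cell_type)
```

Any value in the output that is not among the supplied levels becomes `NA`, so it is worth checking for missing values after the conversion.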

mhvbyrne commented 1 year ago

I have the same issue. I don't really understand how I'm supposed to reset the factor levels in the DE object after I've run it.

```
# Perform differential expression
> DE <- run_de(input = counts,
+              meta = metadata,
+              replicate_col = "cart_patient_id", # as we have multiple samples from the same patient
+              cell_type_col = "compartment", # the cell type
+              label_col = "tissue_loc", # what you want to compare
+              min_features = 5, # minimum number of counts for inclusion of a feature
+              de_family = "pseudobulk", # need to do this as we are using single cell data
+              de_method = "DESeq2", # can change this to whatever is needed
+              de_type = "Wald" # stats method
+              )
[1] "B_Plasma"
[1] "Endothelia"
[1] "Mono"
[1] "NK"
[1] "PT epithelia"
[1] "RCC_PT_epithelia"
[1] "T"
[1] "fibroblasts"
[1] "nonPT epithelia"
converting counts to integer mode
factor levels were dropped which had no samples
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
```

In the resulting DE object, the cell type is just nonPT epithelia. I can't see any option to view the other cell types:

tibble [24,669 × 8] (S3: tbl_df/tbl/data.frame)
 $ cell_type: chr [1:24669] "nonPT epithelia" "nonPT epithelia" "nonPT epithelia" "nonPT epithelia" ...
 $ gene     : chr [1:24669] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
 $ avg_logFC: num [1:24669] 0.0539 -0.4222 -0.0472 0.1384 -0.0429 ...
 $ p_val    : num [1:24669] 0.904 0.423 0.949 0.691 0.933 ...
 $ p_val_adj: num [1:24669] 1 1 1 1 1 ...
 $ de_family: chr [1:24669] "pseudobulk" "pseudobulk" "pseudobulk" "pseudobulk" ...
 $ de_method: chr [1:24669] "DESeq2" "DESeq2" "DESeq2" "DESeq2" ...
 $ de_type  : chr [1:24669] "Wald" "Wald" "Wald" "Wald" ...

When I use to_pseudobulk, I get matrices with all the cell types present.
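Because `str()` prints only the first few values of each column, a tabulation of the whole column is a more direct way to confirm which cell types a result contains. The sketch below uses a toy data frame with made-up rows in place of the real `DE` object; only the column names are taken from the output above.

```r
# Toy stand-in for a DE result (hypothetical rows, not real output).
DE <- data.frame(
  cell_type = c("Mono", "Mono", "NK", "T"),
  gene = c("A1BG", "A2M", "A1BG", "A1BG"),
  stringsAsFactors = FALSE
)

# These calls summarise every row, unlike the truncated preview from str().
unique(DE$cell_type)
table(DE$cell_type)

# Keep only the rows for one cell type of interest.
DE_nk <- DE[DE$cell_type == "NK", ]
```

If only one cell type appears in the table, the other cell types were not tested or not returned, which is a separate question from how factor levels are ordered.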

jordansquair commented 1 year ago

Set the factor levels before.

mhvbyrne commented 1 year ago

Hi, thank you for your quick reply!

I have tried:

metadata$compartment <- as.factor(metadata$compartment)

However, I still have the same problem. Do I need to set them a different way? I am new to DE analysis, so I'm sorry if this is something very simple.

jordansquair commented 1 year ago

You need to set the order explicitly, i.e. levels = c(). This is not a Libra bug, so it is better to post such questions on the relevant Stack Exchange thread.
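For readers unsure what setting the order means in practice, here is a minimal base R sketch. The column name `compartment` follows the thread, while the values and the `wanted_order` vector are illustrative assumptions. Whether `run_de` carries these levels through to its output is not confirmed anywhere in this thread.

```r
# Toy metadata; the column name follows the thread, the values are illustrative.
metadata <- data.frame(
  compartment = c("T", "Mono", "NK", "B_Plasma", "Mono"),
  stringsAsFactors = FALSE
)

# as.factor() creates levels in sorted order, not in an order of your choosing.
levels(as.factor(metadata$compartment))

# factor() with an explicit levels argument fixes the order.
wanted_order <- c("T", "NK", "Mono", "B_Plasma")  # hypothetical order
metadata$compartment <- factor(metadata$compartment, levels = wanted_order)
levels(metadata$compartment)
```

The difference from `as.factor()` is only in the level order: the underlying values are unchanged, but anything that iterates over `levels()` (plots, tables, per-level loops) will follow `wanted_order`.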