saeyslab / muscatWrapper

Easy muscat differential expression analysis and visualization
GNU General Public License v3.0
1 stars 0 forks source link

muscat_analysis don't accept DICE annotation #2

Open obotman opened 1 year ago

obotman commented 1 year ago

Hello, I run muscat_analysis with sce containing cell annotations, 9 samples in duplicate and treatment. Cell annotation give me this table:

    ABE157  ABE158  ABE159  ABE160  ABE161  ABE162  ABE163  ABE164  ABE165  ABE166  ABE167  ABE168  ABE169  ABE170  ABE171  ABE172  ABE173  ABE174
    NK_cells    42  52  42  16  45  77  0   2   1   1   5   4   4   0   5   9   1   1
Keep    T_cells_CD4+_memory_TREG    2041    2378    1139    643 4177    6808    3329    3437    2327    2256    3816    3537    2117    2902    2972    2024    2418    2198
Keep    T_cells_CD4+_naive_stimulated   106 112 52  36  91  189 635 1009    189 166 1361    1301    583 637 710 621 1840    1919
    T_cells_CD4+_TFH    8   7   9   0   6   5   1   1   4   10  1   0   2   9   7   4   9   8
Keep    T_cells_CD4+_Th1    1415    1850    984 581 725 1297    1143    1067    4305    3950    2298    2416    964 1275    2782    2091    936 807
    T_cells_CD4+_Th1_17 26  27  18  17  3   13  15  15  81  89  33  31  11  16  59  60  23  20
    T_cells_CD4+_Th17   210 212 164 122 14  13  30  35  72  105 20  28  1   5   16  22  7   7
    T_cells_CD4+_Th2    1040    1038    691 372 46  61  23  24  78  84  28  27  15  12  42  49  7   11
Keep    T_cells_CD8+_naive_stimulated   89  99  29  21  122 240 219 226 141 131 325 360 203 218 187 137 336 279

When I run muscat_analysis:

muscatSubDICE <- muscat_analysis(
     sce = sce2,
     celltype_id = celltype_id,
     sample_id = sample_id,
     group_id = group_id,
     covariates = NA,
     contrasts_oi = contrasts_oi,
     contrast_tbl = contrast_tbl,
     assay_oi_pb = "counts",
     fun_oi_pb = "sum",
     de_method_oi = "edgeR",
     min_cells = 10,
     verbose=TRUE)

Analysis reject the cell type "T_cells_CD4+_memory_TREG" while I have at minimum 643 cells

[1] "Calculate expression information"
[1] "Calculate differential expression for all cell types"
[1] "excluded cell types are:"
[1] "NK_cells"                      "T_cells_CD4+_memory_TREG"
[3] "T_cells_CD4+_TFH"              "T_cells_CD4+_Th1_17"
[5] "T_cells_CD8+_naive_stimulated"
[1] "These celltypes are not considered in the analysis. After removing samples that contain less cells than the required minimal, some groups don't have 2 or more samples anymore. As a result the analysis cannot be run. To solve this: decrease the number of min_cells or change your group_id and pool all samples that belong to groups that are not of interest! "

Could you tell me why even if I have a lot of cells, muscat reject this cell type?

Best regards, Olivier

browaeysrobin commented 1 year ago

Hi @obotman

We have never encountered this issue before. Can you let us know whether you still have the same issue?

If yes: be sure that all the input for the function is correct and appropriate as written in the vignettes of this package and the function documentation.

Some things to thoroughly check: 1) is the annotation-sample table with cell numbers you show from the sce2 object that you give as input for the muscat_analysis function? If yes, can you share the code by which you generate this table? 2) does sample_id in muscat_analysis correspond (exactly) to the column names of your annotation table? 3) does celltype_id in muscat_analysis correspond (exactly) to the row names of your annotation table? 4) do all sample_ids belong to a group, as indicated with group_id and given in contrast_tbl?

obotman commented 1 year ago

Hello, A collaborator from the VIB university, Leuven Belgium told me that muscatWrapper apply some filters: filterByExpr.min.count = 7, filterByExpr.min.total.count = 15, filterByExpr.large.n = 4, filterByExpr.min.prop = 0.7 Could you confirm this and explain me in more details these filters?

In my colData, I have:

sce1@colData$
sce1@colData$orig.ident         sce1@colData$SampleID           sce1@colData$treatment
sce1@colData$nCount_RNA         sce1@colData$Shortname          sce1@colData$ident
sce1@colData$nFeature_RNA       sce1@colData$seurat_clusters    sce1@colData$sizeFactor
sce1@colData$nCount_ADT         sce1@colData$nCount_SCT         sce1@colData$label
sce1@colData$nFeature_ADT       sce1@colData$nFeature_SCT       sce1@colData$annotation
sce1@colData$percent.mt         sce1@colData$cellLine           sce1@colData$annotationSubDICE

sample_id="SampleID" group_id="treatment" celltype_id="annotationSubDICE"

The table is simply generate with this code below for the SampleID (=sample_id) and treatment (=group_id) and just compiled in excel:

table(SummarizedExperiment::colData(sce2)$annotationSubDICE, SummarizedExperiment::colData(sce2)$SampleID)
table(SummarizedExperiment::colData(sce2)$annotationSubDICE, SummarizedExperiment::colData(sce2)$treatment)
treatment   Im1Im1      Im1no       Im1wt       Im2Im2      Im2no       Im2wt       wtIm1       wtno        wtwt    
SampleID    ABE157  ABE158  ABE159  ABE160  ABE161  ABE162  ABE163  ABE164  ABE165  ABE166  ABE167  ABE168  ABE169  ABE170  ABE171  ABE172  ABE173  ABE174
NK_cells    42  52  42  16  45  77  0   2   1   1   5   4   4   0   5   9   1   1
T_cells_CD4+_memory_TREG    2041    2378    1139    643 4177    6808    3329    3437    2327    2256    3816    3537    2117    2902    2972    2024    2418    2198
T_cells_CD4+_naive_stimulated   106 112 52  36  91  189 635 1009    189 166 1361    1301    583 637 710 621 1840    1919
T_cells_CD4+_TFH    8   7   9   0   6   5   1   1   4   10  1   0   2   9   7   4   9   8
T_cells_CD4+_Th1    1415    1850    984 581 725 1297    1143    1067    4305    3950    2298    2416    964 1275    2782    2091    936 807
T_cells_CD4+_Th1_17 26  27  18  17  3   13  15  15  81  89  33  31  11  16  59  60  23  20
T_cells_CD4+_Th17   210 212 164 122 14  13  30  35  72  105 20  28  1   5   16  22  7   7
T_cells_CD4+_Th2    1040    1038    691 372 46  61  23  24  78  84  28  27  15  12  42  49  7   11
T_cells_CD8+_naive_stimulated   89  99  29  21  122 240 219 226 141 131 325 360 203 218 187 137 336 279

As you can see for example, the T_cells_CD4+_memory_TREG have enough cells per group and sample and so I'm a little bit surprise that this cell type is excluded.

browaeysrobin commented 1 year ago

Hi @obotman

For an explanation about these filters, check ?edgeR::filterByExpr - but this is just a gene filter and will not influence which cells/cell types are considered for the analysis

As you say: T_cells_CD4+_memory_TREG should be retained. Moreover, I also see that T_cells_CD8+_naive_stimulated should be retained, but it gets also removed. On the other hand, T_cells_CD4+_Th2 and T_cells_CD4+_Th17 should be removed but are retained here. I'm struggling to find the reason for this, though. So some follow-up questions:

1) Can you try to run the DE analysis with the get_DE_info function from the multinichenetr package? (https://github.com/saeyslab/multinichenetr/blob/main/vignettes/basic_analysis_steps_MISC.md) Do you get the same error? 2) If not, I will try to find out what the bug is here in the muscatWrapper package, which is based on the multinichenetr package and should work in the same way.

If yes, and you have the same error, I suspect something went wrong during the creation of (the colData of) your sce object, namely with the cell-celltypeID annotations, or with how multinichener handles the colData of your sce object. I think this because 2 cell types are retained that shouldn't and 2 cell types that should be retained are kicked out. I suspect they got intermixed for some reason. Some suggestions to troubleshoot this and further locate the problem: try once the normal muscat package. Output of that package can still be visualized with functions from this muscatWrapper package. If the problem lies with the muscatWrapper/multinichenetr package, this should work then here. If the problem lies with how you created the sce object, then this won't work though. In that case, you will have to find out where things went wrong.

obotman commented 1 year ago

Sorry for the delay and thanks to follow this issue. I try get_DE_info and I get the same response.

But before your response to this issue, I asked to Helena Crowell, who developed the muscat (normal muscat package?), and she answered that was not only a matter of sufficient number of cells but cells expressing a minimum of genes. In the muscatWrapper the settings for pb::edegeR are: filterByExpr.min.count = 7, filterByExpr.min.total.count = 15, filterByExpr.large.n = 4, filterByExpr.min.prop = 0.7

If I run: pbDS(pb, design = mm, contrast = contrast, method="edgeR",filter="none") , all cell types are retained. So it's linked to the number of cells and genes expressed per cells and how many certain genes are expressed.

Unfortunately there is no way to control this filter with muscat_analysis()

So I'm gonna try the function from the muscatWrapper to visualized output results from the muscat pipeline and/or filtering cells with low number of genes before running muscatWrapper and see what happen.

Best regards, Olivier