jdekanter / CHETAH

scRNA-seq cell type identification
GNU Affero General Public License v3.0

Predictions contain Node1, Node4 labels #20

Closed: Kevis9 closed this issue 1 year ago

Kevis9 commented 1 year ago

Hi, thanks for the great work! I have a question about the prediction results. Why do the prediction results sometimes contain labels like Node1, Node2, etc.? Is there a way to avoid this? Here is my code:

    library(SingleCellExperiment)
    library(CHETAH)

    ## Reference: counts plus cell type annotations in colData
    reference <- SingleCellExperiment(assays = list(counts = ref_data),
                                      colData = ref_ct)
    ## Query: counts only
    input <- SingleCellExperiment(assays = list(counts = query_data))
    input <- CHETAHclassifier(input = input, ref_cells = reference)
    ## Extract celltypes:
    pred <- input$celltype_CHETAH

jdekanter commented 1 year ago

Hi Kevis,

Thank you for using CHETAH. Please see the vignette: https://bioconductor.org/packages/release/bioc/vignettes/CHETAH/inst/doc/CHETAH_introduction.html

For each cell, CHETAH walks the classification tree (PlotTree(input)) and at each node (a split in the tree) decides whether the cell belongs to the left or the right branch based on similarity (correlation). When a cell is about equally similar (or dissimilar) to both branches, the classification stops there. This means that CHETAH knows the cell belongs to one of the cell types below that node, but not which one.

For example, in the vignette, if a cell is assigned to node 6, CHETAH is confident that the cell is a T cell (all cells below node 6 are T cells), but not which specific type. This could happen because the cell of interest is a T cell subtype that is not in the reference (a gamma-delta T cell, for example).
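To make the stopping rule concrete, here is a toy sketch in base R. It is not CHETAH's implementation: the tree, the similarity values, the `walk_tree` helper and the threshold of 0.1 are all made up for illustration, and the "confidence" is simply the difference between the two branch similarities.

```r
# Toy illustration only: NOT CHETAH's actual algorithm or data structures.
# Hypothetical tree: each internal node has a left and a right child.
tree <- list(
  Node1 = c(left = "Node2", right = "B cell"),
  Node2 = c(left = "CD4 T cell", right = "CD8 T cell")
)

# Hypothetical similarity of ONE cell to each branch at each node.
similarity <- list(
  Node1 = c(left = 0.80, right = 0.10),  # clearly on the T cell side
  Node2 = c(left = 0.42, right = 0.40)   # nearly a tie between subtypes
)

walk_tree <- function(tree, similarity, threshold) {
  node <- "Node1"  # start at the root
  while (node %in% names(tree)) {
    sim <- similarity[[node]]
    # Stop here when neither branch is clearly preferred.
    if (abs(sim[["left"]] - sim[["right"]]) < threshold) {
      return(node)
    }
    side <- if (sim[["left"]] > sim[["right"]]) "left" else "right"
    node <- tree[[node]][[side]]
  }
  node  # reached a leaf, i.e. a final cell type
}

label_default <- walk_tree(tree, similarity, threshold = 0.1)
label_forced  <- walk_tree(tree, similarity, threshold = 0)
print(label_default)  # "Node2"
print(label_forced)   # "CD4 T cell"
```

With the made-up threshold of 0.1 the cell stops at Node2, because its two T cell branches are almost equally similar. With a threshold of 0 the same cell is pushed down to a leaf, which gives a final label but one backed by very weak evidence.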

To stop this behaviour, run the following: input <- Classify(input, 0). This sets the confidence threshold to 0, so every cell is pushed down to a final cell type. Be aware that this will likely increase the number of incorrect classifications. For details, see the confidence score section of the vignette: https://bioconductor.org/packages/release/bioc/vignettes/CHETAH/inst/doc/CHETAH_introduction.html#confidence-score

Feel free to reopen the issue, or open a new one, if you have more questions.
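Before forcing every cell to a final type, it may help to check how many predictions actually stopped at a node. The sketch below uses only base R and a made-up `pred` vector standing in for `input$celltype_CHETAH`. It assumes intermediate labels always have the form "Node" followed by a number, as in the issue title.

```r
# Hypothetical predictions, standing in for input$celltype_CHETAH.
pred <- c("B cell", "Node1", "CD4 T cell", "Node4", "CD8 T cell", "Node1")

# Assumption: intermediate labels look like "Node<number>".
is_intermediate <- grepl("^Node[0-9]+$", pred)

# How many cells stopped at a node, and at which nodes?
n_intermediate <- sum(is_intermediate)
node_counts <- table(pred[is_intermediate])

# Keep only cells that reached a final cell type.
final_only <- pred[!is_intermediate]

print(n_intermediate)
print(node_counts)
print(final_only)
```

Dropping or separately inspecting the intermediate cells is an alternative to `Classify(input, 0)` when a few uncertain cells are preferable to a higher error rate.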