Inconsistency between annotation levels

I annotated my dataset using the Azimuth mouse motor cortex reference, and I was very happy to see that each cell in my dataset now has three levels of cell type/cluster annotation. (Thank you for a great tool!)

However, the different annotation levels are inconsistent with each other. For example, 39,413 cells in my dataset received the "predicted.subclass" label "Astro". When I summarize the "predicted.class" labels of these "astrocytes", I get GABAergic: 233, Glutamatergic: 1,485 and Non-neuronal: 37,695. In other words, 1,718 cells receive a top-level "predicted.class" label of a neuron (either GABAergic or Glutamatergic) but a second-level "predicted.subclass" label of an astrocyte. The same problem affects many cells at other levels too: the "class", "subclass" and "cluster" annotations do not form a nested hierarchy, and they seem to be independent of each other.
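To make the check above reproducible, here is a minimal sketch in Python (Azimuth itself runs in R/Seurat). It assumes the per-cell "predicted.subclass" and "predicted.class" labels have been exported as two equal-length lists in the same cell order. The helper names `crosstab` and `non_nested` are hypothetical, not part of Azimuth. The sketch cross-tabulates the coarser labels within each finer label and flags any finer label whose cells are split across more than one parent.

```python
from collections import Counter, defaultdict


def crosstab(child_labels, parent_labels):
    """Count parent-level labels (e.g. predicted.class) within each
    child-level label (e.g. predicted.subclass). Hypothetical helper."""
    if len(child_labels) != len(parent_labels):
        raise ValueError("label lists must have the same length")
    table = defaultdict(Counter)
    for child, parent in zip(child_labels, parent_labels):
        table[child][parent] += 1
    return table


def non_nested(table):
    """Return child labels whose cells are spread over more than one
    parent label, i.e. the ones that break a strict hierarchy."""
    return {child: dict(counts) for child, counts in table.items() if len(counts) > 1}


# Reproduce the "Astro" counts reported in this issue.
subclass = ["Astro"] * 39413
cls = ["GABAergic"] * 233 + ["Glutamatergic"] * 1485 + ["Non-neuronal"] * 37695
print(non_nested(crosstab(subclass, cls)))
```

A strictly nested annotation would make `non_nested` return an empty dict for every pair of adjacent levels.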

I can roughly see how this could happen computationally, but it doesn't make biological sense. Is this the expected behavior of the algorithm? If so, do you think changes should be made to prevent it, so that the annotations better align with biological reality?
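Until the levels are made consistent by the tool itself, one possible post-hoc workaround is to derive the coarser labels from the finer ones, so that every cell nests by construction. The sketch below is a hedged illustration in Python, not Azimuth behavior. The function name `enforce_hierarchy` is hypothetical, and the `SUBCLASS_TO_CLASS` mapping is only an illustrative fragment; a real mapping should be taken from the reference's own taxonomy. It also assumes the finer level is the one to trust, which may not hold for every cell.

```python
# Illustrative fragment only; build the full mapping from the reference taxonomy.
SUBCLASS_TO_CLASS = {
    "Astro": "Non-neuronal",
    "Pvalb": "GABAergic",
    "L2/3 IT": "Glutamatergic",
}


def enforce_hierarchy(child_labels, child_to_parent):
    """Replace coarse labels with the parent of each fine label
    (hypothetical helper); fails loudly on unmapped fine labels."""
    missing = sorted(set(child_labels) - set(child_to_parent))
    if missing:
        raise KeyError(f"no parent defined for: {missing}")
    return [child_to_parent[c] for c in child_labels]


print(enforce_hierarchy(["Astro", "Pvalb", "L2/3 IT"], SUBCLASS_TO_CLASS))
```

Trusting the finer level is a design choice: an alternative would be to keep the coarser label and flag cells whose finer label disagrees with it for manual review.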

Reference: https://zenodo.org/record/4546935

satijalab / azimuth, issue #176