dosumis opened this issue 1 year ago.
This issue has not seen any activity in the past 6 months; it will be closed automatically in one year from now if no action is taken.
I would also like to have a template for this. However, I would like to distinguish between a "bag of markers" and a set of markers that can serve as the necessary and sufficient axioms for defining a cell type class.
@scheuerm @dosumis Richard, I believe David's "bag of markers" is just his jargon for the same thing. I agree with you, though, that I would prefer this to be stated explicitly as "a set of markers that can serve as the necessary and sufficient axioms for defining a cell type class."
@scheuerm - agreed that we need to mark those we believe to be necessary and sufficient, but we also need to be careful about using OWL:EquivalentTo here, since it is the strict logical way to assert necessary and sufficient conditions. In PCL, even with NS-Forest, we ended up with incorrect automated classification of some rare cell types. IIRC this was because the NS-Forest algorithm works on the confidence of classification of single cells across a population. In the case of HuBMAP markers, any assertion that they are N&S is based on expert opinion rather than derived from data. So we may want a less formal way to assert this: one that allows us to separate rigorous, algorithmically derived marker sets from those based on expert opinion.
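One way to keep the two kinds of marker sets apart is to let provenance decide which axiom form gets generated. The sketch below is a minimal illustration, assuming a two-value provenance vocabulary ("algorithmic" / "expert") and a relation labelled `expresses`; both are hypothetical choices, not the PCL or CL convention. Only algorithmically derived sets become an EquivalentTo axiom; expert sets are recorded as a non-logical comment so a reasoner never classifies on them.

```python
from dataclasses import dataclass


@dataclass
class MarkerSet:
    cell_type: str    # label of the cell type class
    markers: list     # gene symbols (placeholders in the usage below)
    provenance: str   # assumed vocabulary: "algorithmic" or "expert"


def render_axiom(ms: MarkerSet) -> str:
    """Render a marker set as a Manchester-syntax-style frame.

    Only algorithmically derived sets become a logical EquivalentTo axiom;
    expert-curated sets are kept as a non-logical rdfs:comment so that a
    reasoner never uses them for automated classification.
    """
    if ms.provenance == "algorithmic":
        # 'expresses' is an assumed relation label, used for illustration only
        restrictions = " and ".join(f"('expresses' some '{m}')" for m in ms.markers)
        return f"Class: '{ms.cell_type}' EquivalentTo: 'cell' and {restrictions}"
    if ms.provenance == "expert":
        listed = ", ".join(ms.markers)
        return (f"Class: '{ms.cell_type}' Annotations: "
                f"rdfs:comment \"expert-curated marker set: {listed}\"")
    raise ValueError(f"unknown provenance: {ms.provenance!r}")


print(render_axiom(MarkerSet("example type A", ["GENE1", "GENE2"], "algorithmic")))
print(render_axiom(MarkerSet("example type B", ["GENE3"], "expert")))
```

An intermediate option would be to emit SubClassOf (necessary-only) axioms for expert sets instead of a comment; which level of formality suits HuBMAP markers is the open question in this thread.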
@scheuerm - general question. Is there some way we can use your pipelines to add classification confidence scores to marker sets coming from experts?
@dosumis yes, we have just modified and modularized the NS-Forest code so that we can provide any set of markers along with a clustered cell x gene expression matrix and quantify classification performance. I've attached a comparison of the performance of the NS-Forest markers with that of the markers reported in the Sikkema et al. paper for the HLCA lung atlas. The NS-Forest markers dramatically outperform the Sikkema markers for many of the clusters.
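To make "quantify classification performance" concrete, here is a simplified stand-in, not the NS-Forest code: a cell is predicted to belong to a cluster only if every marker in the set is expressed above a threshold, and the predictions are scored with an F-beta score. The all-markers-above-threshold rule, the threshold of 0, and beta = 0.5 (which weights precision over recall) are assumptions made for this sketch.

```python
import numpy as np


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from confusion counts; returns 0.0 when undefined."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def score_marker_set(expr, gene_names, labels, cluster, markers,
                     threshold=0.0, beta=0.5):
    """Score how well a marker set picks out one cluster.

    expr: cells x genes array; labels: per-cell cluster labels (numpy array).
    Assumed rule: a cell is called positive if ALL markers exceed threshold.
    """
    idx = [gene_names.index(g) for g in markers]
    predicted = np.all(expr[:, idx] > threshold, axis=1)
    actual = labels == cluster
    tp = int(np.sum(predicted & actual))
    fp = int(np.sum(predicted & ~actual))
    fn = int(np.sum(~predicted & actual))
    return f_beta(tp, fp, fn, beta)


# Toy data: 4 cells, 2 genes, 2 clusters (made-up values for illustration)
expr = np.array([[5, 0], [4, 1], [0, 3], [6, 0]], dtype=float)
genes = ["A", "B"]
labels = np.array(["c1", "c1", "c2", "c2"])
print(score_marker_set(expr, genes, labels, "c1", ["A"]))
print(score_marker_set(expr, genes, labels, "c1", ["A", "B"]))
```

Because the scorer only needs a marker list, the same function can score an expert-supplied set and an algorithmically derived set for the same cluster, which is the kind of side-by-side comparison described above.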
Existing BDSO template:
Example input TSV:
This template already groups markers and records confidence and provenance. We need to add species and tissue context.
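A minimal sketch of what a row could look like once species and tissue context are added, written out as TSV. The column names are hypothetical and do not reproduce the actual BDSO template headers. NCBITaxon:9606 (Homo sapiens) and UBERON:0002048 (lung) are real identifiers, but the cell type ID, markers, confidence and provenance values are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class MarkerSetRow:
    # Hypothetical column names, not the actual BDSO template headers
    cell_type_id: str   # CURIE of the cell type class
    markers: str        # markers grouped in one cell, "|"-separated
    confidence: float   # e.g. a classification score for the marker set
    provenance: str     # e.g. a DOI or dataset identifier
    species: str        # new: taxon context as an NCBITaxon CURIE
    tissue: str         # new: anatomical context as an UBERON CURIE


def to_tsv(rows) -> str:
    """Serialise rows to a tab-separated string with a header line."""
    buf = io.StringIO()
    names = [f.name for f in fields(MarkerSetRow)]
    writer = csv.DictWriter(buf, fieldnames=names, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


example = MarkerSetRow("EXAMPLE:0001", "GENE1|GENE2", 0.8, "EXAMPLE-DATASET",
                       "NCBITaxon:9606", "UBERON:0002048")
print(to_tsv([example]))
```

Using ontology CURIEs rather than free-text names for species and tissue keeps these columns usable as fillers in a ROBOT or DOSDP template.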
https://github.com/obophenotype/cell-ontology/issues/2397 overlaps with this ticket.
Based on the pattern used in
Tan, Shawn Zheng Kai, Huseyin Kir, Brian D. Aevermann, Tom Gillespie, Nomi Harris, Michael J. Hawrylycz, Nikolas L. Jorstad, et al. 2023. “Brain Data Standards - A Method for Building Data-Driven Cell-Type Ontologies.” Scientific Data 10 (1): 50. https://doi.org/10.1038/s41597-022-01886-2.
Needs a slot for species context.
Also see PCL for an example.
This can probably be done with ROBOT, but DOSDP would also be easy to use.
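To illustrate what a species slot means for a DOSDP-style pattern, the sketch below mirrors the pattern structure (a `text` template with printf-style `%s` placeholders filled from a `vars` list) as a Python dict and fills it from one template row. It is not dosdp-tools, the real pattern would be YAML, and the pattern name, variable names and relation labels (`expresses`, `in taxon`) are assumptions. It uses one marker per row for brevity; a full marker set would need a list-valued variable.

```python
# DOSDP-style pattern mirrored as a dict; names and labels are hypothetical
PATTERN = {
    "pattern_name": "cell_by_marker_in_taxon",
    "vars": {"marker": "'gene'", "species": "'organism'"},
    "equivalentTo": {
        "text": "'cell' and ('expresses' some %s) and ('in taxon' some %s)",
        "vars": ["marker", "species"],
    },
}


def fill(pattern: dict, row: dict) -> str:
    """Substitute a template row into the pattern's %s placeholders, in vars order."""
    spec = pattern["equivalentTo"]
    missing = [v for v in spec["vars"] if v not in row]
    if missing:
        # A row without a species value cannot fill the species slot
        raise KeyError(f"row lacks pattern vars: {missing}")
    values = tuple(f"'{row[v]}'" for v in spec["vars"])
    return spec["text"] % values


print(fill(PATTERN, {"marker": "GENE1", "species": "Homo sapiens"}))
```

Making species a required variable means rows lacking taxon context fail loudly instead of silently producing species-agnostic definitions.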