Define template for recording bags of markers

obophenotype / cell-ontology

An ontology of cell types

https://obophenotype.github.io/cell-ontology/

Creative Commons Attribution 4.0 International


Define template for recording bags of markers #2136

Open dosumis opened 1 year ago

dosumis commented 1 year ago

Based on pattern used in

Tan, Shawn Zheng Kai, Huseyin Kir, Brian D. Aevermann, Tom Gillespie, Nomi Harris, Michael J. Hawrylycz, Nikolas L. Jorstad, et al. 2023. “Brain Data Standards - A Method for Building Data-Driven Cell-Type Ontologies.” Scientific Data 10 (1): 50. https://doi.org/10.1038/s41597-022-01886-2.

Needs slot for species context.

See also PCL for an example.

Can probably be done in ROBOT, but also easy to use DOSDP.

github-actions[bot] commented 8 months ago

This issue has not seen any activity in the past 6 months; it will be closed automatically in one year from now if no action is taken.

scheuerm commented 8 months ago

I would also like to have a template for this. But I would also like to make a distinction between a "bag of markers" and a set of markers that can serve as the necessary and sufficient axioms for defining a cell type class.

emquardokus commented 8 months ago

@scheuerm @dosumis Richard, I believe David's "bag of markers" is his jargon for the same thing. I agree with you, and I would prefer that it be stated explicitly as "a set of markers that can serve as the necessary and sufficient axioms for defining a cell type class."

dosumis commented 8 months ago

@scheuerm - agree we need to mark those we believe to be necessary and sufficient, but we also need to be careful about the use of OWL:EquivalentTo here, which is the strict logical way to assert necessary and sufficient conditions. In PCL, even with NS-Forest, we ended up with incorrect automated classification of some rare cell types. IIRC this was because the NS-Forest algorithm was working on confidence of classification of single cells across a population. In the case of HuBMAP markers, any assertion that they are N&S is based on expert opinion rather than being derived from data. So we may want a less formal way to assert this, one that allows us to separate out rigorous, algorithmically derived marker sets from those based on expert opinion.

dosumis commented 8 months ago

@scheuerm - general question. Is there some way we can use your pipelines to add classification confidence scores to marker sets coming from experts?

scheuerm commented 8 months ago

@dosumis yes, we just modified/modularized the NS-Forest code so that we can provide any set of markers along with a clustered cell x gene expression matrix and quantify classification performance. I've attached a comparison between the performance of the NS-Forest markers and the performance of the markers reported in the Sikkema et al. paper for the HLCA lung atlas. The NS-Forest markers dramatically outperform the Sikkema markers for many of the clusters.

[Attached figure: fscore_nsforest_hlca]
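
For readers who want a concrete picture of what "quantify classification performance" for an arbitrary marker set could look like, here is a minimal sketch. It is not the NS-Forest code. It assumes a simple rule (a cell is predicted to belong to the target cluster when every marker is expressed above a fixed threshold) and scores that rule with an F-beta score. The function name `marker_set_fbeta`, the threshold, and the default `beta=0.5` (which weights precision over recall) are all illustrative assumptions.

```python
from typing import Dict, Sequence


def marker_set_fbeta(
    cells: Sequence[Dict[str, float]],
    labels: Sequence[str],
    markers: Sequence[str],
    target: str,
    threshold: float = 0.0,  # assumption: "expressed" means value > threshold
    beta: float = 0.5,       # assumption: weight precision over recall
) -> float:
    """Score how well co-expression of `markers` identifies cluster `target`.

    `cells` holds one gene -> expression mapping per cell and `labels`
    holds the cluster label of each cell, in the same order.
    """
    if not markers:
        raise ValueError("marker set must contain at least one gene")
    if len(cells) != len(labels):
        raise ValueError("cells and labels must have the same length")

    tp = fp = fn = 0
    for cell, label in zip(cells, labels):
        # Predicted positive only if every marker is above the threshold.
        predicted = all(cell.get(gene, 0.0) > threshold for gene in markers)
        actual = label == target
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1

    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Running the same function over an expert-curated marker set and an algorithmically derived one, against the same clustered matrix, would give directly comparable scores per cluster. A real pipeline would likely learn per-gene thresholds from the data instead of using one fixed cutoff.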

dosumis commented 7 months ago

Existing BDSO template:

https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/src/patterns/dosdp-patterns/taxonomy_marker_set.yaml

Example input TSV:

https://github.com/obophenotype/brain_data_standards_ontologies/blob/master/src/patterns/data/default/CCN201912131_marker_set.tsv

This template already groups markers and records confidence and provenance. We need to add species and tissue context.

TODO:
- [ ] Pull across template to CL branch
- [ ] Make draft modifications for review - including adding context for species and tissue.
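
As a starting point for the draft modifications, here is a rough sketch of what one input row might carry once species and tissue context are added, together with a field that separates algorithmically derived marker sets from expert opinion. Every column name below is hypothetical and is not taken from the existing BDSO template. The pipe-separated marker list, the `EX:` placeholder identifiers, and the two allowed `evidence` values are likewise assumptions for illustration only.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable

# Assumption: two evidence categories, following the distinction in this thread.
ALLOWED_EVIDENCE = {"algorithmic", "expert_opinion"}


@dataclass
class MarkerSetRow:
    # All column names are hypothetical, chosen for this sketch only.
    defined_class: str  # ID of the marker set class being generated
    cell_type: str      # cell type the marker set characterises
    markers: str        # pipe-separated gene IDs
    species: str        # taxon context, e.g. an NCBITaxon CURIE
    tissue: str         # anatomical context, e.g. an UBERON CURIE
    evidence: str       # one of ALLOWED_EVIDENCE
    confidence: str     # optional score as text; empty if not computed
    reference: str      # provenance, e.g. a DOI; empty if unknown


def rows_to_tsv(rows: Iterable[MarkerSetRow]) -> str:
    """Validate rows and serialise them as a TSV string with a header."""
    buf = io.StringIO()
    names = [f.name for f in fields(MarkerSetRow)]
    writer = csv.DictWriter(buf, fieldnames=names, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row.evidence not in ALLOWED_EVIDENCE:
            raise ValueError(f"unknown evidence type: {row.evidence!r}")
        if not row.species or not row.tissue:
            raise ValueError("species and tissue context are required")
        writer.writerow(asdict(row))
    return buf.getvalue()


example = MarkerSetRow(
    defined_class="EX:0000001",    # placeholder ID
    cell_type="EX:0000002",        # placeholder ID
    markers="EX:gene1|EX:gene2",   # placeholder gene IDs
    species="NCBITaxon:9606",      # Homo sapiens
    tissue="UBERON:0002048",       # lung
    evidence="expert_opinion",
    confidence="",
    reference="",
)
print(rows_to_tsv([example]))
```

Keeping `evidence` as its own column would let a reviewer filter expert-opinion sets from data-derived ones without committing either to an equivalence axiom.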

aleixpuigb commented 3 months ago

https://github.com/obophenotype/cell-ontology/issues/2397 overlaps with this ticket.