scFAIR / celltag_schema

A set of schemas to support FAIR annotation of cell types
Creative Commons Zero v1.0 Universal

Decide on schema modularity #2

Open dosumis opened 1 year ago

dosumis commented 1 year ago

STATUS: ROUGH DRAFT

Prior work: https://github.com/kharchenkolab/CxG_metaschema - defines a meta-schema for CxG standard AnnData files. This is a critical use case to support, but we want this schema to work more generally, e.g. supporting planned independent cell_type annotation files for CAP.

Also see https://github.com/brain-bican/CCN2

Meta-schema aims:

Draft schema

  "definitions": {
    "field_relationship": {
      "type": "object",
      "additionalProperties": false,
      "required": [
        "relation",
        "object"
      ],
      "properties": {
        "relation": {
          "type": "string",
          "enum": [
            "broader_than"
          ]
        },
        "object": {
          "type": "string",
          "description": "The name of a field used in annotation. Object of the relationship."
        }
      }
    }
  },
  "required": [
    "field_name",
    "field_type"
  ],
  "additionalProperties": false,
  "properties": {
    "field_name": {
      "type": "string",
      "description": "The name (key) of an obs field associated with a cell."
    },
    "field_type": {
      "type": "string",
      "enum": [
        "cell_type_ontology_label",
        "cell_type_ontology_id",
        "free_text_cell_type_name"
      ]
    },
    "field_scope": {
      "type": "string",
      "enum": [
        "exact",
        "broad"
      ]
    },
    "field_relationship": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/field_relationship"
      }
    },
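To make the intent of the draft meta-schema concrete, here is a minimal sketch in plain Python (standard library only) that checks one field descriptor against the required keys and enumerations listed above. The function name `check_field_descriptor` and the obs keys `cluster_label` and `subcluster_label` are made up for illustration. The sketch assumes `field_scope` is meant to be a string and `field_relationship` a list of relationship objects, and it does not enforce `additionalProperties`. In practice the schema would more likely be handed to a JSON Schema validator.

```python
# Enumerations copied from the draft meta-schema.
FIELD_TYPES = (
    "cell_type_ontology_label",
    "cell_type_ontology_id",
    "free_text_cell_type_name",
)
FIELD_SCOPES = ("exact", "broad")
RELATIONS = ("broader_than",)


def check_field_descriptor(descriptor):
    """Return a list of problems found in one field descriptor (empty if none)."""
    problems = []
    for key in ("field_name", "field_type"):
        if key not in descriptor:
            problems.append(f"missing required key: {key}")
    if "field_name" in descriptor and not isinstance(descriptor["field_name"], str):
        problems.append("field_name must be a string")
    if "field_type" in descriptor and descriptor["field_type"] not in FIELD_TYPES:
        problems.append(f"unknown field_type: {descriptor['field_type']!r}")
    if "field_scope" in descriptor and descriptor["field_scope"] not in FIELD_SCOPES:
        problems.append(f"unknown field_scope: {descriptor['field_scope']!r}")
    for rel in descriptor.get("field_relationship", []):
        # Each relationship needs exactly the keys "relation" and "object".
        if not isinstance(rel, dict) or set(rel) != {"relation", "object"}:
            problems.append(f"malformed field_relationship entry: {rel!r}")
            continue
        if rel["relation"] not in RELATIONS:
            problems.append(f"unknown relation: {rel['relation']!r}")
        if not isinstance(rel["object"], str):
            problems.append("relationship object must be a field name (string)")
    return problems


# Hypothetical descriptor: a free-text cluster label that is broader than
# a second, finer-grained annotation field.
example = {
    "field_name": "cluster_label",
    "field_type": "free_text_cell_type_name",
    "field_scope": "exact",
    "field_relationship": [
        {"relation": "broader_than", "object": "subcluster_label"}
    ],
}
print(check_field_descriptor(example))
```

A descriptor that omits `field_type` or uses a relation outside the enum would come back with one message per problem, which is roughly what a schema validator would report.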

Annotation schema aims - record evidence and provenance for cell type annotation:

Draft schema

    "manual_annotation_metadata": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "author": {
          "type": "string",
          "description": "ORCID of annotator"
        },
        "supporting_publication": {
          "type": "string",
          "description": "DOI of supporting publication."
        },
        "evidence_comment": {
          "type": "string",
          "description": "Free text description of supporting evidence for annotation."
        },
        "marker_evidence": {
          "type": "array",
          "internal_notes": "Should this support name/ID pairs?; Should valid IDs be gene IDs in matrix?",
          "description": "A list of expressed genes that support this cell type annotation.",
          "items": {
            "type": "string"
          }
        }
      }
    },
    "automated_cell_type_annotation_metadata": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "algorithm": {
          "type": "string",
          "description": "Name of algorithm"
        },
        "algorithm_reference": {
          "type": "string",
          "description": "DOI: Protocols.io?"
        },
        "reference_data": {
          "type": "string",
          "description": "PURL?"
        }
      }
    }
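For discussion, the sketch below shows what a `manual_annotation_metadata` record might look like when built in Python and serialised to JSON. The class name `ManualAnnotationMetadata`, the `to_record` helper, and all values (the ORCID, DOI, and gene names) are placeholders, not real identifiers. Using a dataclass is only one possible design; it mirrors `"additionalProperties": false` in that unknown keys are rejected at construction time.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ManualAnnotationMetadata:
    """Mirrors the properties of the draft manual_annotation_metadata object."""

    author: Optional[str] = None                  # ORCID of annotator
    supporting_publication: Optional[str] = None  # DOI of supporting publication
    evidence_comment: Optional[str] = None        # free-text evidence description
    marker_evidence: List[str] = field(default_factory=list)  # expressed genes

    def to_record(self):
        """Return a dict without unset values, ready for JSON serialisation."""
        return {k: v for k, v in asdict(self).items() if v not in (None, [])}


# Placeholder values only; none of these identifiers are real.
record = ManualAnnotationMetadata(
    author="0000-0000-0000-0000",
    supporting_publication="10.0000/example",
    marker_evidence=["GENE_A", "GENE_B"],
).to_record()
print(json.dumps(record, indent=2))
```

Passing a key that is not in the draft, for example `ManualAnnotationMetadata(reviewer="...")`, raises a `TypeError`, which is the behaviour `additionalProperties: false` asks of a validator.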

Critique:
The draft does not link to any standard semantics or schema for representing provenance and evidence. Adding such a link would make conversion to a knowledge graph (KG) straightforward.
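To illustrate why a link to standard semantics would help, here is a rough sketch of how an annotation record could be flattened into subject-predicate-object triples once each field is tied to a predicate. The mapping is purely an assumption for illustration: `dcterms:creator`, `dcterms:references`, and `rdfs:comment` are existing vocabulary terms picked as plausible candidates, `ex:has_marker_evidence` is an invented placeholder, and nothing in this issue has settled on any of them. The function name `to_triples` and the annotation identifier are likewise hypothetical.

```python
# Hypothetical mapping from draft field names to predicates (an assumption,
# not a decision recorded in this issue).
PREDICATES = {
    "author": "dcterms:creator",
    "supporting_publication": "dcterms:references",
    "evidence_comment": "rdfs:comment",
    "marker_evidence": "ex:has_marker_evidence",  # invented placeholder term
}


def to_triples(annotation_id, metadata):
    """Flatten one metadata record into (subject, predicate, object) tuples."""
    triples = []
    for key, value in metadata.items():
        predicate = PREDICATES[key]
        # List-valued fields yield one triple per item.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((annotation_id, predicate, item))
    return triples


# Placeholder record; the ORCID and gene names are not real.
triples = to_triples(
    "ex:annotation_1",
    {"author": "0000-0000-0000-0000", "marker_evidence": ["GENE_A", "GENE_B"]},
)
for triple in triples:
    print(triple)
```

With an agreed mapping of this kind in the schema itself, the conversion step would need no per-dataset interpretation.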

dosumis commented 1 year ago

@ubyndr - could we use this very basic metaschema for pandasaurus_cxg, harvesting the field type from Bradley's spreadsheet curation?

{
  "required": [
    "field_name",
    "field_type"
  ],
  "additionalProperties": true,
  "properties": {
    "field_name": {
      "type": "string",
      "description": "The name (key) of an obs field associated with a cell."
    },
    "field_type": {
      "type": "string",
      "enum": [
        "cell_type_ontology_label",
        "cell_type_ontology_id",
        "free_text_cell_type_name"
      ]
    },
    "field_scope": {
      "type": "string",
      "enum": [
        "exact",
        "broad"
      ]
    }
  }
}
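As a rough idea of what harvesting from a curation spreadsheet could look like, the sketch below reads a CSV export and turns each row into a field descriptor shaped like the basic metaschema. The column layout, the sample rows, and the function name `harvest_descriptors` are assumptions for illustration; the actual layout of the spreadsheet is not described here, and no pandasaurus_cxg API is used.

```python
import csv
import io

# Hypothetical CSV export of a curation sheet: one row per obs field.
# Column names and rows are invented for this example.
SHEET = """field_name,field_type,field_scope
cluster_label,cell_type_ontology_label,exact
cluster_ontology_id,cell_type_ontology_id,exact
author_cluster,free_text_cell_type_name,
"""


def harvest_descriptors(handle):
    """Build one field descriptor per spreadsheet row, skipping blank cells."""
    descriptors = []
    for row in csv.DictReader(handle):
        descriptors.append({key: value for key, value in row.items() if value})
    return descriptors


descriptors = harvest_descriptors(io.StringIO(SHEET))
for descriptor in descriptors:
    print(descriptor)
```

Dropping blank cells keeps optional keys such as `field_scope` out of the descriptor entirely, so the result passes a schema in which only `field_name` and `field_type` are required.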

We will fold the annotation schema in. Evan's schema is starting to get closer to the above.