theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License

metadata in .uns or .obs #16

Closed: davidsebfischer closed this issue 2 years ago

davidsebfischer commented 3 years ago

Per dataset: stored in .uns and accessible in lazy mode:

assay (replaces protocol)
contributors (replaces author; takes only the name field from cellxgene)
has_celltypes
id
normalization
lab
organism (replaces animal)
preprint_doi (new)
publication_doi (new)
version (new)
wget_download
year
(delete) publication
(delete) protocol
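The renames and deletions above amount to a small key migration for existing per-dataset metadata. A minimal sketch, assuming the old metadata is available as a plain dict; `migrate_dataset_meta`, `RENAMED_KEYS` and `DELETED_KEYS` are hypothetical names for illustration, not sfaira's API, and only the old-to-new pairs stated in the list are used.

```python
from typing import Any, Dict

# Old key -> new key, taken from the "(replaces ...)" notes in the proposal.
RENAMED_KEYS = {
    "protocol": "assay",
    "author": "contributors",
    "animal": "organism",
}

# Keys the proposal drops without a stated one-to-one replacement.
DELETED_KEYS = {"publication"}


def migrate_dataset_meta(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `old` with renamed keys and deleted keys removed.

    Hypothetical helper: keys not mentioned in the proposal are kept as-is.
    Collisions (e.g. both "protocol" and "assay" present) are not handled.
    """
    new: Dict[str, Any] = {}
    for key, value in old.items():
        if key in DELETED_KEYS:
            continue  # dropped per the proposal
        new[RENAMED_KEYS.get(key, key)] = value
    return new


# Usage: "protocol" and "animal" are renamed, "publication" is dropped.
example = migrate_dataset_meta(
    {"protocol": "10x", "animal": "mouse", "publication": "x", "year": 2020}
)
# example == {"assay": "10x", "organism": "mouse", "year": 2020}
```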

Per cell: stored in .obs and accessible in lazy mode as a list of unique entries (these are .obs attributes that can be represented as categoricals with a short list of entries):

dev_stage
disease (new)
healthy (might be reduced to a function of disease)
organ
sex
subtissue

Per cell: stored in .obs and not accessible in lazy mode:

age
cell_types_original
cell_ontology_class
cell_ontology_id
ethnicity
state_exact

@ambrosejcarr, happy to get feedback on this.
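To illustrate what "accessible in lazy mode" would expose under this proposal: all per-dataset .uns entries, plus each lazy .obs field reduced to its unique values, while the non-lazy per-cell fields stay hidden until the data is loaded. This is a sketch under stated assumptions, not sfaira's implementation: it uses plain dicts instead of an AnnData object to stay dependency-free, and `lazy_summary` is a hypothetical name.

```python
from typing import Any, Dict, List

# Per-cell fields the proposal marks as accessible in lazy mode.
LAZY_OBS_FIELDS = ("dev_stage", "disease", "healthy", "organ", "sex", "subtissue")


def lazy_summary(uns: Dict[str, Any], obs: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Build the metadata that would be visible without loading cell data.

    - every .uns entry is passed through unchanged;
    - each lazy .obs column is reduced to its unique entries,
      in order of first appearance;
    - any other .obs column (age, cell_ontology_class, ...) is left out.
    """
    summary = dict(uns)
    for field in LAZY_OBS_FIELDS:
        if field in obs:
            # dict.fromkeys drops duplicates while keeping first-seen order
            summary[field] = list(dict.fromkeys(obs[field]))
    return summary
```

For example, an `obs` with `organ = ["lung", "lung", "liver"]` and an `age` column yields `organ: ["lung", "liver"]` in the summary and no `age` entry. Ordering by first appearance is a design choice here; sorting the unique values would work equally well.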

davidsebfischer commented 3 years ago

The corresponding cellxgene schema is documented here: https://github.com/chanzuckerberg/cellxgene/blob/345306ef7b2acc23731583a0f318a37269089efd/dev_docs/schema_guide.md

uns:
    version:
        corpora_schema_version: 1.0.0
        corpora_encoding_version: 0.1.0
    contributors:
    title:
    layer_descriptions:
    preprint_doi:
    publication_doi:
    organism_ontology_term_id:
obs:
    tissue_ontology_term_id:
    assay_ontology_term_id:
    disease_ontology_term_id:
    cell_type_ontology_term_id:
    sex:
    ethnicity_ontology_term_id:
    development_stage_ontology_term_id:
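One way to use the excerpt above is a presence check of a dataset's .uns and .obs against the listed cellxgene keys. A minimal sketch, assuming .uns is a dict and the .obs column names are available as strings; `missing_schema_keys` is a hypothetical helper. It treats every listed key as expected, which may be stricter than the schema guide (which distinguishes details this excerpt omits), and it does not validate ontology term IDs.

```python
from typing import Any, Dict, Iterable, List

# Keys transcribed from the cellxgene schema excerpt above.
UNS_KEYS = ("version", "contributors", "title", "layer_descriptions",
            "preprint_doi", "publication_doi", "organism_ontology_term_id")
VERSION_KEYS = ("corpora_schema_version", "corpora_encoding_version")
OBS_KEYS = ("tissue_ontology_term_id", "assay_ontology_term_id",
            "disease_ontology_term_id", "cell_type_ontology_term_id", "sex",
            "ethnicity_ontology_term_id", "development_stage_ontology_term_id")


def missing_schema_keys(uns: Dict[str, Any], obs_columns: Iterable[str]) -> List[str]:
    """Return the listed schema keys that are absent, as 'uns/...' or 'obs/...' paths."""
    missing = [f"uns/{k}" for k in UNS_KEYS if k not in uns]
    version = uns.get("version")
    if not isinstance(version, dict):
        version = {}  # treat a missing or non-dict version entry as empty
    missing += [f"uns/version/{k}" for k in VERSION_KEYS if k not in version]
    columns = set(obs_columns)
    missing += [f"obs/{k}" for k in OBS_KEYS if k not in columns]
    return missing
```

An empty list means every key in the excerpt is present; a non-empty list names what a sfaira-to-cellxgene export would still need to fill in.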
davidsebfischer commented 3 years ago

We will try to use ontologies for metadata where possible. The suggested ontologies are:

general:

human:

mouse: