openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

Fields for new dataset inclusion for future compatibility w/ Sfaira/cellxgene #217

Closed LuckyMD closed 1 week ago

LuckyMD commented 3 years ago

Here is a list of fields that new data loaders added during the jamboree should provide to make future integration with cellxgene easier:

    author: Union[str, list] = ''  # author (list) who sampled / created the data set
    doi: str = ''  # doi of data set accompanying manuscript

    sample_fns: Union[str, Dict[str, list]] = ''  # file name of the first *.h5ad file
    download_url_data: str = ''  # download website(s) of data files
    download_url_meta: str = ''  # (optional) download website(s) of meta data files
    organ: str = ''  # (*) organ (anatomical structure)
    organism: str = ''  # (*) species / organism
    assay: str = ''  # (*, optional) protocol used to sample data (e.g. smart-seq2)
    normalization: str = ''  # raw or the used normalization technique
    year: int = 2021  # year in which sample was acquired
    number_of_datasets: int = 1  # required to determine the file names
    cell_type: str = ''  # will be auto-converted to Cell Ontology labels; required for future loading into cellxgene

scottgigante commented 3 years ago

Can we add these to the dataset decorator function so that they're enforced?
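
To make the suggestion concrete, here is a minimal sketch of what enforcement in a decorator could look like. It is not the actual openproblems decorator: the name `dataset_with_metadata`, the `REQUIRED_FIELDS` tuple, the `metadata` attribute, and the example values are all hypothetical, and which fields count as required is an assumption based on the list above (`download_url_meta` and `assay` are treated as optional because they are marked that way).

```python
import functools
from typing import Any, Callable, Tuple

# Assumption: every field from the list above that is not marked "(optional)".
REQUIRED_FIELDS: Tuple[str, ...] = (
    "author",
    "doi",
    "sample_fns",
    "download_url_data",
    "organ",
    "organism",
    "normalization",
    "year",
    "number_of_datasets",
    "cell_type",
)


def dataset_with_metadata(**metadata: Any) -> Callable:
    """Hypothetical decorator that rejects loaders with missing metadata."""
    # A field counts as missing if it is absent, None, or an empty string.
    missing = [
        name
        for name in REQUIRED_FIELDS
        if metadata.get(name) is None or metadata.get(name) == ""
    ]
    if missing:
        raise ValueError("missing required metadata fields: " + ", ".join(missing))

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return func(*args, **kwargs)

        # Attach a copy so later code can read the fields without calling the loader.
        wrapper.metadata = dict(metadata)
        return wrapper

    return decorator


# Usage with placeholder values (not a real data set).
@dataset_with_metadata(
    author=["Doe, J."],
    doi="10.0000/example",
    sample_fns="example.h5ad",
    download_url_data="https://example.org/data",
    organ="pancreas",
    organism="human",
    normalization="raw",
    year=2021,
    number_of_datasets=1,
    cell_type="beta cell",
)
def load_example() -> dict:
    # Stand-in for a real loader, which would return an AnnData object.
    return {"n_obs": 0}
```

Because the check runs when the decorator is applied, a loader with incomplete metadata would fail at import time rather than when the data is first loaded.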

github-actions[bot] commented 1 week ago

This issue has been automatically closed because it has not had recent activity.