theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Annotation of dataloader with two distinct disease obs keys fails to generate disease annotation files #703

Closed: LisaSikkema closed this issue 8 months ago

LisaSikkema commented 1 year ago

Hi,

I am writing a multi-file data loader yaml. Based on your documentation, I understood that the same variable can be given a separate entry for each file (e.g. "organ" below, which takes a different value per file), as shown in your tutorial:

```yaml
dataset_structure:
    dataset_index: 1
    sample_fns:
        - "A"
        - "B"
dataset_wise:
    # ... part of yaml omitted ...
dataset_or_observation_wise:
    # ... part of yaml omitted ...
    healthy: True
    healthy_obs_key:
    individual:
    individual_obs_key:
    organ:
        A: "lung"
        B: "pancreas"
    organ_obs_key:
    # ... part of yaml omitted ...
```
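As the tutorial example suggests, a field holding a dict keyed by sample file names presumably means "use the entry for the current file", while a plain value applies to every file. Below is a minimal Python sketch of that lookup. It assumes the yaml has already been parsed into a plain dict, and the helper `resolve_field` is hypothetical, not part of sfaira's API.

```python
from collections.abc import Mapping
from typing import Any


def resolve_field(section: Mapping[str, Any], field: str, sample_fn: str) -> Any:
    """Return the value of `field` for one sample file (hypothetical helper).

    Assumption: a dict value is a per-file mapping keyed by the entries of
    dataset_structure.sample_fns; any other value is shared by all files.
    """
    value = section.get(field)
    if isinstance(value, Mapping):
        # Per-file entry: pick the value listed under this file name
        # (raises KeyError if the file has no entry).
        return value[sample_fn]
    # Scalar or None: the same value applies to every file.
    return value


# The tutorial's dataset_or_observation_wise section, as a parsed dict.
section = {
    "healthy": True,
    "healthy_obs_key": None,
    "organ": {"A": "lung", "B": "pancreas"},
    "organ_obs_key": None,
}

print(resolve_field(section, "organ", "A"))    # lung
print(resolve_field(section, "organ", "B"))    # pancreas
print(resolve_field(section, "healthy", "B"))  # True
```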

I did the same in my own data loader, this time for the disease variable, starting the yaml as follows:

```yaml
dataset_structure:
    dataset_index: 1
    sample_fns:
        - "HLCA_v1.1_core.h5ad"
        - "HLCA_v1.1_full.h5ad"
```

and then specifying disease columns as:

```yaml
    disease:
    disease_obs_key:
        HLCA_v1.1_full.h5ad: "disease"
        HLCA_v1.1_core.h5ad: "lung_condition"
```
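One quick sanity check is whether the keys of a per-file mapping match the entries of sample_fns exactly, since a typo in a file name would leave one file without a disease column. Below is a minimal Python sketch of such a check, using the values from this loader. The helper `check_per_file_keys` is hypothetical and is not a sfaira function.

```python
from collections.abc import Mapping
from typing import Any


def check_per_file_keys(sample_fns: list[str], value: Any) -> tuple[list[str], list[str]]:
    """Compare a per-file mapping against sample_fns (hypothetical helper).

    Returns (missing, unexpected): files without an entry, and keys that do
    not name any file. A non-dict value is shared by all files, so both
    lists are empty in that case.
    """
    if not isinstance(value, Mapping):
        return [], []
    missing = [fn for fn in sample_fns if fn not in value]
    unexpected = [key for key in value if key not in sample_fns]
    return missing, unexpected


sample_fns = ["HLCA_v1.1_core.h5ad", "HLCA_v1.1_full.h5ad"]
disease_obs_key = {
    "HLCA_v1.1_full.h5ad": "disease",
    "HLCA_v1.1_core.h5ad": "lung_condition",
}

missing, unexpected = check_per_file_keys(sample_fns, disease_obs_key)
print(missing, unexpected)  # [] []
```

For this loader both lists are empty, so a file-name mismatch between sample_fns and disease_obs_key does not explain the missing annotation file.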

However, after I run annotate-dataloader in sfaira, no annotation file is created for disease. Could this be a bug in your code?