theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License
133 stars 13 forks source link

Universe.subset() throws weird error when key="doi" #695

Closed le-ander closed 1 year ago

le-ander commented 1 year ago

on feature/update_old_dataloaders, subsetting a sfaira universe object using the key "doi" raises an uninformative error. I guess the key "doi" is no longer valid and either "doi_journal" or "doi_preprint" should be used. we should try to raise the usual "key not part of sfaira-controlled metadata" error instead of the current behaviour.

univ.subset(key="doi", values=["10.1038/s41586-019-1654-9"])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-a8133621ca3c> in <module>
----> 1 univ.subset(key="doi", values=["10.1038/s41586-019-1654-9"])

/lustre/groups/ml01/code/leander.dony/sfaira/sfaira/data/dataloaders/base/dataset_group.py in subset(self, **kwargs)
    945 
    946     def subset(self, **kwargs):
--> 947         self.dataset_groups = [x for x in self.dataset_groups if x.subset(**kwargs)]
    948 
    949     def project_celltypes_to_ontology(self, adata_fields: Union[AdataIds, None] = None, copy=False):

/lustre/groups/ml01/code/leander.dony/sfaira/sfaira/data/dataloaders/base/dataset_group.py in <listcomp>(.0)
    945 
    946     def subset(self, **kwargs):
--> 947         self.dataset_groups = [x for x in self.dataset_groups if x.subset(**kwargs)]
    948 
    949     def project_celltypes_to_ontology(self, adata_fields: Union[AdataIds, None] = None, copy=False):

/lustre/groups/ml01/code/leander.dony/sfaira/sfaira/data/dataloaders/base/dataset_group.py in subset(self, key, values, **kwargs)
    442         keep = False
    443         for x in self.ids:
--> 444             if hasattr(self, key) and getattr(self, key) is not None:
    445                 # Collection-level annotation.
    446                 annot = getattr(self, key)

/lustre/groups/ml01/code/leander.dony/sfaira/sfaira/data/dataloaders/base/dataset_group.py in doi(self)
    490             if isinstance(vdoi, str):
    491                 vdoi = [vdoi]
--> 492             dois.extend(vdoi)
    493         return np.unique(dois).tolist()
    494 

TypeError: 'NoneType' object is not iterable
davidsebfischer commented 1 year ago

You could try subsetting by "doi_journal" as a fix, that should work. The bug is here

DatasetGroup::def doi(self) -> List[str]:
        dois = []
        for _, v in self.datasets.items():
            vdoi = v.doi_journal
            if isinstance(vdoi, str):
                vdoi = [vdoi]
            dois.extend(vdoi)
        return np.unique(dois).tolist()

which does not account for the case in which only doi_preprint is annotated and doi_journal is None, this probably broke during a past refactor of the DOI interface:

DatasetGroup::def doi(self) -> List[str]:
        dois = []
        for _, v in self.datasets.items():
            vdoi_j = v.doi
            dois.extend(vdoi)
        return np.unique(dois).tolist()