theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Ontology bug in Universe.subset(key="organism", values=["human"]) #487

Open TheAustinator opened 2 years ago

TheAustinator commented 2 years ago

Thanks for all the hard work on this package -- a data/model zoo was a big hole in the single-cell field.

The issue: following the tutorial notebook, I ran:

ds = sfaira.data.Universe(data_path=datadir, meta_path=metadir, cache_path=cachedir)
ds.subset(key="organism", values=["human"])

This raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_1430/3652047400.py in <module>
      9 
     10 ds = sfaira.data.Universe(data_path=datadir, meta_path=metadir, cache_path=cachedir)
---> 11 ds.subset(key="organism", values=["human"])
     12 ds.subset(key="organ", values=["eye"])
     13 ds  # Subset to all human datasets

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/dataset_group.py in subset(self, key, values)
   1158         """
   1159         for x in self.dataset_groups:
-> 1160             x.subset(key=key, values=values)
   1161         self.dataset_groups = [x for x in self.dataset_groups if x.datasets]  # Delete empty DatasetGroups
   1162 

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/dataset_group.py in subset(self, key, values)
    564                 if not isinstance(values_found, list):
    565                     values_found = [values_found]
--> 566                 if not np.any([
    567                     np.any([
    568                         is_child(query=y, ontology=ontology, ontology_parent=z)

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/dataset_group.py in <listcomp>(.0)
    565                     values_found = [values_found]
    566                 if not np.any([
--> 567                     np.any([
    568                         is_child(query=y, ontology=ontology, ontology_parent=z)
    569                         for z in values

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/dataset_group.py in <listcomp>(.0)
    566                 if not np.any([
    567                     np.any([
--> 568                         is_child(query=y, ontology=ontology, ontology_parent=z)
    569                         for z in values
    570                     ]) for y in values_found

/usr/local/lib/python3.8/dist-packages/sfaira/data/dataloaders/base/utils.py in is_child(query, ontology, ontology_parent)
     27                 return ontology.is_node(query)
     28             else:
---> 29                 return ontology.is_a(query=query, reference=ontology_parent)
     30         elif ontology is None:
     31             return query == ontology_parent

/usr/local/lib/python3.8/dist-packages/sfaira/versions/metadata/base.py in is_a(self, query, reference, convert_to_id)
    376         if convert_to_id:
    377             query = self.__convert_to_id_cached(query)
--> 378             reference = self.__convert_to_id_cached(reference)
    379         return query in self.get_ancestors(node=reference) or query == reference
    380 

/usr/local/lib/python3.8/dist-packages/sfaira/versions/metadata/base.py in __convert_to_id_cached(self, x)
    305     @lru_cache(maxsize=None)
    306     def __convert_to_id_cached(self, x: str) -> str:
--> 307         return self.convert_to_id(x)
    308 
    309     @property

/usr/local/lib/python3.8/dist-packages/sfaira/versions/metadata/base.py in convert_to_id(self, x)
    296             ]
    297         else:
--> 298             raise ValueError(f"node {x[0]} not recognized")
    299         self.__validate_node_ids(x=x)
    300         if was_str:

ValueError: node human not recognized

Universe.subset is working fine on key="organ", so I've gotten around this by just skipping the organism subset.

davidsebfischer commented 2 years ago

Thanks, @TheAustinator! We updated organisms to follow NCBITaxon, so you need to use "Homo sapiens" here. We will update the tutorials accordingly!
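
For reference, a minimal sketch of the corrected call. It assumes the same datadir, metadir, and cachedir variables as the snippet in the original report. The COMMON_TO_NCBI mapping and the to_ncbi_name helper are hypothetical conveniences and not part of sfaira; they only translate common names into the scientific names used by NCBITaxon before subsetting.

```python
# Hypothetical helper, not part of sfaira: map common organism names to
# the scientific names that an NCBITaxon-based organism ontology expects.
COMMON_TO_NCBI = {
    "human": "Homo sapiens",
    "mouse": "Mus musculus",
}


def to_ncbi_name(name: str) -> str:
    """Return the scientific name for a known common name, else the input unchanged."""
    return COMMON_TO_NCBI.get(name.strip().lower(), name)


organisms = [to_ncbi_name(x) for x in ["human"]]
print(organisms)

# With sfaira installed and the paths from the original report defined,
# the subset call from the issue would then become:
# ds = sfaira.data.Universe(data_path=datadir, meta_path=metadir, cache_path=cachedir)
# ds.subset(key="organism", values=organisms)
```

Passing "Homo sapiens" directly works just as well; the mapping is only useful if existing scripts still carry the old common names.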

TheAustinator commented 2 years ago

Ah, I should have thought to try that. Thanks!

TheAustinator commented 2 years ago

It might also help to list the valid options in the error message, if that's easy with this ontology structure.
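
A rough sketch of what such a message could look like, assuming the ontology can expose its node names as a plain iterable of strings. The function unknown_node_message and its parameters are hypothetical and do not reflect how sfaira builds its errors; the example uses only the standard-library difflib module to suggest near matches and otherwise falls back to listing a few valid names.

```python
import difflib
from typing import Iterable


def unknown_node_message(query: str, known_names: Iterable[str], n: int = 3) -> str:
    """Build an error message that suggests close matches or lists valid names.

    Hypothetical helper for illustration only; not part of sfaira.
    """
    names = sorted(known_names)
    # Compare case-insensitively, but report names in their original casing.
    lowered = {name.lower(): name for name in names}
    close = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=0.6)
    if close:
        hint = "did you mean: " + ", ".join(lowered[c] for c in close) + "?"
    else:
        # No near match: show the first few valid names instead.
        hint = "valid options include: " + ", ".join(names[:10])
    return f"node {query!r} not recognized; {hint}"


# Toy list standing in for the ontology's node names.
toy_names = ["Homo sapiens", "Mus musculus"]
print(unknown_node_message("human", toy_names))
print(unknown_node_message("Homo sapien", toy_names))
```

For a large ontology, printing every node would be unwieldy, which is why the sketch caps the fallback list and prefers near matches when there are any.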