laminlabs / bionty

Registries for biological entities, coupled to public ontologies.
Apache License 2.0

Reduce specify organism error message #142

Closed: Zethson closed this issue 4 weeks ago

Zethson commented 1 month ago

Description of feature

Currently, the error output is super long:

{
    "name": "AssertionError",
    "message": "Gene requires to specify a organism name via `organism=` or `bionty.settings.organism=`!",
    "stack": "---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[5], line 1
----> 1 bt.Gene.validate(df[\"Gene Symbol\"], field=bt.Gene.symbol)

File ~/PycharmProjects/lamindb/lamindb/_can_validate.py:86, in validate(cls, values, field, mute, organism, source)
     74 @classmethod  # type: ignore
     75 @doc_args(CanValidate.validate.__doc__)
     76 def validate(
   (...)
     83     source: Record | None = None,
     84 ) -> np.ndarray:
     85     \"\"\"{}\"\"\"  # noqa: D415
---> 86     return _validate(
     87         cls=cls, values=values, field=field, mute=mute, organism=organism, source=source
     88     )

File ~/PycharmProjects/lamindb/lamindb/_can_validate.py:240, in _validate(cls, values, field, mute, using_key, organism, source)
    237     queryset = queryset.filter(source=source).all()
    238 _check_organism_db(organism, using_key)
    239 field_values = pd.Series(
--> 240     _filter_query_based_on_organism(
    241         queryset=queryset,
    242         field=field,
    243         organism=organism,
    244         values_list_field=field,
    245     ),
    246     dtype=\"object\",
    247 )
    248 if field_values.empty:
    249     if not mute:

File ~/PycharmProjects/lamindb/lamindb/_can_validate.py:575, in _filter_query_based_on_organism(queryset, field, organism, values_list_field, fields)
    571 if _has_organism_field(registry) and not _field_is_id(field, registry):
    572     # here, we can safely import bionty
    573     from bionty._bionty import create_or_get_organism_record
--> 575     organism_record = create_or_get_organism_record(
    576         organism=organism, registry=registry, field=field
    577     )
    578     if organism_record is not None:
    579         queryset = queryset.filter(organism__name=organism_record.name)

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/bionty/_bionty.py:57, in create_or_get_organism_record(organism, registry, field)
     52         if hasattr(registry, \"_ontology_id_field\") and field in {
     53             registry._ontology_id_field,
     54             \"uid\",
     55         }:
     56             return None
---> 57         raise AssertionError(
     58             f\"{registry.__name__} requires to specify a organism name via `organism=` or `bionty.settings.organism=`!\"
     59         )
     61 return organism_record

AssertionError: Gene requires to specify a organism name via `organism=` or `bionty.settings.organism=`!"
}

Users will run into this error frequently because it's easy to forget to set the organism. It is a fairly isolated error, so maybe we could even catch it and display only the final error message and the suggested resolution? Everything else in the traceback looks scary...
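One way to read this suggestion is to catch the error and show only the last line of the traceback. Below is a minimal sketch. `require_organism` is a hypothetical stand-in for the check in `create_or_get_organism_record` and is not part of bionty's API. Its reworded message is also only illustrative. The sketch relies on the standard library's `traceback.format_exception_only`, which produces the same final "ExcType: message" line that Python prints under a traceback.

```python
import traceback
from typing import Optional


def require_organism(registry_name: str, organism: Optional[str]) -> str:
    """Hypothetical stand-in for the organism check that raises in the traceback above."""
    if organism is None:
        raise AssertionError(
            f"{registry_name} requires an organism: pass `organism=` "
            "or set `bionty.settings.organism`."
        )
    return organism


def short_message(exc: BaseException) -> str:
    """Return only the final 'ExcType: message' line of a traceback."""
    return "".join(traceback.format_exception_only(type(exc), exc)).strip()


try:
    require_organism("Gene", None)
except AssertionError as exc:
    # Show the one-line summary and suggested fix instead of the whole stack.
    print(short_message(exc))
```

In the library itself, this logic would more likely live in a wrapper that re-raises a short error rather than printing. The sketch prints only so that it stays self-contained.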

falexwolf commented 4 weeks ago

Why not raise a dedicated error that inherits from SystemExit like we do for many other errors (e.g. ValidationError)?

https://github.com/laminlabs/lamindb/blob/b7b422365841a95e1fda24bd7b79ac826cde3582/lamindb/core/exceptions.py#L41
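Below is a minimal sketch of that approach. It assumes a hypothetical `OrganismNotSetError` class and a hypothetical `check_organism` helper. Neither exists in bionty or lamindb. `SystemExit` derives from `BaseException` rather than `Exception`, which has two consequences:

- An uncaught `SystemExit` whose argument is a string prints that string and exits with status 1, without a traceback.
- IPython/Jupyter typically show only the final message, plus a hint to run `%tb`, instead of the full stack.

```python
from typing import Optional


class OrganismNotSetError(SystemExit):
    """Hypothetical: raised when a registry needs an organism but none was given.

    SystemExit derives from BaseException, not Exception, so an uncaught
    instance is reported as its message rather than with a full traceback.
    """


def check_organism(registry_name: str, organism: Optional[str]) -> str:
    """Hypothetical check mirroring the one in create_or_get_organism_record."""
    if organism is None:
        raise OrganismNotSetError(
            f"{registry_name} requires an organism: pass `organism=` "
            "or set `bionty.settings.organism`."
        )
    return organism


# Usage: a generic `except Exception` does NOT intercept the error ...
try:
    try:
        check_organism("Gene", None)
    except Exception:
        print("never reached")
except OrganismNotSetError as err:
    # ... but catching the dedicated class (or SystemExit) does.
    # SystemExit stores its argument in `.code`.
    print(err.code)
```

The trade-off is that callers who wrap calls in `except Exception:` will not catch this error. They have to catch the dedicated class or `SystemExit` explicitly.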