vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License
9.4k stars 795 forks source link

refactor: Reduce `SchemaValidationError` traceback length #3530

Closed dangotbanned closed 2 months ago

dangotbanned commented 3 months ago

This PR was prompted by difficulty in reading the traceback of failures in #3501.

Description

There has clearly been a great deal of care put into using schemapi.SchemaValidationError to produce a more useful traceback.

An element that I believe was overlooked, was that the surrounding code also contributes to the traceback length.

Before/After

The exact traceback seems to vary by the python interpreter used. ipython could be considered the best-case, with pytest the worst-case.

Anything else I imagine would fall somewhere between the length of each.

ipython

Reduction of roughly 10 lines Importantly, long comments that are not relevant to the user are no longer displayed.

Before ![image](https://github.com/user-attachments/assets/a816862a-a05a-4796-a5fd-295cb65a819c)
After ![image](https://github.com/user-attachments/assets/5987586e-fc59-4f82-bbc1-2aec6b79c5d4)

pytest

Reduction of roughly 60 lines

Before ```py (doc) PS C:\Users\danie\Documents\GitHub\altair> pytest -k test_to_dict_huge_traceback --numprocesses=1 ===================================================================================== test session starts ===================================================================================== platform win32 -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0 rootdir: C:\Users\danie\Documents\GitHub\altair configfile: pyproject.toml plugins: anyio-4.4.0, cov-5.0.0, xdist-3.6.1 1 worker [1 item] F [100%] ========================================================================================== FAILURES =========================================================================================== _________________________________________________________________________________ test_to_dict_huge_traceback _________________________________________________________________________________ [gw0] win32 -- Python 3.12.3 C:\Users\danie\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\doc\Scripts\python.exe def test_to_dict_huge_traceback(): # Error: Invalid value for type > return alt.Chart().encode(alt.X(type="unknown")).to_dict() tests\utils\test_schemapi.py:501: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ altair\vegalite\v5\api.py:3771: in to_dict return super(Chart, copy).to_dict(**kwds) altair\vegalite\v5\api.py:1807: in to_dict vegalite_spec: Any = super(TopLevelMixin, copy).to_dict( # type: ignore[misc] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = alt.Chart(...), validate = True def to_dict( self, validate: bool = True, *, ignore: list[str] | None = None, context: dict[str, Any] | None = None, ) -> dict[str, Any]: """ Return a dictionary representation of the object. Parameters ---------- validate : bool, optional If True (default), then validate the output dictionary against the schema. ignore : list[str], optional A list of keys to ignore. It is usually not needed to specify this argument as a user. context : dict[str, Any], optional A context dictionary. It is usually not needed to specify this argument as a user. Notes ----- Technical: The ignore parameter will *not* be passed to child to_dict function calls. Returns ------- dict The dictionary representation of this object Raises ------ SchemaValidationError : if validate=True and the dict does not conform to the schema """ if context is None: context = {} if ignore is None: ignore = [] # The following return the package only if it has already been # imported - otherwise they return None. This is useful for # isinstance checks - for example, if pandas has not been imported, # then an object is definitely not a `pandas.Timestamp`. pd_opt = sys.modules.get("pandas") np_opt = sys.modules.get("numpy") if self._args and not self._kwds: result = _todict( self._args[0], context=context, np_opt=np_opt, pd_opt=pd_opt ) elif not self._args: kwds = self._kwds.copy() # parsed_shorthand is added by FieldChannelMixin. # It's used below to replace shorthand with its long form equivalent # parsed_shorthand is removed from context if it exists so that it is # not passed to child to_dict function calls parsed_shorthand = context.pop("parsed_shorthand", {}) # Prevent that pandas categorical data is automatically sorted # when a non-ordinal data type is specifed manually # or if the encoding channel does not support sorting if "sort" in parsed_shorthand and ( "sort" not in kwds or kwds["type"] not in {"ordinal", Undefined} ): parsed_shorthand.pop("sort") kwds.update( { k: v for k, v in parsed_shorthand.items() if kwds.get(k, Undefined) is Undefined } ) kwds = { k: v for k, v in kwds.items() if k not in {*list(ignore), "shorthand"} } if "mark" in kwds and isinstance(kwds["mark"], str): kwds["mark"] = {"type": kwds["mark"]} result = _todict(kwds, context=context, np_opt=np_opt, pd_opt=pd_opt) else: msg = ( f"{self.__class__} instance has both a value and properties : " "cannot serialize to dict" ) raise ValueError(msg) if validate: try: self.validate(result) except jsonschema.ValidationError as err: # We do not raise `from err` as else the resulting # traceback is very long as it contains part # of the Vega-Lite schema. It would also first # show the less helpful ValidationError instead of # the more user friendly SchemaValidationError > raise SchemaValidationError(self, err) from None E altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson']. altair\utils\schemapi.py:1065: SchemaValidationError =================================================================================== short test summary info =================================================================================== FAILED tests/utils/test_schemapi.py::test_to_dict_huge_traceback - altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson']. ====================================================================================== 1 failed in 9.41s ====================================================================================== ```
After ```py (doc) PS C:\Users\danie\Documents\GitHub\altair> pytest -k test_to_dict_huge_traceback --numprocesses=1 ===================================================================================== test session starts ====================================================================================== platform win32 -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0 rootdir: C:\Users\danie\Documents\GitHub\altair configfile: pyproject.toml plugins: anyio-4.4.0, cov-5.0.0, xdist-3.6.1 1 worker [1 item] F [100%] =========================================================================================== FAILURES =========================================================================================== _________________________________________________________________________________ test_to_dict_huge_traceback __________________________________________________________________________________ [gw0] win32 -- Python 3.12.3 C:\Users\danie\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\doc\Scripts\python.exe def test_to_dict_huge_traceback(): # Error: Invalid value for type > return alt.Chart().encode(alt.X(type="unknown")).to_dict() tests\utils\test_schemapi.py:501: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ altair\vegalite\v5\api.py:3778: in to_dict return super(Chart, copy).to_dict(**kwds) altair\vegalite\v5\api.py:1818: in to_dict vegalite_spec: Any = _top_schema_base(super(TopLevelMixin, copy)).to_dict( _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = alt.Chart(...), validate = True def to_dict( self, validate: bool = True, *, ignore: list[str] | None = None, context: dict[str, Any] | None = None, ) -> dict[str, Any]: """ Return a dictionary representation of the object. Parameters ---------- validate : bool, optional If True (default), then validate the result against the schema. ignore : list[str], optional A list of keys to ignore. context : dict[str, Any], optional A context dictionary. Raises ------ SchemaValidationError : If ``validate`` and the result does not conform to the schema. Notes ----- - ``ignore``, ``context`` are usually not needed to be specified as a user. - *Technical*: ``ignore`` will **not** be passed to child :meth:`.to_dict()`. """ context = context or {} ignore = ignore or [] opts = _get_optional_modules(np_opt="numpy", pd_opt="pandas") if self._args and not self._kwds: kwds = self._args[0] elif not self._args: kwds = self._kwds.copy() exclude = {*ignore, "shorthand"} if parsed := context.pop("parsed_shorthand", None): kwds = _replace_parsed_shorthand(parsed, kwds) kwds = {k: v for k, v in kwds.items() if k not in exclude} if (mark := kwds.get("mark")) and isinstance(mark, str): kwds["mark"] = {"type": mark} else: msg = f"{type(self)} instance has both a value and properties : cannot serialize to dict" raise ValueError(msg) result = _todict(kwds, context=context, **opts) if validate: try: self.validate(result) except jsonschema.ValidationError as err: > raise SchemaValidationError(self, err) from None E altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson']. altair\utils\schemapi.py:1036: SchemaValidationError =================================================================================== short test summary info ==================================================================================== FAILED tests/utils/test_schemapi.py::test_to_dict_huge_traceback - altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson']. ====================================================================================== 1 failed in 9.84s ======================================================================================= ```

Notes for reviewers

I was very careful in slowly introducing these changes & ensuring each commit passed the test suite. Therefore, the commit messages contain a lot of information which explain each decision.

A general theme was to move long comments into docstrings, and reduce lines & characters where possible - to reduce noise.

dangotbanned commented 2 months ago

Apologies for all the review requests.

I'm hoping to merge this ahead of https://github.com/vega/altair/pull/3547, which will have a lot of conflicts in schemapi.py.

Ideally we could merge this, I can resolve conflicts then tidy #3547 up and move that out of draft

joelostblom commented 2 months ago

I started reviewing this yesterday and should be able to finish today. It would be good if one more person has at least a quick glance at it.

dangotbanned commented 2 months ago

I started reviewing this yesterday and should be able to finish today. It would be good if one more person has at least a quick glance at it.

Thank you @joelostblom

dangotbanned commented 2 months ago

Thanks for the thought and work and you put into this @dangotbanned ! A special shout out to your commits messages, which are always so detailed and explanatory; I have found this really helpful when reviewing. In fact, you have inspired me to write more elaborate commit messages myself! (although I wish github's PR review UI had an easy way to see the messages on a line per line bases when in the files view, and not only when going commit by commit...)

Overall, I think the shorter error messages you introduce here add utility, and I believe you have largely avoided that this comes at the expense of code readability, which is important and something we want to be careful with. I have a few comments

Really appreciate the compliment and review @joelostblom!

Yeah it is a shame the messages aren't always easy to find. I used them a lot here for keeping notes on all the small changes, but I'm starting to think using a conversation thread might make for an easier review.

Hopefully https://github.com/vega/altair/pull/3530#commits-pushed-b42364b addresses your comments.

and I would like someone else to have at least a quick look before merging, in case there is something I'm missing here.

I agree on having another set of eyes.

3547 includes a lot of similar changes - so consensus here would make for fewer surprises over there

dangotbanned commented 2 months ago

LGTM now, I'm approving but let's get another approve before merging.

Thanks again @joelostblom

Just disabled auto-merge to support this https://github.com/vega/altair/pull/3530#event-14085754901