vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License
9.33k stars 794 forks source link

Improve (`Expression`|`expr`|`Expr`|`ExprRef`) UX & ergonomics #3616

Open mattijn opened 1 month ago

mattijn commented 1 month ago

What is your suggestion?

As requested by @dangotbanned here: https://github.com/vega/altair/pull/3614#pullrequestreview-2333267989. https://github.com/vega/altair/pull/3614 uses an expression as a str with plain Vega Expression syntax. It would be nice to use the equivalent python syntax instead.

Have you considered any alternative solutions?

No response

dangotbanned commented 1 month ago

Thanks for opening this @mattijn

The reason I wanted an issue was to investigate why the definitions marked as # BUG were not working.

Code block from review ```py import altair as alt from vega_datasets import data source = data.barley() base = alt.Chart(source).encode( x=alt.X("sum(yield):Q").stack("zero"), y=alt.Y("site:O").sort("-x"), text=alt.Text("sum(yield):Q", format=".0f"), ) # NOTE: Original `str` expressions # tooltip = alt.expr("luminance(scale('color', datum.sum_yield))") # color = alt.expr("luminance(scale('color', datum.sum_yield)) > 0.5 ? 'black' : 'white'") luminance = alt.expr.luminance(alt.expr.scale("color", alt.datum.sum_yield)) # BUG: These definitions alone *should* work, but don't evaluate as expressions tooltip = luminance color = alt.expr.if_(luminance > 0.5, "black", "white") # HACK: Actual additional steps required tooltip = alt.expr(repr(luminance)) color = alt.expr(repr(color)) bars = base.mark_bar(tooltip=tooltip).encode(color="sum(yield):Q") text = base.mark_text(align="right", dx=-3, color=color) bars + text ```

I'm now seeing that this is the distinction between Expr (str) and ExprRef ({"expr": str}).

I think handling this implicitly could be error prone.


A simple alternative could be adding Expression.to_expr:

Diff altair/expr/core.py

```diff diff --git a/altair/expr/core.py b/altair/expr/core.py index e7872ada..095385b4 100644 --- a/altair/expr/core.py +++ b/altair/expr/core.py @@ -168,21 +168,24 @@ class OperatorMixin: class Expression(OperatorMixin, SchemaBase): """ Expression. Base object for enabling build-up of Javascript expressions using a Python syntax. Calling ``repr(obj)`` will return a Javascript representation of the object and the operations it encodes. """ _schema = {"type": "string"} def to_dict(self, *args, **kwargs): return repr(self) + def to_expr(self): + return {"expr": repr(self)} + def __setattr__(self, attr, val) -> None: # We don't need the setattr magic defined in SchemaBase return object.__setattr__(self, attr, val) # item access def __getitem__(self, val): return GetItemExpression(self, val) ```

This would then allow the following syntax:

import altair as alt
from vega_datasets import data

source = data.barley()

base = alt.Chart(source).encode(
    x=alt.X("sum(yield):Q").stack("zero"),
    y=alt.Y("site:O").sort("-x"),
    text=alt.Text("sum(yield):Q", format=".0f"),
)

# NOTE: Original `str` expressions
# tooltip = alt.expr("luminance(scale('color', datum.sum_yield))")
# color = alt.expr("luminance(scale('color', datum.sum_yield)) > 0.5 ? 'black' : 'white'")

luminance = alt.expr.luminance(alt.expr.scale("color", alt.datum.sum_yield))

# NOTE: More ergonomic? # <-------------------------------------------------
tooltip = luminance.to_expr()
color = alt.expr.if_(luminance > 0.5, "black", "white").to_expr()

bars = base.mark_bar(tooltip=tooltip).encode(color="sum(yield):Q")
text = base.mark_text(align="right", dx=-3, color=color)

bars + text

It would only be needed as the last operation, so it makes sense to be appearing as the final method call.

We could instead use (expr|ExprRef), this was just the quick and easy solution I saw

mattijn commented 1 month ago

We have had .ref(), but that was not necessary anymore at one moment, probably because we solved it implicitly. I think there is also .expr. I used it once in this example. I wish we are clever enough to know when it should be an expression as string or an expression as reference. It's near impossible to know what should be used as an Altair user without using debug tools such as the Vega editor.

dangotbanned commented 1 month ago

We have had .ref(), but that was not necessary anymore at one moment, probably because we solved it implicitly. I think there is also .expr. I used it once in this example. I wish we are clever enough to know when it should be an expression as string or an expression as reference. It's near impossible to know what should be used as an Altair user without using debug tools such as the Vega editor.

Do you mean Parameter.ref() @mattijn?

https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/api.py#L355-L389

My suggestion was for Expression (and subclasses), but not OperatorMixin (which Parameter derives)

mattijn commented 1 month ago

Ah yeah, that was for parameter. To speak for myself, I was happy that we could deprecate that one, since I was not able to explain when and when not to use it. The same feeling I have when we introduce an expression.to_expr()..🤔

dangotbanned commented 1 month ago

Ah yeah, that was for parameter. To speak for myself, I was happy that we could deprecate that one, since I was not able to explain when and when not to use it. The same feeling I have when we introduce an expression.to_expr()..🤔

@mattijn Well that part would be solved by communicating this in annotations 😃

It would also be quite similar to the recent changes with conditions in #3567

api.py

https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/api.py#L628-L706

A lot of this is about identifying either a dict with a key "condition" or an object with an attribute object.condition.

The same idea would apply for this case, but switching to "expr" or object.expr.

Since we know precisely what every schema will permit; I could look into how we can represent this scenario as part of SchemaInfo.to_type_repr()?

https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/tools/schemapi/utils.py#L455-L479

mattijn commented 1 month ago

It would be quite an achievement if you manage, but you might be right that it is actually possible.

dangotbanned commented 1 month ago

@mattijn I'll keep this comment updated with all the context I can dig up.

Related

tools.schemapi.utils.title_to_type_reprs

Had another look and I see this **TODO** ([line 555](https://github.com/vega/altair/blame/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/tools/schemapi/utils.py#L555)) I added in (#3536) has some relation https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/tools/schemapi/utils.py#L552-L570

Relevant section of vega-lite-schema.json

https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/schema/vega-lite-schema.json#L8808-L8823 ```json { "definitions": { "Expr": { "type": "string" }, "ExprRef": { "additionalProperties": false, "properties": { "expr": { "description": "Vega expression (which can refer to Vega-Lite parameters).", "type": "string" } }, "required": [ "expr" ], "type": "object" } } } ```

Most of altair.expr.core.py

https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/expr/core.py#L31-L240

Some inconsistency in v5.api.param

Highlighting this since it is the last step for the *preferred* functional wrappers https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/api.py#L1316-L1321 `core.VariableParameter.expr` **exists** https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/api.py#L1381-L1387 `core.TopLevelSelectionParameter.expr` **doesn't** exist https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/api.py#L1390-L1392 `core.SelectionParameter.expr` **doesn't** exist https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/vegalite/v5/api.py#L1395-L1397 # Question Does this mean a defining trait of a `VariableParameter` is that it wraps an expression?

Usage

Issues/PRs

Summary

dangotbanned commented 1 month ago

@mattijn would you be okay if we renamed this issue?

Modifying the code used in #3614 example would be a benefit. However, I'm thinking more along the lines of https://github.com/vega/altair/issues/3563#issuecomment-2333887397 to provide a more ergonomic alternative.

My proposal of Expression.to_expr() is an alternative, but based on https://github.com/vega/altair/issues/3616#issuecomment-2380614092 I'd like to explore if we could benefit from a larger change

mattijn commented 1 month ago

Sure, issue can be renamed!

dangotbanned commented 1 month ago

I've started exploring some of https://github.com/vega/altair/issues/3616#issuecomment-2380614092

What I found really surprised me.

Testing

I made this small change, which simply adds Expr as an annotation when used in a schema. This is to try to answer @mattijn's point in https://github.com/vega/altair/issues/3616#issuecomment-2379947174 about knowing when to use a string

diff tools/schemapi/utils.py ```diff diff --git a/tools/schemapi/utils.py b/tools/schemapi/utils.py index 5c4a84f9..f3706ba9 100644 --- a/tools/schemapi/utils.py +++ b/tools/schemapi/utils.py @@ -568,6 +568,8 @@ class SchemaInfo: # NOTE: To keep type hints simple, we annotate with `SchemaBase` for all subclasses. if title in tp_param: tps.add("Parameter") + if title == "Expr": + tps.add(title) elif self.is_value(): value = self.properties["value"] t = value.to_type_repr(target="annotation", use_concrete=use_concrete) ```

Running schema generation resulted in only 3 changes?

diff altair/vegalite/v5/schema/core.py

```diff diff --git a/altair/vegalite/v5/schema/core.py b/altair/vegalite/v5/schema/core.py index 623b27ee..1ff7c660 100644 --- a/altair/vegalite/v5/schema/core.py +++ b/altair/vegalite/v5/schema/core.py @@ -21921,7 +21921,9 @@ class DerivedStream(Stream): between: Optional[Sequence[SchemaBase | Map]] = Undefined, consume: Optional[bool] = Undefined, debounce: Optional[float] = Undefined, - filter: Optional[str | SchemaBase | Sequence[str | SchemaBase]] = Undefined, + filter: Optional[ + str | Expr | SchemaBase | Sequence[str | Expr | SchemaBase] + ] = Undefined, markname: Optional[str] = Undefined, marktype: Optional[SchemaBase | MarkType_T] = Undefined, throttle: Optional[float] = Undefined, @@ -21981,7 +21983,9 @@ class MergedStream(Stream): between: Optional[Sequence[SchemaBase | Map]] = Undefined, consume: Optional[bool] = Undefined, debounce: Optional[float] = Undefined, - filter: Optional[str | SchemaBase | Sequence[str | SchemaBase]] = Undefined, + filter: Optional[ + str | Expr | SchemaBase | Sequence[str | Expr | SchemaBase] + ] = Undefined, markname: Optional[str] = Undefined, marktype: Optional[SchemaBase | MarkType_T] = Undefined, throttle: Optional[float] = Undefined, @@ -26715,7 +26719,7 @@ class VariableParameter(TopLevelParameter): self, name: Optional[str | SchemaBase] = Undefined, bind: Optional[SchemaBase | Map] = Undefined, - expr: Optional[str | SchemaBase] = Undefined, + expr: Optional[str | Expr | SchemaBase] = Undefined, react: Optional[bool] = Undefined, value: Optional[Any] = Undefined, **kwds, ```

Making the same change, but for ExprRef results in 5335 changes

Screenshot ![image](https://github.com/user-attachments/assets/1862ea53-cbf4-4781-a0f0-0eea2362add0)

Conclusion

I can't help but think that Expression should be abstracting ExprRef instead of Expr.

This would mean these methods no longer both produce str


import altair as alt
from altair.expr.core import FunctionExpression

example: FunctionExpression = alt.expr.random()

>>> example.to_dict() == example._to_expr()
True

Here OperatorMixin.to_dict() returns a str, but could return {"expr": "..."} if it subclassed ExprRef instead.

It would still make sense for OperatorMixin._to_expr() to return a str - and that maps more closely to Expr.


We'd then have the same type returned from all of:

I think this is conceptually quite clean and easy to explain. expr is a factory to produce expr objects (expressions).

Since that would account for 5335 annotations, we could use expr in them directly. That would also solve minor typing issue in expr.__new__

https://github.com/vega/altair/blob/cabf1e6428bab107a26adc6c51993d3b2c5bf6f0/altair/expr/__init__.py#L111-L117