vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License
9.31k stars 793 forks source link

Support `alt.datum[signal_name]` in addition to `"datum[signal_name]"` #3366

Open joelostblom opened 7 months ago

joelostblom commented 7 months ago

Currently using alt.datum in altair does not support passing a signal/parameter whereas writing the expression string directly as JS-like string does (an example of dynamically passing fields via parameters here https://github.com/altair-viz/altair/pull/3357/files (both the chart title and calculate transform).

Here is another example:

import altair as alt
from vega_datasets import data

dropdown = alt.binding_select(
    options=['Horsepower', 'Displacement', 'Weight_in_lbs', 'Acceleration'],
    name='X-axis column '
)
xcol_param = alt.param(
    value='Horsepower',
    bind=dropdown
)

alt.Chart(data.cars.url).mark_circle().encode(
    x=alt.X('x:Q').title(''),
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).transform_calculate(
    x=f'datum[{xcol_param}]'
    # x=alt.datum[xcol_param.name]
).add_params(
    xcol_param
)

The reason it fails is that alt.datum[xcol_param.name] inserts additional quotation around the parameter name in the VL spec:

"transform": [{"calculate": "datum['param_1']", "as": "x"}],

whereas the VL spec for f'datum[{xcol_param}]' does not:

"transform": [{"calculate": "datum[param_1]", "as": "x"}],

Open the Chart in the Vega Editor

Personally I think the cleanest would be if we could support just using the variable name without the .name suffix when using alt.datum, as in this example from the docs:

param_color_js_expr = alt.param(expr=f"{param_width.name} < 200 ? 'red' : 'black'")
param_color_py_expr = alt.param(expr=alt.expr.if_(param_width < 200, 'red', 'black'))

But even if that doesn't work, just supporting the use of the .name suffix without the quotes inserted would be great. Here are a few examples of how these datums and expressions get translated with differential quoting:

image

jonmmease commented 7 months ago

Thanks for writing this up, I think alt.datum[xcol_param] rather than alt.datum[xcol_param.name] makes good sense

dangotbanned commented 3 weeks ago

@joelostblom I've seen you mention this issue a few times.

Directly fixing alt.datum[Parameter] escaping would only require a small change like:

diff altair/expr/core.py

I would replace the `hasattr()`(s) with a [`@runtime_checkable` protocol](https://typing.readthedocs.io/en/latest/spec/protocol.html#runtime-checkable-decorator-and-narrowing-types-by-isinstance) - just trying to keep this diff simple: ```diff diff --git a/altair/expr/core.py b/altair/expr/core.py index e7872ada..aa0b4d25 100644 --- a/altair/expr/core.py +++ b/altair/expr/core.py @@ -18,7 +18,11 @@ class DatumType: return GetAttrExpression("datum", attr) def __getitem__(self, attr) -> GetItemExpression: - return GetItemExpression("datum", attr) + if hasattr(attr, "param_type") and hasattr(attr, "name"): + expr = GetItemExpression("datum", attr.name, escape=False) + else: + expr = GetItemExpression("datum", attr) + return expr def __call__(self, datum, **kwargs) -> dict[str, Any]: """Specify a datum for use in an encoding.""" @@ -230,11 +234,12 @@ class GetAttrExpression(Expression): class GetItemExpression(Expression): - def __init__(self, group, name) -> None: - super().__init__(group=group, name=name) + def __init__(self, group, name, *, escape: bool = True) -> None: + super().__init__(group=group, name=name, escape=escape) def __repr__(self) -> str: - return f"{self.group}[{self.name!r}]" + r = self.escape + return f"{self.group}[{repr(self.name) if r else self.name}]" ```

My main question is what does alt.datum[(Parameter|Parameter.name)] represent?

I can test out things for myself, but it isn't immediately obvious to me what the difference is between:

Is there any documentation on this I've missed?


I think adding something like Parameter.to_datum() might be a clear way to represent this.

Maybe there's a more fitting name, but your first example would then become:

Code block

https://github.com/vega/altair/issues/3366#issue-2189513304 ```py import altair as alt from vega_datasets import data dropdown = alt.binding_select( options=['Horsepower', 'Displacement', 'Weight_in_lbs', 'Acceleration'], name='X-axis column ' ) xcol_param = alt.param( value='Horsepower', bind=dropdown ) alt.Chart(data.cars.url).mark_circle().encode( x=alt.X('x:Q').title(''), y='Miles_per_Gallon:Q', color='Origin:N' ).transform_calculate( x=xcol_param.to_datum() # x=f'datum[{xcol_param}]' # x=alt.datum[xcol_param.name] ).add_params( xcol_param ) ```

We'd also be able to add some examples in the docstring of where this should be used, along the lines of our discussion in https://github.com/vega/altair/issues/3500#issuecomment-2313271440

jonmmease commented 3 weeks ago

what does alt.datum[(Parameter|Parameter.name)] represent?

The idea is that the parameter holds a column name string. So alt.datum[param] would be equivalent to alt.datum["colA"] when that parameter's value is set to "colA". This way the column can be chosen dynamically when that parameter is bound to a widget. It's not the parameter becoming a datum, so I don't personally think Parameter.to_datum() would be a good fit.

dangotbanned commented 3 weeks ago

It's not the parameter becoming a datum, so I don't personally think Parameter.to_datum() would be a good fit.

That's fair @jonmmease!

I've had another read through these, but I'm having a hard time pinning down a more accurate name for this op

Very open to any other suggestions 😄

joelostblom commented 3 weeks ago

Thanks for looking into this @dangotbanned! Do we need a name for this operation or can we just support alt.datum[param] rather than Parameter.method_name()? This is the closest to the existing Vega-Lite JS syntax and like Jon mentioned alt.datum['column_name] can be used to reference any column in the data, so it seems intuitive to me that having a param that stores a string value can be passed instead of a string directly.

dangotbanned commented 3 weeks ago

Thanks for looking into this @dangotbanned! Do we need a name for this operation or can we just support alt.datum[param] rather than Parameter.method_name()? This is the closest to the existing Vega-Lite JS syntax and like Jon mentioned alt.datum['column_name] can be used to reference any column in the data, so it seems intuitive to me that having a param that stores a string value can be passed instead of a string directly.

Yeah we can just support alt.datum[Parameter] without the additional method @joelostblom.

I would say the general aim I had there was to try and improve discoverability of this feature. If we can do that without a method I'm happy with a different route.

Alternate proposal

We could make the following changes to update alt.datum in line with alt.expr


Then we include datum in the API Reference under the same section as expr:

https://github.com/vega/altair/blob/17968ea95489cc88de647381d3d1974b49dab88e/tools/generate_api_docs.py#L128-L130

Benefits