Closed palewire closed 9 months ago
In teaching Altair, this question comes up a lot. I like your solution and I think it is consistent with the existing shorthand.
On Tue, May 22, 2018 at 7:19 AM, Ben Welsh notifications@github.com wrote:
Yesterday, a colleague asked me how to dictate the sort of bars in a chart. I developed this example https://github.com/datadesk/altair-column-sort-example/blob/master/notebook.ipynb to show him how.
[image: download] https://user-images.githubusercontent.com/9993/40367924-5f658f52-5d8f-11e8-94b2-6a46af67b80c.png
alt.Chart(df, title="Median household income of U.S. counties").mark_bar().encode(
    x=alt.X(
        "name:N",
        axis=alt.Axis(labels=False, title="", ticks=False),
        # Here's where you can resort the order of the columns on the x-axis
        sort=alt.SortField(
            # This SortField class requires at least three inputs,
            # which does seem like overkill. I'd like to see a simpler
            # way to pull this off.
            field='b19013001',  # First the field you want to sort on
            op='sum',  # Then the operation to run on that field. In this case, we just total the value.
            order="descending"  # Finally, the order to sort.
        )
    ),
    y=alt.Y(
        "b19013001:Q",
        axis=alt.Axis(title="", format="$s", ticks=False)
    )
).properties(width=620)
It works great but, IMHO, the SortField requirement with three inputs, including a "fake" op that in this case does not appear to be necessary, is asking a lot of beginners. And I'd like to think something more convenient could also benefit experts.
I know nothing about the internals of this feature, but I'm curious if the sort channel could somehow benefit from a shorthand, much like the x and y channels.
In my imagination, something like this:
sort=alt.SortField(field="b19013001", op="sum", order="descending")
Could be submitted like this, with the field and operation handled much like the other shorthand features, and the descending order of the sort handled with the same style as the order_by https://docs.djangoproject.com/en/2.0/ref/models/querysets/#order-by method of the popular Django framework:
sort="-sum(b19013001)"
I'm guessing you can easily imagine the other permutations in this kind of scheme. Additionally in cases where the dataframe is not grouped during encoding, it seems to me that providing the op argument should be, 🥁, optional. That would mean that if a field was to be used as the sort in ascending order with no aggregation, the shorthand submission could be as simple as:
sort="b19013001"
What do you think? If something like this already exists and I'm simply ignorant of it, I will accept writing the documentation as my punishment.
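To make the proposal concrete, here is a minimal sketch of how such a string could be expanded into the three SortField arguments. It is not part of Altair: the function name parse_sort_shorthand and the regular expression are hypothetical, and the rules (a leading "-" means descending, op(field) supplies the operation, a bare field leaves op out) are just one reading of the syntax described above.

```python
import re
from typing import Dict

# Hypothetical pattern for the proposed syntax: an optional "-" prefix,
# followed by either "op(field)" or a bare "field".
_SORT_SHORTHAND = re.compile(
    r"^(?P<desc>-)?"
    r"(?:(?P<op>[A-Za-z_]\w*)\((?P<agg_field>[^()]+)\)|(?P<field>[^()]+))$"
)


def parse_sort_shorthand(shorthand: str) -> Dict[str, str]:
    """Expand a proposed sort shorthand into SortField-style keyword arguments."""
    match = _SORT_SHORTHAND.match(shorthand.strip())
    if match is None:
        raise ValueError(f"Unrecognized sort shorthand: {shorthand!r}")
    spec = {
        # Exactly one of the two field groups is set, depending on the branch.
        "field": match.group("agg_field") or match.group("field"),
        # Django-style: a leading "-" flips the order to descending.
        "order": "descending" if match.group("desc") else "ascending",
    }
    # The operation stays optional, as suggested in the proposal.
    if match.group("op"):
        spec["op"] = match.group("op")
    return spec


print(parse_sort_shorthand("-sum(b19013001)"))
print(parse_sort_shorthand("b19013001"))
```

The resulting dictionary could then be passed on as keyword arguments, which keeps the parsing step separate from whatever the schema requires.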
-- Brian E. Granger Associate Professor of Physics and Data Science Cal Poly State University, San Luis Obispo @ellisonbg on Twitter and GitHub bgranger@calpoly.edu and ellisonbg@gmail.com
As the aforementioned colleague, I'll start by admitting that I'm an Altair newb. I still haven't wrapped my head around the values/function of the op parameter. As I told @palewire, I'm having trouble intuiting why I would choose 'sum'.
That said, the syntax Ben suggests seems clear and concise, especially if the op becomes optional in common/simple cases.
Thanks for bringing this up... I agree that the grammar is a bit complicated in this case. Perhaps it would make sense to raise an issue in Vega-Lite and recommend that op be made an optional argument?
Regarding adding new shorthand parsing... I'm a bit wary of that, because every extra piece of logic that we add on top of the schema is one more thing that can (and will) break during a future vega-lite update. Do you think that making op optional within alt.SortField would do enough to clarify things for users?
We thought about this before, but it is unclear what's a reasonable default. If you think this should be done, feel free to discuss more in https://github.com/vega/vega-lite/issues/1489.
I respect your reticence to venture too far away from Vega, but I'm curious how you properly judge the different opportunities to introduce shorthand.
My novice understanding of Altair leads me to believe there are some cases where this has been done as a convenience to users. Is there a list of them anywhere?
Currently the only place such shorthands have been introduced is in the encode() method, and in a couple of the transform_*() methods.
Do you see all of the x and y kwargs other than field and type being off limits to shorthand?
I wouldn't say they're off-limits... I'd just say we need to think carefully about where to draw the line on what parts of Altair exactly mirror the Vega-Lite API and what parts diverge.
Just for background: the way the shorthand expressions work is:
to_dict() method so that it detects the presence of this attribute, removes it, and interprets its contents into a form that is valid according to the schema (in this case, populating the field, type, aggregate, and timeUnit attributes).
This customized code depends on the details of the schema, and so when the schema is updated the details of these modifications have to be updated as well. For example, the Vega-Lite version 1 and Vega-Lite version 2 schemas were so different that it required essentially rewriting the code from scratch, which all told took about 8 months to really get correct. Along the way, I dropped a number of other API shortcuts we had created earlier because I saw how unmaintainable they were when it came to schema updates.
I think overall it's good to have those encoding shorthands available at the top level of the encoding... it's something that's used in basically every chart, and so the added maintenance burden is worth it. For any other API changes that require circumventing the grammar of the Vega-Lite schema, I want to make sure we're carefully weighing the benefit to users vs the costs of the new maintenance burdens they create.
So no, nothing's off-limits per se, but there's a lot to keep in mind when making these kinds of decisions.
I see your point. Thanks for explaining it all for me.
Since the shorthand is so useful, I wonder if it's worth considering if Altair should develop some kind of modular framework within itself for the system.
Do you think it would be possible to abstract back the existing hassle of adding new shorthands to something more literate, extensible and maintainable?
Maybe... my best attempt at making it modular is here, in the code generation tools, where we automatically generate wrappers for schema objects for which we want to modify the default behavior: https://github.com/altair-viz/altair/blob/master/tools/generate_schema_wrapper.py#L245-L293
There's a lot in there that is "hard-coded", so when the schema changes it takes a bit of hunting to figure out why things aren't working any more.
Partly addressed in Altair 3, where the aggregate becomes optional.
I still think it may be useful to allow a shorter syntax, like sort='column' rather than sort=alt.EncodingSortField('column')
I think it's worth knowing where Altair still diverges from Vega-Lite, so we can revise our defaults, especially for the upcoming VL4.
I still think it may be useful to allow a shorter syntax, like sort='column' rather than sort=alt.EncodingSortField('column')
Yep, I have an issue that you can upvote in VL here: https://github.com/vega/vega-lite/issues/4933.
It is now possible to do .sort(field='column'), which is quite convenient so closing this issue.
import altair as alt
from vega_datasets import data

source = data.barley()[:5]

# Sort the sites on the y-axis by their yield values.
alt.Chart(source).mark_bar().encode(
    x='yield',
    y=alt.Y('site').sort(field='yield')
)