vega / vegafusion

Serverside scaling for Vega and Altair visualizations
https://vegafusion.io
BSD 3-Clause "New" or "Revised" License

VegaFusion treats pandas categoricals differently than Altair's default transformer #402

Open joelostblom opened 9 months ago

joelostblom commented 9 months ago

The chart from the spec below renders differently with VegaFusion enabled. VegaFusion handles it better than the default transformer and creates a more reasonable x-scale (the same result can be obtained in Altair by casting to a string instead of a category, as in https://github.com/altair-viz/altair/discussions/3140#discussioncomment-6714090). It would be great if Altair could render the chart the same way without VegaFusion, but depending on what VegaFusion actually does here, that might not be possible.

import altair as alt
from vega_datasets import data

source = data.wheat()
source.year = source.year.astype('category')

chart = alt.Chart(source, height=200).mark_point().encode(
    x='year:T',
    y='wheat',
)


joelostblom commented 9 months ago

Oh, I noticed that VegaFusion displays the desirable behavior with integers as well! That's great and resolves the issue I had when opening https://github.com/altair-viz/altair/discussions/3140 in the first place. Now I really want us to bring that behavior into Altair's base transformer too, if possible...

jonmmease commented 9 months ago

VegaFusion doesn't actually support categoricals internally, and "expands" them during the conversion to Arrow, so it makes sense that you see the same behavior as with integers in this case.

https://github.com/hex-inc/vegafusion/blob/d94fee469524879d2d29cb9320a31ecacf7a25dc/python/vegafusion/vegafusion/transformer.py#L52-L55
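To illustrate what "expanding" a categorical means, here is a minimal pandas-only sketch. It is not the code at the link above: the helper name `expand_categoricals` is hypothetical, and the assumption is simply that each categorical column is replaced by a plain column holding the category values.

```python
import pandas as pd


def expand_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where each categorical column is cast to the dtype of its categories.

    Hypothetical helper for illustration; not VegaFusion's implementation.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            # An integer-valued categorical becomes a plain integer column.
            out[name] = col.astype(col.cat.categories.dtype)
    return out


# Small stand-in for the wheat data, with `year` stored as a category.
df = pd.DataFrame({"year": [1565, 1570, 1575], "wheat": [41.0, 45.0, 42.0]})
df["year"] = df["year"].astype("category")

expanded = expand_categoricals(df)
print(df["year"].dtype, "->", expanded["year"].dtype)
```

After this step, the `year` column is indistinguishable from one that was integer-typed from the start, which is consistent with the categorical and integer cases rendering the same way.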

For integers parsed as temporal columns, VegaFusion currently interprets them as years:

https://github.com/hex-inc/vegafusion/blob/d94fee469524879d2d29cb9320a31ecacf7a25dc/vegafusion-runtime/src/data/tasks.rs#L337-L354

I honestly don't recall why I put that logic in there. I thought it was to match an Altair or Vega-Lite example, but maybe not. I think this could be a change to propose in Vega's date parsing. I'm really not sure what it's currently doing. I would have guessed that the alternative to treating integers as years would be to treat them as UTC milliseconds, but that doesn't appear to be happening either, given the .800 x-axis tick label.
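For reference, the two readings of an integer discussed above (a calendar year versus UTC milliseconds since the epoch) can be compared with the standard library alone. This is a sketch with hypothetical function names; it makes no claim about what Vega's parser or the linked Rust code actually does.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int_as_year(value: int) -> datetime:
    """Read the integer as a calendar year: midnight UTC on January 1."""
    return datetime(value, 1, 1, tzinfo=timezone.utc)


def int_as_utc_millis(value: int) -> datetime:
    """Read the integer as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=value)


print(int_as_year(1800).isoformat())        # 1800-01-01T00:00:00+00:00
print(int_as_utc_millis(1800).isoformat())  # 1970-01-01T00:00:01.800000+00:00
```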

As an alternative, Altair could re-interpret integer columns as years by adding a custom calculate expression. Though that might be more variation from Vega/Vega-Lite than we want. Vega-Lite doesn't have access to column type info, so the special treatment wouldn't be able to happen there.

joelostblom commented 9 months ago

I see. I do find that the VegaFusion behavior adds a lot of convenience here, so I would be in favor of re-implementing that solution in Altair as well. I would prefer it to be added in Vega so that we don't have to depart from VL, although I think this is a smaller departure. I found a Vega issue where this was initially discussed and suggested the change there again: https://github.com/vega/vega/issues/1681. There is also an issue in Altair from the same time: https://github.com/altair-viz/altair/issues/1365