vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Can't overlay a calculated chart and mark_rule #3423

Open ale-dg opened 4 months ago

ale-dg commented 4 months ago

What happened?

Hi all,

I have been trying to overlay a histogram of standardized ratings, computed with the transform_joinaggregate and transform_calculate functions in Altair, with a rule. But when I layer the two charts, it throws this error:

ValueError: DataFusion error: Schema error: No field named "Rating_std". Valid fields are _vf_order, "Title", "US_Gross", "Worldwide_Gross", "US_DVD_Sales", "Production_Budget", "Release_Date", "MPAA_Rating", "Running_Time_min", "Distributor", "Source", "Major_Genre", "Creative_Type", "Director", "Rotten_Tomatoes_Rating", "IMDB_Rating", "IMDB_Votes".

Is there a workaround for it? Naturally I can add the column to the DataFrame and just reference it, although if I could do it in one shot within the library, it would be awesome.

Below is an example using the movies dataset.

Thanks!

Best

from vega_datasets import data

movies = data.movies()
movies.head()

rating = (
    alt.Chart(movies, title="Movies Rating Histogram - SD")
    .transform_joinaggregate(mean_val="mean(IMDB_Rating)", std_val="stdev(IMDB_Rating)")
    .transform_calculate(
        Rating_std="(datum.IMDB_Rating - datum.mean_val) / datum.std_val"
    )
    .mark_bar(color="green")
    .encode(
        alt.X("Rating_std", type="quantitative", title="Rating", bin=True).bin(
            maxbins=40
        ),
        alt.Y("count()"),
    )
    .properties(width=400)
)

rule = (
    alt.Chart(df)
    .mark_rule(color='red')
    .encode(
        x=alt.datum(0),
    )
)

alt.layer(rating, rule)
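For reference, the DataFrame-side route mentioned above could look roughly like the sketch below. It uses a small made-up stand-in for the movies data (the values are illustrative only) and assumes the goal is to match Vega's stdev aggregate, which is the sample standard deviation, the same as the pandas default (ddof=1).

```python
import pandas as pd

# Tiny stand-in for the movies dataset; the ratings are made up for illustration.
movies = pd.DataFrame({"IMDB_Rating": [6.1, 7.4, None, 5.0, 8.2, 6.9]})

# pandas skips missing values in mean() and std(); std() defaults to ddof=1
# (sample standard deviation).
mean_val = movies["IMDB_Rating"].mean()
std_val = movies["IMDB_Rating"].std()

# Standardize: how many standard deviations each rating lies from the mean.
movies["Rating_std"] = (movies["IMDB_Rating"] - mean_val) / std_val

print(movies)
```

With Rating_std present as a real column, the chart can encode it directly, without the transform_joinaggregate and transform_calculate steps. Whether that avoids the error in a layered chart was not tested in this thread.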

What would you like to happen instead?

A nice overlay of both graphs!

Which version of Altair are you using?

5.3.0

jonmmease commented 4 months ago

Hi @ale-dg, thanks for the report.

It looks like this error is coming from VegaFusion. Could you try again without VegaFusion enabled? You can run this to turn it off for testing.

import altair as alt
alt.data_transformers.enable("default")

This will help us narrow down where the issue is cropping up. Also, could you double check your example to make sure all of the imports and variables are defined? For example, I don't see a definition for df here.

Thanks!

ale-dg commented 4 months ago

Hi, @jonmmease

Thank you for the quick answer. I have made the changes, and indeed disabling VegaFusion removes the error. With the option above enabled, the layered chart now shows only the rule (see screenshot below). See the code below.

Thanks

Best

from vega_datasets import data
import pandas as pd
import numpy as np
import altair as alt
alt.data_transformers.enable("default")

movies = data.movies()
movies.head()

rating = (
    alt.Chart(movies, title="Movies Rating Histogram - SD")
    .transform_joinaggregate(mean_val="mean(IMDB_Rating)", std_val="stdev(IMDB_Rating)")
    .transform_calculate(
        Rating_std="(datum.IMDB_Rating - datum.mean_val) / datum.std_val"
    )
    .mark_bar(color="green")
    .encode(
        alt.X("Rating_std", type="quantitative", title="Rating", bin=True).bin(
            maxbins=40
        ),
        alt.Y("count()"),
    )
    .properties(width=400)
)

rule = (
    alt.Chart(movies)
    .mark_rule(color="red")
    .encode(
        x=alt.datum(0),
        size=alt.value(3)
    )
)

rating + rule

[Screenshot: ruler_binned]

jonmmease commented 4 months ago

Thanks @ale-dg, that's helpful. There seems to be a Vega-Lite bug when referencing columns created by transform_calculate from an encoding channel with binning enabled, but only when the chart is present in a layer (which is why rating displays fine on its own).

I'll file a Vega-Lite issue soon, but here is a workaround that replaces the encoding-level bin with an explicit transform_bin transform.

from vega_datasets import data
import pandas as pd
import numpy as np
import altair as alt
alt.data_transformers.enable("default")

movies = data.movies()
movies.head()

rating = (
    alt.Chart(movies, title="Movies Rating Histogram - SD")
    .transform_joinaggregate(mean_val="mean(IMDB_Rating)", std_val="stdev(IMDB_Rating)")
    .transform_calculate(
        Rating_std="(datum.IMDB_Rating - datum.mean_val) / datum.std_val"
    )
    .transform_bin(field="Rating_std", as_=["Rating_std_bin_start", "Rating_std_bin_end"], bin=alt.Bin(maxbins=40))
    .mark_bar(color="green", x2Offset=-1)
    .encode(
        alt.X("Rating_std_bin_start", type="quantitative", title="Rating"),
        alt.X2("Rating_std_bin_end", title="Rating"),
        alt.Y("count()"),
    )
    .properties(width=400)
)

rule = (
    alt.Chart(movies)
    .mark_rule(color="red")
    .encode(
        x=alt.datum(0),
        size=alt.value(3)
    )
)

rating + rule

[Image: visualization]

Notice that binning is no longer specified as part of the alt.X encoding. By doing this, we lose Vega-Altair's automatic bin-width calculation logic, which is why I needed to add the alt.X2 encoding as well.
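To illustrate why both edges are needed, here is a rough pure-Python sketch of what a bin transform produces per row: a start and an end value, which the bar then spans from x to x2. The fixed step of 0.5 and the sample values are assumptions for illustration. Vega-Lite picks its own "nice" step from maxbins, so this is not its actual algorithm.

```python
import math
from collections import Counter


def bin_edges(value, step):
    """Return the (start, end) edges of the fixed-width bin containing value."""
    start = math.floor(value / step) * step
    return start, start + step


step = 0.5  # assumed bin width; Vega-Lite derives its own from maxbins
values = [-1.3, -0.2, 0.1, 0.2, 0.5, 1.7]  # made-up standardized ratings

# Group rows by their (start, end) pair, like count() over the binned field.
counts = Counter(bin_edges(v, step) for v in values)

for (start, end), n in sorted(counts.items()):
    print(f"[{start}, {end}): {n}")
```

Each (start, end) pair plays the role of Rating_std_bin_start and Rating_std_bin_end in the workaround above.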

Hope that helps in the meantime!

ale-dg commented 4 months ago

Thanks @jonmmease. I'll use this workaround in the meantime. And thank you for the help with logging the issue.

Best

jonmmease commented 4 months ago

Reported in https://github.com/vega/vega-lite/issues/9354. Thanks again for taking the time to report this @ale-dg, that's a big help!