pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Altair incompatibility #18639

Open fonnesbeck opened 2 months ago

fonnesbeck commented 2 months ago

Reproducible example

import polars as pl
import numpy as np

# Simulated DataFrame
np.random.seed(0)
data = {
    "shotRebound": np.random.randint(0, 2, size=1000)  # Simulating shot rebounds (0 or 1)
}
simulated_df = pl.DataFrame(data)

# Plotting the histogram
simulated_df.select(pl.col("shotRebound")).plot.hist(bins=30, title="Histogram of Shot Rebounds", xlabel="Shot Rebound", ylabel="Frequency")

Log output

AttributeError                            Traceback (most recent call last)
Cell In[26], line 12
      9 simulated_df = pl.DataFrame(data)
     11 # Plotting the histogram
---> 12 simulated_df.select(pl.col("shotRebound")).plot.hist(bins=30, title="Histogram of Shot Rebounds", xlabel="Shot Rebound", ylabel="Frequency")

File ~/repos/xg_model/.pixi/envs/default/lib/python3.12/site-packages/polars/dataframe/plotting.py:255, in DataFramePlot.__getattr__(self, attr)
    253 if method is None:
    254     msg = "Altair has no method 'mark_{attr}'"
--> 255     raise AttributeError(msg)
    256 return lambda **kwargs: method().encode(**kwargs).interactive()

AttributeError: Altair has no method 'mark_{attr}'

Issue description

Despite up-to-date versions of polars and altair being installed, plotting a histogram fails, apparently due to incompatibility between the two libraries.

Expected behavior

Plotted histogram

Installed versions

```
--------Version info---------
Polars:              1.6.0
Index type:          UInt32
Platform:            Linux-6.9.12-200.fc40.x86_64-x86_64-with-glibc2.39
Python:              3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0]

----Optional dependencies----
adbc_driver_manager
altair               5.4.1
cloudpickle          3.0.0
connectorx
deltalake
fastexcel
fsspec
gevent
great_tables
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl
pandas               2.2.2
pyarrow              17.0.0
pydantic
pyiceberg
sqlalchemy
torch
xlsx2csv
xlsxwriter
```

MarcoGorelli commented 2 months ago

hey @fonnesbeck

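the plotting methods in Polars are shorthand for some Altair ones, e.g. df.plot.bar(...) builds on Altair's mark_bar. Altair has no mark_hist, which is why .plot.hist raises.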
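To illustrate the shorthand, here is a minimal sketch of the dispatch pattern visible in the traceback above (`__getattr__` looks up `mark_<name>` and then calls `.encode(**kwargs)`). `FakeChart` and `PlotNamespace` are hypothetical stand-ins, not the Polars or Altair classes, and the `.interactive()` call from the real code is left out.

```python
class FakeChart:
    """Hypothetical stand-in for an Altair chart with only two mark methods."""

    def __init__(self):
        self.mark = None
        self.encoding = {}

    def mark_bar(self):
        self.mark = "bar"
        return self

    def mark_line(self):
        self.mark = "line"
        return self

    def encode(self, **kwargs):
        self.encoding = kwargs
        return self


class PlotNamespace:
    """Forwards `.plot.<name>(**kwargs)` to `mark_<name>().encode(**kwargs)`."""

    def __init__(self, chart_factory):
        self._chart_factory = chart_factory

    def __getattr__(self, attr):
        # Only reached for names that are not real attributes, e.g. `bar` or `hist`.
        chart = self._chart_factory()
        method = getattr(chart, f"mark_{attr}", None)
        if method is None:
            raise AttributeError(f"chart has no method 'mark_{attr}'")
        return lambda **kwargs: method().encode(**kwargs)


plot = PlotNamespace(FakeChart)

# `bar` resolves because FakeChart defines `mark_bar`.
bar = plot.bar(x="shotRebound", y="count()")
print(bar.mark, bar.encoding)

# `hist` does not resolve: there is no `mark_hist` to forward to.
try:
    plot.hist(bins=30)
except AttributeError as exc:
    print(exc)  # chart has no method 'mark_hist'
```

Note that the message in the traceback prints `mark_{attr}` literally, which suggests the string there is not an f-string; the sketch interpolates the name so the missing method is visible.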

It's not meant to be perfectly backwards-compatible with the hvplot API

If you want to keep your plot working as before, you can add import hvplot.polars to the top of your script, and then use the hvplot namespace instead of plot: simulated_df.select(pl.col("shotRebound")).hvplot.hist(bins=30, title="Histogram of Shot Rebounds", xlabel="Shot Rebound", ylabel="Frequency")

will post an Altair solution soon

MarcoGorelli commented 2 months ago

Altair solution (requires an explicit import altair as alt - though I'd suggest trying df['shotRebound'].hist() in case the default is good enough)

import altair as alt

simulated_df.plot.bar(
    x=alt.X("shotRebound", bin=alt.Bin(maxbins=30), title="Shot Rebound"),
    y=alt.Y("count()", title="Frequency"),
).properties(title="Histogram of Shot Rebounds", width=600)

this may be common enough that max_bins could be added to Series.hist 🤔

fonnesbeck commented 4 days ago

Yikes, that's pretty verbose for a simple histogram!

MarcoGorelli commented 3 days ago

yup - some substantial improvements are on their way, just give it another week or so