vega / vegafusion

Serverside scaling for Vega and Altair visualizations
https://vegafusion.io
BSD 3-Clause "New" or "Revised" License

adding a regression line causes vf to not cull unused data, also rendering looks pixelated #486

Open jowens opened 6 months ago

jowens commented 6 months ago

I've been using vf to cull big dataframes of unnecessary data when saving Altair plots to HTML. This works splendidly.

I am comparing two plots of the same data. One is just a scatter plot. The second is the same scatter plot with a regression line added on top of it.

Altair code looks like:

    lchart = alt.layer(chart, chart.transform_regression(df[x], df[y]).mark_line())

Anyway, I note:

  1. The original (non-regression-line) HTML is 342 KB, but with the regression line added it's 1.2 MB. It now includes a lot of data values that don't have a valid y value. Consequently the range is now larger (because earlier dates on the x axis are now in the dataset even though they don't have y values).
  2. The plotted datapoints are now rasterized, which I wasn't expecting.

Screenshots of the rendered HTML below.

Don't know if either of these are from vf's influence. But neither was expected.

"Make a much much smaller example" is a perfectly cromulent response. :)

[Two screenshots: Processing_Power_over_Time_html]

jonmmease commented 6 months ago

Thanks for the report @jowens, glad to hear VegaFusion has been working well for you overall!

VegaFusion doesn't support Vega's regression transform yet. See https://github.com/vega/vegafusion/issues/401 for some notes on that.
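Until the transform is supported, one possible workaround is to compute the line outside the chart spec and layer it as a tiny second dataset, so the spec contains no regression transform at all. The following is a minimal sketch that assumes a plain linear fit is what's wanted; `linear_fit` and `regression_endpoints` are hypothetical helper names, and whether this restores the data culling for this particular chart is untested.

```python
def _is_missing(value):
    # None, or NaN (NaN is the only value that is not equal to itself).
    return value is None or value != value


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Pairs where either value is missing are dropped before fitting.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not _is_missing(x) and not _is_missing(y)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete (x, y) pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_endpoints(xs, ys, x_name="x", y_name="y"):
    """Return two records describing the fitted line over the observed x range."""
    slope, intercept = linear_fit(xs, ys)
    complete_x = [x for x, y in zip(xs, ys)
                  if not _is_missing(x) and not _is_missing(y)]
    lo, hi = min(complete_x), max(complete_x)
    return [
        {x_name: lo, y_name: slope * lo + intercept},
        {x_name: hi, y_name: slope * hi + intercept},
    ]
```

The two records could then be wrapped in a DataFrame and drawn with `mark_line()` in a layer on top of the scatter plot. Date values on the x axis would need converting to numbers before fitting.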

I don't quite follow what you mean by there being a difference in the points being rasterized. This should be controlled by the embed_options used when saving to html. See https://altair-viz.github.io/user_guide/saving_charts.html#html-format. Could you elaborate more?
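For reference, the renderer can be set explicitly through `embed_options` when saving. The sketch below wraps Altair's `chart.save(..., embed_options={"renderer": ...})` call described on the linked page; the wrapper name `save_html` is hypothetical, and it assumes `chart` is an Altair chart object.

```python
def save_html(chart, path, renderer="svg"):
    """Save a chart to HTML with an explicit vega-embed renderer.

    "svg" produces vector output; "canvas" draws onto an HTML canvas,
    which is a raster surface.
    """
    if renderer not in ("svg", "canvas"):
        raise ValueError(f"renderer must be 'svg' or 'canvas', got {renderer!r}")
    # embed_options is passed through to vega-embed in the generated page.
    chart.save(path, embed_options={"renderer": renderer})
```

Saving both versions of the plot with the same renderer is a quick way to rule the renderer in or out as the cause of a visual difference.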

jowens commented 6 months ago

OK, no regression transform support, got it. Good luck on further development!

In terms of "rasterized": The sharpness/quality of the points when I don't have the regression line:

https://owensgroup.github.io/gpustats/plots/Processing%20Power%20over%20Time.html

just looks way better than when I do:

https://owensgroup.github.io/gpustats/plots/Processing%20Power%20over%20Time_regressionline.html

but my guess is this has nothing to do with vf.

jowens commented 6 months ago

(Also it would be pretty cool if, when vf doesn't support a particular feature, it would print some sort of warning, even if that warning is behind a flag. Thanks for considering this.)
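In the meantime, a user-side check along these lines could flag the situation before saving. This is only a sketch and not part of VegaFusion: the function names are hypothetical, and the `UNSUPPORTED_TRANSFORMS` set contains only `regression` because that is the one transform this thread confirms is unsupported. It walks a spec dict (for example, the result of `chart.to_dict()`) and recognizes both the Vega-Lite form `{"regression": ...}` and the Vega form `{"type": "regression"}`.

```python
import warnings

# Assumption: only "regression" is known from this thread to be unsupported.
UNSUPPORTED_TRANSFORMS = frozenset({"regression"})


def find_unsupported_transforms(spec, unsupported=UNSUPPORTED_TRANSFORMS):
    """Collect names of unsupported transforms found anywhere in a spec dict."""
    found = []

    def visit(node):
        if isinstance(node, dict):
            transforms = node.get("transform")
            if isinstance(transforms, list):
                for entry in transforms:
                    if not isinstance(entry, dict):
                        continue
                    if entry.get("type") in unsupported:
                        # Vega style: {"type": "regression", ...}
                        found.append(entry["type"])
                    else:
                        # Vega-Lite style: {"regression": "y", "on": "x"}
                        found.extend(sorted(unsupported & entry.keys()))
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(spec)
    return found


def warn_unsupported_transforms(spec):
    """Emit one warning per unsupported transform and return their names."""
    names = find_unsupported_transforms(spec)
    for name in names:
        warnings.warn(f"transform '{name}' may not be handled by VegaFusion")
    return names
```

Because the check recurses through every nested dict and list, it also finds transforms inside layered or concatenated charts.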