vega / vegafusion

Serverside scaling for Vega and Altair visualizations
https://vegafusion.io
BSD 3-Clause "New" or "Revised" License

support geo-interface #250

Closed mattijn closed 1 year ago

mattijn commented 1 year ago

First of all, really great experience so far with vegafusion!

I tried the first example from this page, https://altair-viz.github.io/user_guide/marks/geoshape.html, using vegafusion, but I received an error. I can reproduce it with the following code snippet, based on the altair tests available here: https://github.com/altair-viz/altair/blob/master/tests/vegalite/v5/tests/test_geo_interface.py

Given the following code snippet:

import altair as alt
import vegafusion

def geom_obj(geom):
    class Geom:
        pass

    geom_obj = Geom()
    setattr(geom_obj, "__geo_interface__", geom)
    return geom_obj

geom = {
    "coordinates": [[(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]],
    "type": "Polygon",
}
feat = geom_obj(geom)

chart = alt.Chart(feat).mark_geoshape()

with vegafusion.disable():
    with alt.data_transformers.enable(consolidate_datasets=False):
        chart_dict = chart.to_dict()
chart_dict
{'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}},
 'data': {'values': {'type': 'Feature',
   'geometry': {'coordinates': [[[0, 0], [0, 2], [2, 2], [2, 0], [0, 0]]],
    'type': 'Polygon'}}},
 'mark': {'type': 'geoshape'},
 '$schema': 'https://vega.github.io/schema/vega-lite/v5.6.1.json'}

But with vegafusion enabled it gives the following:

with vegafusion.enable():
    with alt.data_transformers.enable(consolidate_datasets=False):
        chart_data = vegafusion.transformed_data(chart)
chart_data
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[54], line 3
      1 with vegafusion.enable():
      2     with alt.data_transformers.enable(consolidate_datasets=False):
----> 3         chart_data = vegafusion.transformed_data(chart)
      4 chart_data

File d:\Software\Miniconda3\envs\stable\lib\site-packages\vegafusion\evaluation.py:46, in transformed_data(chart, row_limit)
     43 if dataset is None:
     44     raise ValueError("Failed to identify mark for Altair chart")
---> 46 (data,), warnings = runtime.pre_transform_datasets(
     47     vega_spec,
     48     [dataset],
     49     get_local_tz(),
     50     row_limit=row_limit,
     51     inline_datasets=inline_datasets
     52 )
     54 return data

File d:\Software\Miniconda3\envs\stable\lib\site-packages\vegafusion\runtime.py:168, in VegaFusionRuntime.pre_transform_datasets(self, spec, datasets, local_tz, default_input_tz, row_limit, inline_datasets)
    165     raise ValueError("pre_transform_datasets not yet supported over gRPC")
    166 else:
    167     # Serialize inline datasets
--> 168     inline_dataset_bytes = self._serialize_inline_datasets(inline_datasets)
...
---> 31 if getattr(data.index, "name", None) is not None:
     32     data = data.reset_index()
     34 # Use pyarrow to infer schema from DataFrame

AttributeError: 'Geom' object has no attribute 'index'

I implemented the support for the geo-interface in Altair here: https://github.com/altair-viz/altair/pull/1664, so I should be able to assist if you run into issues or have questions regarding this geo-interface.
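
For readers following along, the last frame of the traceback can be reproduced without pandas or VegaFusion. The sketch below is only an illustration: `Geom` is a stand-in like the one in the snippet above, and `index_name_unguarded` / `index_name_guarded` are hypothetical helper names, not VegaFusion functions. It shows that the `getattr(..., None)` default only protects the `.name` lookup, while `data.index` is still evaluated first and fails on any object that is not DataFrame-like.

```python
class Geom:
    """Stand-in for an object that only exposes __geo_interface__."""
    __geo_interface__ = {
        "type": "Polygon",
        "coordinates": [[(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]],
    }


def index_name_unguarded(data):
    # Same shape as the expression in the traceback: data.index is
    # evaluated before getattr runs, so the default cannot help.
    return getattr(data.index, "name", None)


def index_name_guarded(data):
    # Hypothetical guard: look up .index defensively as well.
    index = getattr(data, "index", None)
    return getattr(index, "name", None)


geom = Geom()

try:
    index_name_unguarded(geom)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(index_name_guarded(geom))
```

A guard like this would only avoid the crash. It says nothing about how the geo data should then be serialized.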

mattijn commented 1 year ago

For reference, there is also https://github.com/geoarrow/geoarrow, which could potentially make geometry support easier using an Arrow-only approach.

jonmmease commented 1 year ago

Thanks for raising this issue @mattijn, I wasn't familiar with Altair's support for __geo_interface__.

It shouldn't take a lot of effort to avoid this crash by skipping the extraction of geo datasets to the server (allowing Altair to convert them to JSON as it normally does).

Longer term, do you see a benefit in having VegaFusion process geo datasets directly? Are there Vega-Lite transforms that operate on geo datasets that would be good candidates for server-side acceleration?

mattijn commented 1 year ago

Not crashing is fine for now. I gave a talk on geo datasets with Altair last week; the presentation is here: https://mattijn.github.io/talks/geopython2023.slides.html. I use geo datasets only for context and for selections; the accelerated aggregations happen on the other, compounded charts. This combination will be supported with the merge of #251?

jonmmease commented 1 year ago

This combination will be supported with the merge of https://github.com/hex-inc/vegafusion/pull/251?

Yes, it should be. I hope to get an RC out tomorrow. I'll ping you when that's available. Thanks for sharing your slides, it's really neat to see what you're doing with Altair!

jonmmease commented 1 year ago

@mattijn I just published 1.1.0rc1 to PyPI. If you have time, please give it a try and see if that addresses the issue for you. I'm tentatively planning to publish the final 1.1.0 early next week. Thanks!

mattijn commented 1 year ago

Hi Jon, I did the following test, which did not yet succeed:

import altair as alt
import geopandas as gpd
import vegafusion as vf
vf.enable()

gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")).query('iso_a3 == "LUX"')
c_enable = alt.Chart(gdf).mark_geoshape()
c_enable.to_dict()
{'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}},
 'data': {'url': 'vegafusion+dataset://table_cf9ac0d5_c800_4f66_b6fe_38202f75d920'},
 'mark': {'type': 'geoshape'},
 '$schema': 'https://vega.github.io/schema/vega-lite/v5.6.1.json'}

The data is still parsed as a VegaFusion dataset. Maybe because a GeoPandas dataframe is still of type pandas DataFrame?

In the Altair code base we therefore first check whether the data has a __geo_interface__ attribute before doing any type checks. See e.g. https://github.com/altair-viz/altair/blob/master/altair/utils/data.py#L197-L204:

if hasattr(data, "__geo_interface__"):
    if isinstance(data, pd.DataFrame):
        data = sanitize_dataframe(data)
    data = sanitize_geo_interface(data.__geo_interface__)
    return json.dumps(data)
elif isinstance(data, pd.DataFrame):
    data = sanitize_dataframe(data)
    return data.to_json(orient="records", double_precision=15)

jonmmease commented 1 year ago

Yeah, I think you're exactly right about the issue being that the geopandas dataframe is also a Pandas dataframe. Thanks!
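
A minimal sketch of why the order of the checks matters, using stand-in classes rather than the real pandas and GeoPandas types. `DataFrame`, `GeoDataFrame`, `route_type_first`, and `route_geo_first` are all hypothetical names for illustration. The only assumption carried over from the discussion is that the geo dataframe subclasses the plain dataframe and additionally exposes `__geo_interface__`.

```python
class DataFrame:
    """Stand-in for a plain tabular dataframe."""

    def __init__(self, records):
        self.records = records


class GeoDataFrame(DataFrame):
    """Stand-in for a geo dataframe: a DataFrame subclass with __geo_interface__."""

    @property
    def __geo_interface__(self):
        return {"type": "FeatureCollection", "features": self.records}


def route_type_first(data):
    # Type check first: a GeoDataFrame is also a DataFrame,
    # so it is captured here and the geo branch is never reached.
    if isinstance(data, DataFrame):
        return "table"
    elif hasattr(data, "__geo_interface__"):
        return "geo"
    return "other"


def route_geo_first(data):
    # Geo-interface check first, mirroring the order of the Altair excerpt above.
    if hasattr(data, "__geo_interface__"):
        return "geo"
    elif isinstance(data, DataFrame):
        return "table"
    return "other"


gdf = GeoDataFrame([{"type": "Feature", "geometry": None, "properties": {}}])
df = DataFrame([{"a": 1}])

print(route_type_first(gdf), route_geo_first(gdf), route_geo_first(df))
```

With the type check first, the geo dataframe is routed as an ordinary table, which matches the `vegafusion+dataset://` URL seen in the spec above. With the geo-interface check first, it is kept out of the table path.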

nicolaskruchten commented 1 year ago

That's what I do in PX as well. Are there non-GeoPandas implementations that use __geo_interface__?

mattijn commented 1 year ago

Yes, see an incomplete list here: https://github.com/mlaloux/Python-geo_interface-applications

jonmmease commented 1 year ago

Ok, should be fixed in 1.1.0rc2! Let me know if you see any other issues with GeoPandas. Thanks again for the feedback!

mattijn commented 1 year ago

All fine now👍