Closed mattijn closed 1 year ago
For reference, there is also https://github.com/geoarrow/geoarrow, which potentially would make geometry support easier using a solely arrow approach.
Thanks for raising this issue @mattijn, I wasn't familiar with Altair's support for __geo_interface__.
It shouldn't take a lot of effort to avoid this crash by skipping the extraction of geo datasets to the server (allowing Altair to convert them to JSON as it normally does).
Longer term, do you see a benefit in having VegaFusion process geo datasets directly? Are there Vega-Lite transforms that operate on geo datasets that would be good candidates for server-side acceleration?
Not crashing is fine for now. I gave a talk on geo datasets with Altair last week; the presentation is here: https://mattijn.github.io/talks/geopython2023.slides.html. I use geo datasets only for context and for selections; the aggregation acceleration happens on the other, compound charts. Will this combination be supported with the merge of #251?
This combination will be supported with the merge of https://github.com/hex-inc/vegafusion/pull/251?
Yes, it should be. I hope to get an RC out tomorrow. I'll ping you when that's available. Thanks for sharing your slides, it's really neat to see what you're doing with Altair!
@mattijn I just published 1.1.0rc1 to PyPI. If you have time, please give it a try and see if that addresses the issue for you. I'm tentatively planning to publish the final 1.1.0 early next week. Thanks!
Hi Jon, I did the following test, which did not yet succeed:
import altair as alt
import geopandas as gpd
import vegafusion as vf
vf.enable()
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")).query('iso_a3 == "LUX"')
c_enable = alt.Chart(gdf).mark_geoshape()
c_enable.to_dict()
{'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}},
 'data': {'url': 'vegafusion+dataset://table_cf9ac0d5_c800_4f66_b6fe_38202f75d920'},
 'mark': {'type': 'geoshape'},
 '$schema': 'https://vega.github.io/schema/vega-lite/v5.6.1.json'}
The data is still parsed as a VegaFusion dataset. Maybe because a GeoPandas dataframe is also a pandas dataframe?
In the Altair code base we therefore first check whether the data has a __geo_interface__ attribute before doing any checks on types.
See e.g. here https://github.com/altair-viz/altair/blob/master/altair/utils/data.py#L197-L204:
if hasattr(data, "__geo_interface__"):
    if isinstance(data, pd.DataFrame):
        data = sanitize_dataframe(data)
    data = sanitize_geo_interface(data.__geo_interface__)
    return json.dumps(data)
elif isinstance(data, pd.DataFrame):
    data = sanitize_dataframe(data)
    return data.to_json(orient="records", double_precision=15)
Yeah, I think you're exactly right about the issue being that the geopandas dataframe is also a Pandas dataframe. Thanks!
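For readers following along, the ordering problem can be sketched without pandas or GeoPandas installed. The classes `Table` and `GeoTable` and the function `should_extract` below are hypothetical stand-ins made up for illustration (they are not VegaFusion or Altair APIs); the only point is that a subclass passes the parent's `isinstance` check, so the `__geo_interface__` check has to come first.

```python
class Table:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, rows):
        self.rows = rows


class GeoTable(Table):
    """Hypothetical stand-in for geopandas.GeoDataFrame (a Table subclass)."""

    @property
    def __geo_interface__(self):
        # GeoJSON-like mapping; geometry is left as None to keep the sketch short.
        return {
            "type": "FeatureCollection",
            "features": [
                {"type": "Feature", "properties": row, "geometry": None}
                for row in self.rows
            ],
        }


def should_extract(data):
    """Return True if the data may be extracted as a plain tabular dataset."""
    # Check the geo protocol first: a GeoTable is also a Table, so testing
    # isinstance(data, Table) first would wrongly treat it as plain tabular data.
    if hasattr(data, "__geo_interface__"):
        return False
    return isinstance(data, Table)


plain = Table([{"name": "a"}])
geo = GeoTable([{"name": "LUX"}])
print(isinstance(geo, Table))   # the subclass passes the parent type check
print(should_extract(plain), should_extract(geo))
```

With the checks in this order, the geo dataset is left for the normal JSON conversion path while plain tables are still extracted.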
That's what I do in PX as well. Are there non-GeoPandas implementations that use __geo_interface__?
Yes, see a non-complete list here https://github.com/mlaloux/Python-geo_interface-applications
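As a rough illustration of why checking for the attribute is more general than checking for GeoPandas types: any object can take part in the protocol by exposing a `__geo_interface__` attribute that returns a GeoJSON-like mapping. The `Point` class and `to_geojson_string` helper below are hypothetical names invented for this sketch, not part of any of the libraries mentioned in this thread.

```python
import json


class Point:
    """Hypothetical minimal geometry type that implements __geo_interface__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def __geo_interface__(self):
        # GeoJSON-like mapping describing this object.
        return {"type": "Point", "coordinates": (self.x, self.y)}


def to_geojson_string(obj):
    """Serialize any object exposing __geo_interface__, regardless of its type."""
    if not hasattr(obj, "__geo_interface__"):
        raise TypeError("object does not expose __geo_interface__")
    return json.dumps(obj.__geo_interface__)


print(to_geojson_string(Point(6.1, 49.6)))
```

A consumer written this way needs no knowledge of the concrete class, which is why duck-typing on the attribute covers GeoPandas and other implementations alike.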
Ok, should be fixed in 1.1.0rc2! Let me know if you see any other issues with GeoPandas. Thanks again for the feedback!
All fine now👍
First of all, really great experience so far with vegafusion!
I tried the first example of this page https://altair-viz.github.io/user_guide/marks/geoshape.html using vegafusion, but I received an error. I can reproduce it with the following code snippet based on altair-tests available here: https://github.com/altair-viz/altair/blob/master/tests/vegalite/v5/tests/test_geo_interface.py
Given the following code snippet:
But with vegafusion enabled it gives the following:
I implemented support for the geo-interface in Altair here: https://github.com/altair-viz/altair/pull/1664, so I should be able to assist if you run into issues or have questions regarding this geo-interface.