iliatimofeev / gpdvega

gpdvega is a bridge between GeoPandas and Altair that allows to seamlessly chart geospatial data
https://iliatimofeev.github.io/gpdvega/
BSD 3-Clause "New" or "Revised" License

`max_rows=None` doesn't work #3

Closed afonit closed 6 years ago

afonit commented 6 years ago

Hello, when I try my own example, I get this error:

TypeError: Object of type 'Polygon' is not JSON serializable

I first tested the example on your site:

import altair as alt
import geopandas as gpd
import gpdvega

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# GeoDataFrame could be passed as usual pd.DataFrame
alt.Chart(world[world.continent!='Antarctica']).mark_geoshape(
).project(
).encode(
    color='pop_est', # shorthand infer types as for regular pd.DataFrame
    tooltip='id:Q' # GeoDataFrame.index is accessible as id
).properties(
    width=500,
    height=300
)

That worked great, and the plot showed up in jupyterlab.

I then tried my own example and got the above error.

alt.data_transformers.enable('default', max_rows=None)
ga = gpd.read_file('GA.json')
alt.Chart(ga).mark_geoshape(
).project(
).encode(
    color='STATE:O', # shorthand infer types as for regular pd.DataFrame
    tooltip='TRACT:Q' # GeoDataFrame.index is accessible as id
).properties(
    width=500,
    height=300
)

Since this did not work, and the error said type 'Polygon', I checked the dtypes on your example and on mine.

On yours:

world.dtypes

shows:

field dtype
pop_est float64
continent object
name object
iso_a3 object
gdp_md_est float64
geometry object

dtype: object

and on mine:

ga.dtypes
field dtype
GEO_ID object
STATE object
COUNTY object
TRACT object
BLKGRP object
NAME object
LSAD object
CENSUSAREA float64
geometry object

dtype: object

So my geometry is of type object just like in your example - any ideas on why/how the error is being produced and how to resolve it?

Note: If I do:

%matplotlib inline 
ga.plot()

which uses matplotlib, the plot shows up fine.

The file GA.json is just a state shapefile downloaded from here (2010 Georgia): https://www.census.gov/geo/maps-data/data/cbf/cbf_blkgrp.html and converted to GeoJSON with http://mapshaper.org/

iliatimofeev commented 6 years ago

I guess the problem is in alt.data_transformers.enable('default', max_rows=None).

See https://iliatimofeev.github.io/gpdvega/user_guide/API.html#data-transformers Note: gpdvega uses the 'gpd_to_values' transformer, or you can construct your own using https://iliatimofeev.github.io/gpdvega/user_guide/API.html#gpdvega.gpd_to_values

afonit commented 6 years ago

Ah, interesting. Based on your theory, I reloaded the notebook and tried plotting only a subset: ga[ga['COUNTY'] == '051']. That worked without errors and the plot showed up.

Then I plotted the same subset with the max_rows line added:

alt.data_transformers.enable('default', max_rows=None)

And I then get the same error again:

TypeError: Object of type 'Polygon' is not JSON serializable

That seems to demonstrate that the max_rows line is causing the problem.

At the moment I can't imagine why, so I will post this question in the Altair tracker and link to this issue.

afonit commented 6 years ago

@iliatimofeev, I posted this in the Altair tracker: https://github.com/altair-viz/altair/issues/1119

Here is Jake's response:

Altair does not support geopandas data as inputs at this time, and it doesn't look like you've used gpdvega beyond simply importing it.

I'd suggest raising this issue in the gpdvega package.

I did not realize it before, but the example in the README at https://github.com/iliatimofeev/gpdvega imports gpdvega and then never calls it, as Jake pointed out. Is there a reason for that?

I'm just seeking to understand so I can troubleshoot this issue.

I have observed that without:

import gpdvega

I get the error:

TypeError: Object of type 'Polygon' is not JSON serializable

but when I only import gpdvega, it plots fine.

But when I use both

import gpdvega
alt.data_transformers.enable('default', max_rows=None)

I get this error again:

TypeError: Object of type 'Polygon' is not JSON serializable

iliatimofeev commented 6 years ago

Use alt.data_transformers.enable('gpd_to_values', max_rows=None) instead of 'default'.

Altair uses data transformers to preprocess data before storing it. The 'default' transformer is defined as the pipeline pipe(data, limit_rows, to_values), and max_rows is the second parameter of the limit_rows transformation. But Altair's implementation of to_values can't process the geometry column of a gpd.GeoDataFrame. That is why gpdvega implements its own transformation, gpdvega.gpd_to_values, which generates GeoJSON from a gpd.GeoDataFrame.

Moreover, gpdvega registers the 'gpd_to_values' transformer as alt.pipe(data, alt.limit_rows, gpd_to_values), which works like 'default' but supports GeoDataFrame, and enables it on module load.

When you call alt.data_transformers.enable('default', max_rows=None), you switch back to alt.to_values, which doesn't support GeoDataFrame; that is why it crashes.

I hope this helps clarify the issue. Any suggestions for improving the documentation on this point are very welcome.
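
The explanation above can be sketched in plain Python. This is only an illustration under stated assumptions, not Altair's or gpdvega's actual code: `Polygon`, `limit_rows`, `to_values`, `gpd_to_values`, and the two pipeline functions below are simplified, hypothetical stand-ins, and the feature layout is illustrative rather than the exact GeoJSON that gpdvega emits. It shows that the row limit and the final serialization step are separate stages, and that only the last stage decides whether geometry objects can be handled.

```python
import json


class Polygon:
    """Hypothetical stand-in for a shapely geometry object."""

    def __init__(self, coordinates):
        self.coordinates = coordinates


def limit_rows(rows, max_rows=5000):
    """Refuse data sets larger than max_rows; None disables the check."""
    if max_rows is not None and len(rows) > max_rows:
        raise ValueError("data has more than %d rows" % max_rows)
    return rows


def to_values(rows):
    """Stand-in for the 'default' last step: plain JSON serialization."""
    json.dumps(rows)  # raises TypeError when a row holds a Polygon
    return {"values": rows}


def gpd_to_values(rows):
    """Stand-in for a geometry-aware last step: emit GeoJSON-like features."""
    features = []
    for row in rows:
        properties = {k: v for k, v in row.items() if k != "geometry"}
        geometry = {"type": "Polygon", "coordinates": row["geometry"].coordinates}
        features.append(
            {"type": "Feature", "properties": properties, "geometry": geometry}
        )
    json.dumps(features)  # succeeds: only dicts, lists, and scalars remain
    return {"values": features}


def default_pipeline(rows, max_rows=5000):
    """Mimics pipe(data, limit_rows, to_values)."""
    return to_values(limit_rows(rows, max_rows=max_rows))


def gpd_pipeline(rows, max_rows=5000):
    """Mimics pipe(data, limit_rows, gpd_to_values)."""
    return gpd_to_values(limit_rows(rows, max_rows=max_rows))


rows = [{"STATE": "13", "geometry": Polygon([[[0, 0], [1, 0], [1, 1], [0, 0]]])}]

try:
    default_pipeline(rows, max_rows=None)
except TypeError as error:
    print("default pipeline failed:", error)

print(gpd_pipeline(rows, max_rows=None)["values"][0]["type"])
```

In this sketch, switching back to the default pipeline fails regardless of the `max_rows` value, because the failure comes from the serialization step and not from the row limit.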

afonit commented 6 years ago

@iliatimofeev thanks for the explanation and educating me on the topic.

Right now when using:

alt.data_transformers.register(
    'gpd_to_values',
    lambda data: alt.pipe(data, alt.limit_rows, gpd_to_values)
)
alt.data_transformers.enable('gpd_to_values',  max_rows=None)

I get:

TypeError: <lambda>() got an unexpected keyword argument 'max_rows'

So I will keep playing around with it.
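
A likely reason for this error, sketched under assumptions: the options passed to enable() appear to be forwarded as keyword arguments to the registered function, and a `lambda data:` accepts none. The `TransformerRegistry` class below is a hypothetical, simplified stand-in for alt.data_transformers, and `limit_rows` and `gpd_to_values` are placeholders, so treat this as an illustration of the mechanism and not as Altair's implementation.

```python
from functools import partial


class TransformerRegistry:
    """Simplified, hypothetical stand-in for alt.data_transformers."""

    def __init__(self):
        self._plugins = {}
        self._active = None
        self._options = {}

    def register(self, name, func):
        self._plugins[name] = func

    def enable(self, name, **options):
        # Options are stored and later bound to the function as keywords.
        self._active = self._plugins[name]
        self._options = options

    def get(self):
        return partial(self._active, **self._options)


def limit_rows(data, max_rows=5000):
    """Placeholder row limit; None disables the check."""
    if max_rows is not None and len(data) > max_rows:
        raise ValueError("data has more than %d rows" % max_rows)
    return data


def gpd_to_values(data):
    """Placeholder for the geometry-aware serialization step."""
    return {"values": list(data)}


def gpd_pipeline(data, max_rows=5000):
    """Accepts max_rows and passes it on to the limit_rows step."""
    return gpd_to_values(limit_rows(data, max_rows=max_rows))


registry = TransformerRegistry()
# This function takes only `data`, so any option passed to enable() breaks it.
registry.register("lambda_only", lambda data: gpd_to_values(limit_rows(data)))
# This function declares max_rows, so enable(..., max_rows=None) works.
registry.register("gpd_to_values", gpd_pipeline)

data = list(range(6000))

registry.enable("lambda_only", max_rows=None)
try:
    registry.get()(data)
except TypeError as error:
    print("lambda failed:", error)

registry.enable("gpd_to_values", max_rows=None)
print(len(registry.get()(data)["values"]))
```

Under this reading, a registered transformer has to declare every option it is enabled with, which is why a function with an explicit `max_rows` parameter works where the one-argument lambda does not.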

afonit commented 6 years ago

alt.data_transformers.register(
    'gpd_to_values',
    lambda data: alt.pipe(data, gpd_to_values)
)
alt.data_transformers.enable('gpd_to_values')

OK, I had to take the alt.limit_rows step out of the pipe; now it is working great.

Thanks again for all the help.

iliatimofeev commented 6 years ago

@afonit, please install the package from master:

pip install git+https://github.com/iliatimofeev/gpdvega.git

Does it fix the original issue when you use only the following line (without registering a new pipe)?

alt.data_transformers.enable('gpd_to_values', max_rows=None)

afonit commented 6 years ago

@iliatimofeev, I did as you requested in your last post. That all works.

iliatimofeev commented 6 years ago

Cool, so I'll close it. Thank you for your help. PS: It would be great if you could provide some more complex examples.

afonit commented 6 years ago

@iliatimofeev, when you say complex examples, are you thinking of things that show off the capability of the software, or more examples that test aspects of the functionality?

If it is for showing the functionality, it might be interesting to build an example Jupyter notebook that people could download and run, if there is a place for that kind of contribution in the project.

iliatimofeev commented 6 years ago

I see a lack of good examples of data visualization on a map, given that Vega-Lite is limited in geospatial functionality for now. On the other hand, tutorials on gathering and processing data with GeoPandas and displaying it with Altair would be very useful too.

In fact, I don't yet know how to integrate a Jupyter Notebook into the Sphinx documentation, but I'm sure we can find a way if we have something to integrate :). At the least, it can be reproduced in an rst file with a downloadable notebook as a hyperlink.