Closed afonit closed 6 years ago
Thank you for your contribution. The max rows case is not covered by tests for now; I'll check what I can do. But I'd rather understand why 'pipe' does not work as expected and fix it than exclude functionality.
Sure thing, thanks for taking a look at the request.
Here is a small reproducible example based on the readme.
I am just adding some more points to push it over the max rows limit.
In this example you will still get the max rows error:
import altair as alt
import geopandas as gpd
import gpdvega
import pandas as pd
from shapely.geometry import Point
from gpdvega import gpd_to_values
alt.data_transformers.register(
'gpd_to_values',
lambda data: alt.pipe(data, alt.limit_rows, gpd_to_values)
)
alt.data_transformers.enable('gpd_to_values')
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# GeoDataFrame could be passed as usual pd.DataFrame
chart_one = alt.Chart(world[world.continent!='Antarctica']).mark_geoshape(
).project(
).encode(
color='pop_est', # shorthand infer types as for regular pd.DataFrame
tooltip='id:Q' # GeoDataFrame.index is accessible as id
).properties(
width=500,
height=300
)
# generate some points to push us over the max rows
some = [[-70.05179, 25.10815] for x in range(6000)]
some = pd.DataFrame(some, columns=['x', 'y'])
some['Coordinates'] = list(zip(some.x, some.y))
some['Coordinates'] = some['Coordinates'].apply(Point)
gdfo = gpd.GeoDataFrame(some, geometry='Coordinates')
chart_two = alt.Chart(gdfo).mark_point(color='red').encode(#.mark_point(size=550, color='orange').encode(
longitude='x:Q',
latitude='y:Q'
)
chart_one + chart_two
But then if we change this line:
lambda data: alt.pipe(data, alt.limit_rows, gpd_to_values)
to:
lambda data: alt.pipe(data, gpd_to_values)
we then get the plot from the code below:
import altair as alt
import geopandas as gpd
import gpdvega
import pandas as pd
from shapely.geometry import Point
from gpdvega import gpd_to_values
alt.data_transformers.register(
'gpd_to_values',
lambda data: alt.pipe(data, gpd_to_values)
)
alt.data_transformers.enable('gpd_to_values')
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# GeoDataFrame could be passed as usual pd.DataFrame
chart_one = alt.Chart(world[world.continent!='Antarctica']).mark_geoshape(
).project(
).encode(
color='pop_est', # shorthand infer types as for regular pd.DataFrame
tooltip='id:Q' # GeoDataFrame.index is accessible as id
).properties(
width=500,
height=300
)
# generate some points to push us over the max rows
some = [[-70.05179, 25.10815] for x in range(6000)]
some = pd.DataFrame(some, columns=['x', 'y'])
some['Coordinates'] = list(zip(some.x, some.y))
some['Coordinates'] = some['Coordinates'].apply(Point)
gdfo = gpd.GeoDataFrame(some, geometry='Coordinates')
chart_two = alt.Chart(gdfo).mark_point(color='red').encode(#.mark_point(size=550, color='orange').encode(
longitude='x:Q',
latitude='y:Q'
)
chart_one + chart_two
@iliatimofeev, OK, after reading through this and the Altair codebase, I think I now understand.
limit_rows takes a max_rows argument, and when it is not supplied the default limit of 5000 rows applies.
So this works:
lambda data: alt.pipe(data, alt.limit_rows(max_rows=100000), gpd_to_values)
or, in my case, since I did not want to limit any rows, this also works:
lambda data: alt.pipe(data, gpd_to_values)
but this line as it currently is in the geodata.py file will still cause the max_rows error:
lambda data: alt.pipe(data, alt.limit_rows, gpd_to_values)
So is it safe to say that gpdvega's geodata.py file should either populate the max_rows parameter or leave alt.limit_rows out?
I would love to modify my pull request depending on what you would like to have happen.
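To make the difference between the three variants above concrete, here is a minimal sketch that uses only the standard library. The `limit_rows`, `pipe`, `to_values`, and `MaxRowsError` names below are simplified stand-ins written for illustration, not Altair's or gpdvega's actual code, and `functools.partial` stands in for the curried call `alt.limit_rows(max_rows=100000)`.

```python
from functools import partial, reduce


class MaxRowsError(Exception):
    """Stand-in for Altair's error of the same name."""


def limit_rows(data, max_rows=5000):
    """Stand-in row check: raise when data has more than max_rows rows.

    Passing max_rows=None disables the check.
    """
    if max_rows is not None and len(data) > max_rows:
        raise MaxRowsError(f"{len(data)} rows exceeds max_rows={max_rows}")
    return data


def pipe(data, *funcs):
    """Pass data through each function in turn (same idea as toolz.pipe)."""
    return reduce(lambda acc, func: func(acc), funcs, data)


def to_values(data):
    """Hypothetical placeholder for gpd_to_values."""
    return {"values": list(data)}


rows = range(6000)  # same size as the extra points in the example

# Bare limit_rows: pipe calls limit_rows(rows), so the default of 5000 applies.
try:
    pipe(rows, limit_rows, to_values)
    bare_result = "ok"
except MaxRowsError:
    bare_result = "MaxRowsError"

# Binding max_rows first raises the limit for this pipeline.
raised_limit = pipe(rows, partial(limit_rows, max_rows=100000), to_values)

# Leaving the check out of the pipeline skips the limit entirely.
no_check = pipe(rows, to_values)
```

In this sketch only the first pipeline fails, which matches the behaviour described above: the check itself is fine, but nothing ever overrides its default.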
Alright, so this looks like it was a misunderstanding on my part - based on some earlier errors I was getting that I had posted in another issue. I will think through this a bit more and see if there is a clarification I can make in the documentation, or if this is just a perception issue I had.
@afonit there is a bug in gpdvega.
It is expected to work as Altair does:
alt.data_transformers.enable('gpd_to_values', max_rows=None)
The transformer should be registered slightly differently to work as expected. It's my mistake; thank you for finding it.
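import altair as alt
from toolz.curried import curry, pipe

# gpd_to_values is assumed to be in scope here (it is gpdvega's converter).

@curry
def gpd_to_values_data_transformer(data, max_rows=5000):
    # Check the row count first, then convert the data to Vega values.
    return pipe(data, alt.limit_rows(max_rows=max_rows), gpd_to_values)

alt.data_transformers.register(
    'gpd_to_values',
    gpd_to_values_data_transformer
)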
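As a rough illustration of why the transformer needs its own `max_rows` parameter, here is a standard-library-only sketch of options being forwarded from `enable(...)` to the registered function. The `Registry` class, `transformer`, `limit_rows`, and `MaxRowsError` below are hypothetical simplifications, not Altair's real plugin registry; the assumption is only that keyword options given to `enable` end up bound to the registered function.

```python
from functools import partial


class MaxRowsError(Exception):
    """Stand-in for Altair's error of the same name."""


def limit_rows(data, max_rows=5000):
    """Stand-in row check; max_rows=None disables it."""
    if max_rows is not None and len(data) > max_rows:
        raise MaxRowsError(f"{len(data)} rows exceeds max_rows={max_rows}")
    return data


def transformer(data, max_rows=5000):
    """Same shape as gpd_to_values_data_transformer: check rows, then convert."""
    return {"values": list(limit_rows(data, max_rows=max_rows))}


class Registry:
    """Hypothetical, simplified plugin registry."""

    def __init__(self):
        self._plugins = {}
        self._active = None

    def register(self, name, func):
        self._plugins[name] = func

    def enable(self, name, **options):
        func = self._plugins[name]
        # Bind any options so they reach the transformer on every call.
        self._active = partial(func, **options) if options else func

    def get(self):
        return self._active


registry = Registry()
registry.register("gpd_to_values", transformer)
rows = range(6000)

# No options: the transformer's own default of 5000 rows applies.
registry.enable("gpd_to_values")
try:
    registry.get()(rows)
    default_outcome = "ok"
except MaxRowsError:
    default_outcome = "MaxRowsError"

# max_rows=None is forwarded to the transformer and switches the check off.
registry.enable("gpd_to_values", max_rows=None)
unlimited = registry.get()(rows)
```

A plain lambda that takes only `data` has nowhere to receive `max_rows`, which is presumably why the registration has to change for `enable('gpd_to_values', max_rows=None)` to behave as it does in Altair.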
The max rows error still occurs with this current configuration. I had to take out alt.limit_rows. After doing that, I can render large geographic plots.