Closed mazzma12 closed 3 years ago
Sounds like an interesting idea! Can you tell me more about the API you would envision and the resulting output?
Glad you like it!
Basically, I think px could set rational defaults for the different geom_types of the GeoDataFrame by using the API provided by geopandas.
Here is an example in pseudo-code, just to showcase the API in case you are not familiar with it:
if isinstance(df, geopandas.GeoDataFrame):
    gdf = df  # I know I am geo
    geom_types = gdf.geom_type.unique()
    if (geom_types == "Polygon").all():
        pass  # Treat as polygons
    elif (geom_types == "Point").all():
        # Set rational defaults
        lon = gdf.geometry.x
        lat = gdf.geometry.y  # Mind the x, y order
        bounds = gdf.total_bounds  # Might be useful for zoom
    else:
        raise NotImplementedError("Only Point and Polygon supported atm")
If I remember correctly, plotly uses the GeoJSON format for the API. In that case, calling gdf.__geo_interface__ might be more advisable than accessing the geometry property (at least for polygons).
More about the "geometry" column here
OK cool, so what would you envision as a px API here? px.scatter_geo(line="geometry_column") or something like that? I'm not sure I see how geometry columns map onto the px or Plotly primitives at the moment...
I will try to give more details about the API using an example from the gallery:
px.scatter_mapbox(carshare, lat="centroid_lat", lon="centroid_lon", color="peak_hour", size="car_hours",
color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10)
I would expect that if carshare is an instance of geopandas.GeoDataFrame with Point geometry types, the lat and lon columns would be discovered by the method automatically, so you would just have to call:
px.set_mapbox_access_token(open(".mapbox_token").read())
px.scatter_mapbox(carshare, color="peak_hour", size="car_hours",
color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10)
The same thing could be done if your geometry is of type Polygon.
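To make the proposal concrete, here is a minimal sketch of the "discover lat and lon from Point geometries" step, written against plain GeoJSON-style dicts so it runs without geopandas. The helper name `split_point_coordinates` and the sample coordinates are hypothetical; this is not how px is implemented, only an illustration of the x/y ordering involved.

```python
# Hypothetical helper for illustration only; not part of plotly.express or geopandas.
def split_point_coordinates(features):
    """Return (lat, lon) lists from GeoJSON-style Point features."""
    lat, lon = [], []
    for feature in features:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            raise NotImplementedError("Only Point geometries are handled in this sketch")
        # GeoJSON stores positions as (x, y), i.e. (longitude, latitude).
        x, y = geometry["coordinates"][:2]
        lon.append(x)
        lat.append(y)
    return lat, lon


# Made-up sample data.
features = [
    {"type": "Feature", "properties": {"car_hours": 12},
     "geometry": {"type": "Point", "coordinates": [-73.58, 45.52]}},
    {"type": "Feature", "properties": {"car_hours": 7},
     "geometry": {"type": "Point", "coordinates": [-73.57, 45.50]}},
]

lat, lon = split_point_coordinates(features)
```

The resulting lists could then be passed as the lat and lon arguments that scatter_mapbox already accepts.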
I see. I think that just grabbing a Point column is a bit too automatic for my tastes... what about when there are two Point columns? I could see a case for px.scatter_mapbox(point="point_col") for sure though.
Actually, the point of geopandas is to bring some structure to a DataFrame, with only one active geometry at a time. Hence it's rational to display this geometry by default, according to its geom_type.
There is no reason to have a GeoDataFrame with one geometry active and want to display another one. If that occurs, one can just use the set_geometry() method, which acts just like set_index() in pandas, or override the lon, lat kwargs in the plot method.
Ah I see, thanks for that extra bit of context :)
Seems like a reasonable and reasonably small thing to add... Any obvious downsides?
As a sidenote: right now px doesn't look at the index of a data frame at all, and I couldn't think of a good default behaviour for it in most cases. Any opinions?
> Seems like a reasonable and reasonably small thing to add... Any obvious downsides?
I can't think of any
> As a sidenote: right now px doesn't look at the index of a data frame at all, and I couldn't think of a good default behaviour for it in most cases. Any opinions?
I encountered that problem several times in other circumstances. I haven't found a nice solution either; I usually ignore the index and assume that whoever wants to use it will reset it first.
But I reckon it's annoying to pass a column name and then realize it is an index. You could try to reset the index at the beginning (it's a copy anyway), but you'll have to deal with other problems, such as potential duplicates in column names...
> As a sidenote: right now px doesn't look at the index of a data frame at all, and I couldn't think of a good default behaviour for it in most cases. Any opinions?
I just met a use case where it might be useful: when you pass an instance of Series to the scatter plot, you would like the default to assume that x is the index and y is the values. At the moment the only way I found to do it is a bit tedious:
OK, thanks for the input! Basically in certain cases (2d-cartesian plots) you would like the default value of x to be the data frame index? This basically precludes the notion of having multiple data points at the same x value, as index values must be unique, right? Also I don't think we can easily support multi-level indices just yet. (plotly.js supports 2-level axes for 2d cartesian plots but this isn't exposed in px at the moment).
At this point I don't think we're going to support passing in Series rather than DataFrames directly, as we need the column names all over the place for labelling.
Hi, I'll try not to digress too much on this, as it is not related to this issue.
> OK, thanks for the input! Basically in certain cases (2d-cartesian plots) you would like the default value of x to be the data frame index?
Yes.
> This basically precludes the notion of having multiple data points at the same x value, as index values must be unique, right?
Not intuitive at first glance, but indices are not necessarily unique in pandas.
> Also I don't think we can easily support multi-level indices just yet. (plotly.js supports 2-level axes for 2d cartesian plots but this isn't exposed in px at the moment).
Of course, I only have the simple case of a 1D index in mind for now; a NotImplementedError could be raised otherwise.
> At this point I don't think we're going to support passing in Series rather than DataFrames directly, as we need the column names all over the place for labelling.
It's actually quite easy to grab the x and y column names from the series ts by doing x=ts.index.name and y=ts.name. Then when you call ts.reset_index() it will return a new dataframe object with columns [x, y].
Happy to detail a bit longer in another post if needed :)
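For readers following along, the two pandas points made above (an index may contain duplicates, and a named Series can be turned into a two-column DataFrame) can be checked with a small sketch. The series contents and the names "date" and "trips" are made up for illustration.

```python
import pandas as pd

# A toy time series whose index is named and contains a duplicate label.
ts = pd.Series(
    [3, 1, 4],
    index=pd.Index(["2019-01-01", "2019-01-01", "2019-01-02"], name="date"),
    name="trips",
)

# pandas does not require index labels to be unique.
index_is_unique = ts.index.is_unique

# Column names for x and y can be read off the Series itself...
x, y = ts.index.name, ts.name

# ...and reset_index() returns a new DataFrame with those two columns.
df = ts.reset_index()
```

Note that this relies on both the index and the Series being named; with an unnamed Series, pandas falls back to default column labels, so a caller would have to supply names.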
OK so re indexes there's another issue here now #37 where I outline a different approach :)
It would be great to have geopandas support to be able to plot shapely Polygons. imho, an even better solution would be to enable this function in Plotly first.
I would love to see this. I am currently drawing a map with a zip code overlay as well as individually colored data points in Folium.
Folium can't even support >1000 points without clusters and I would heavily prefer to use plotly express for my task due to its way better speed.
This is something I'm looking into in September! :)
(In the interim, check out the new choroplethmapbox chart type: https://plot.ly/python/mapbox-county-choropleth/ !)
We're wrapping up https://github.com/plotly/plotly.py/issues/1767 and then we'll tackle https://github.com/plotly/plotly.py/issues/1780 and add geopandas support :)
Great! For the implementation, you might want to take this into account for performance (for point GeoDataFrames only): https://github.com/geopandas/geopandas/issues/964
As this is still marked as open I wanted to give it a bump - geopandas support via plotly express would be amazing!
Wanted to give this another bump, as geopandas support would be very helpful!
Adding another voice to this -- would be very helpful, and happy to help develop if someone can describe a high-level blueprint of what to do.
A quick update on this: we have pretty decent support (and no geopandas-specific documentation!) for displaying points and polygons with scatter_mapbox and choropleth_mapbox today, but the big gap is displaying line/multi-line data.
Adding Geopandas will be a great addition to the library. What is possible with choropleth_mapbox now? I can figure out how to plot polygons from Geopandas.
> Adding Geopandas will be a great addition to the library. What is possible with choropleth_mapbox now? I can figure out how to plot polygons from Geopandas.
I would like to know the current status of plotting polygons or points from a Geopandas dataframe. Also, I would like to know if there has been any contribution for converting the shapely LineString format to plotly.
Thanks :)!
I'm bumping this too ! It would be so helpful :)
Adding one more bump for what it is worth
Bumping too :)
Thanks for all the bumps :)
There is pretty decent support for GeoPandas right now, it's mostly a question of adding some examples to the docs really. If you have a geo data frame with point data you can use scattergeo or scattermapbox and manually set latitude and longitude. If you have a geo data frame with polygons, you can use the geojson argument to choropleth or choroplethmapbox.
The one place we don't have good support is if you have a geo data frame with line data... this one will require more thought.
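One possible workaround for line data, sketched here under assumptions rather than taken from this thread: flatten LineString and MultiLineString coordinates into flat lat/lon lists, with a None after each line so that separate lines are not joined (as far as I know, Plotly line traces leave a gap at missing values by default). The helper name `lines_to_lat_lon` and the sample coordinates are hypothetical, and the input is plain GeoJSON-style dicts rather than a GeoDataFrame.

```python
# Hypothetical helper for illustration only; not a plotly or geopandas API.
def lines_to_lat_lon(geometries):
    """Flatten GeoJSON-style line geometries into (lats, lons) with None separators."""
    lats, lons = [], []
    for geometry in geometries:
        if geometry["type"] == "LineString":
            parts = [geometry["coordinates"]]
        elif geometry["type"] == "MultiLineString":
            parts = geometry["coordinates"]
        else:
            raise NotImplementedError(f"Unsupported geometry type: {geometry['type']}")
        for part in parts:
            for x, y, *_ in part:  # GeoJSON order is (lon, lat[, elevation])
                lons.append(x)
                lats.append(y)
            # A None entry marks the end of one line.
            lats.append(None)
            lons.append(None)
    return lats, lons


# Made-up sample data.
geometries = [
    {"type": "LineString", "coordinates": [[0.0, 10.0], [1.0, 11.0]]},
    {"type": "MultiLineString",
     "coordinates": [[[2.0, 12.0], [3.0, 13.0]], [[4.0, 14.0], [5.0, 15.0]]]},
]

lats, lons = lines_to_lat_lon(geometries)
```

The flattened lists could then be tried as the lat and lon arguments of px.line_mapbox or px.line_geo; per-line attributes such as color would need to be repeated per point, which this sketch does not handle.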
Thanks @nicolaskruchten
I'm having trouble building the geojson argument to use in choropleth from a Geopandas dataframe. I tried my best to follow this example (from the Plotly docs):
import plotly.express as px
df = px.data.election()
geojson = px.data.election_geojson()
fig = px.choropleth(df, geojson=geojson, color="Bergeron",
locations="district", featureidkey="properties.district",
projection="mercator"
)
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
But for this toy example, geojson is already built ;)
If gdf is a GeoPandas GeoDataFrame you should be able to just pass geojson=gdf.geometry I believe.
This is why I'm saying it's mostly a documentation issue :)
I've just made a toy example (maybe it could help for documentation):
import geopandas as gpd
import plotly.express as px
# GeoJson from French Open-Data (french department)
url = "https://www.data.gouv.fr/fr/datasets/r/90b9341a-e1f7-4d75-a73c-bbc010c7feeb"
# Read file with geopandas
geo_df = gpd.read_file(url)
geo_df.head()
# Now using choropleth
fig = px.choropleth_mapbox(geo_df,
geojson=geo_df.geometry,
locations="nom",
center={"lat": 48.8534, "lon": 2.3488},
zoom=4)
fig.show()
No polygons are displayed.
Yes, you'll probably need to map color to some data to get things to show up :)
I'm adding some code:
import numpy as np
# Add a random value to use for the color
geo_df['random_color'] = np.random.randint(1, 6, geo_df.shape[0])
fig = px.choropleth_mapbox(geo_df,
geojson=geo_df.geometry,
locations="nom",
center={"lat": 48.8534, "lon": 2.3488},
color="random_color",
mapbox_style="carto-positron",
zoom=4)
fig.show()
Result : same as previous but I have a beautiful colormap in legend ;)
@armgilles Your code doesn't work because your geojson=geo_df.geometry is not a geojson file. Choroplethmapbox accepts only a geojson file defined as a dict with the following structure:
geojson = {"type": "FeatureCollection",
"features": []
}
That's why you have to convert the geo_df to a geojson file.
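To illustrate the structure described above, here is a small sketch of a populated FeatureCollection and of how a dotted featureidkey such as "properties.code" can be read as a path into each feature. The helper `resolve_feature_id` is hypothetical (it is not Plotly's code), and the polygon coordinates are made up.

```python
# Hypothetical helper for illustration only; not how plotly resolves featureidkey internally.
def resolve_feature_id(feature, featureidkey="id"):
    """Follow a dotted key path such as 'properties.code' inside one GeoJSON feature."""
    value = feature
    for key in featureidkey.split("."):
        value = value[key]
    return value


# A minimal FeatureCollection with made-up geometry.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "0",
            "properties": {"code": "01", "nom": "Ain"},
            "geometry": {"type": "Polygon",
                         "coordinates": [[[5.0, 46.0], [5.5, 46.0], [5.5, 46.5], [5.0, 46.0]]]},
        },
        {
            "type": "Feature",
            "id": "1",
            "properties": {"code": "02", "nom": "Aisne"},
            "geometry": {"type": "Polygon",
                         "coordinates": [[[3.0, 49.0], [3.5, 49.0], [3.5, 49.5], [3.0, 49.0]]]},
        },
    ],
}

# Values that a locations column would need to contain for each key choice.
ids_by_code = [resolve_feature_id(f, "properties.code") for f in geojson["features"]]
ids_by_default = [resolve_feature_id(f) for f in geojson["features"]]
```

The values in the locations column have to line up with whichever of these id lists the chosen featureidkey selects.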
Here is working code:
import geopandas as gpd
import pandas as pd
import numpy as np
import plotly.express as px
import json
url = "https://www.data.gouv.fr/fr/datasets/r/90b9341a-e1f7-4d75-a73c-bbc010c7feeb"
geo_df = gpd.read_file(url)
#geo_df.head()
#convert the geo-dataframe to geojson
my_geojson = json.loads(geo_df.to_json())
#define a dataframe with data for choroplethmapbox
df = pd.DataFrame(dict(code=list(geo_df['code']),
))
np.random.seed(123)
df['vals'] = np.random.randint(1, 8, geo_df.shape[0])
#df.head()
fig = px.choropleth_mapbox(df,
geojson=my_geojson,
featureidkey='properties.code',
locations="code",
center={"lat": 47.35, "lon": 2.3},
color="vals",
mapbox_style="carto-positron",
zoom=4)
It isn't recommended to pass geo_df to px.choropleth_mapbox instead of the newly defined dataframe, df, because geo_df is a bigger file, containing the geometry of all polygons, which is already passed via geojson=my_geojson.
I did actually add special handling in PX for the .geometry case where it extracts the geojson internally... not sure why it's not working in this specific case!
OK so I just needed to peek under the hood a bit... passing geojson=geo_df.geometry does work, but locations must be set to geo_df.index in this case.
> It isn't recommended to pass geo_df to px.choropleth_mapbox instead of the newly defined dataframe, df, because geo_df is a bigger file, containing the geometry of all polygons, which is already passed via geojson=my_geojson.
I'll respectfully disagree here... the size of geo_df doesn't matter, and it is recommended to set data_frame=geo_df for GeoPandas dataframes: PX only extracts the columns it needs (so the number of columns doesn't matter), and is able to extract the geojson from geo_df.geometry as it specifically looks for the __geo_interface__ attribute, here https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_chart_types.py#L1147
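The "looks for the __geo_interface__ attribute" behaviour can be pictured with a short duck-typing sketch. This is an illustration of the general pattern only, not the code at the link above; `ToyPoint` and `to_geojson_dict` are hypothetical names.

```python
# Hypothetical names for illustration only; this is not the PX source code.
class ToyPoint:
    """A tiny object exposing the __geo_interface__ protocol."""

    def __init__(self, lon, lat):
        self.lon = lon
        self.lat = lat

    @property
    def __geo_interface__(self):
        # Return a GeoJSON-like mapping describing this object.
        return {"type": "Point", "coordinates": (self.lon, self.lat)}


def to_geojson_dict(obj):
    """Use obj.__geo_interface__ when present, otherwise assume obj is already a dict."""
    if hasattr(obj, "__geo_interface__"):
        return obj.__geo_interface__
    return obj


from_object = to_geojson_dict(ToyPoint(2.3488, 48.8534))
plain_dict = {"type": "FeatureCollection", "features": []}
from_dict = to_geojson_dict(plain_dict)
```

With this pattern, a caller can hand over either a ready-made GeoJSON dict or any object implementing the protocol, which is why geo_df.geometry can be passed directly.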
In any case: none of this is documented yet under plotly.com/python which is why this issue remains open :)
Here's a complete/simple example (edited to remove the unnecessary .__geo_interface__ I'd left in for testing :)
import numpy as np
import geopandas as gpd
import plotly.express as px
# GeoJson from French Open-Data (french department)
url = "https://www.data.gouv.fr/fr/datasets/r/90b9341a-e1f7-4d75-a73c-bbc010c7feeb"
# Read file with geopandas
geo_df = gpd.read_file(url)
geo_df['random_color'] = np.random.randint(1, 6, geo_df.shape[0])
fig = px.choropleth_mapbox(geo_df,
geojson=geo_df.geometry,
locations=geo_df.index,
color='random_color',
center={"lat": 48.8534, "lon": 2.3488},
mapbox_style="open-street-map",
zoom=4)
fig.show()
Thanks @nicolaskruchten & @empet for your example 🥇
Using the geopandas DataFrame and its geometry as the geojson argument is pretty clever. I didn't understand the locations argument previously; now it's clear.
A small remark: with the previous code, trying locations=geo_df.code displays the figure but with some holes:
I don't understand why (maybe the string type?)
The locations key serves to identify which polygon in geojson the values from color should match to. In the base case where the values in color come from the same data frame as the polygons, using geo_df.index is the only thing that makes sense basically. If you set it to some other sequence of numbers you'll get a map but the colors won't match the polygons. If you set it to a string column, in some cases the numeric-string-to-number comparison will work like it did above and you'll get an odd result. In this case it looks like 01 through 09 didn't get matched and everything else did.
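A simplified model may help to see where the holes come from. It assumes, as a guess rather than a confirmed detail of Plotly's matching, that each feature id is the stringified row position and that a location matches only on exact string equality. The code list below is made up to resemble department codes.

```python
# Simplified model for illustration; the matching rule here is an assumption,
# not Plotly's actual implementation.
codes = [f"{i:02d}" for i in range(1, 13)] + ["2A", "2B"]  # "01" ... "12", "2A", "2B"

# Assumed feature ids: the row positions 0..13, as strings.
feature_ids = {str(position) for position in range(len(codes))}

# Codes that happen to equal some feature id, and codes that equal none.
matched = [code for code in codes if code in feature_ids]
unmatched = [code for code in codes if code not in feature_ids]

# A matched code points at the row whose position equals the code,
# which is not the row the code came from.
wrong_polygon = {code: codes[int(code)] for code in matched}
```

Under this model, "01" through "09" never match (hence the holes), while codes such as "10" do match but land on a different polygon than their own row, which is in line with the warning above about colors not matching the polygons.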
I've added a GeoPandas example similar to the one above to each of the following pages btw:
I chose a different dataset because, confusingly, the one you're using is actually a GeoJSON object already... you're loading it via GeoPandas but you could also have just loaded it as a dict ;) The examples I've used above for choropleths load data from a shapefile, which I hope is less likely to be confusing for users.
Thank you for the explanations!
You did well to choose a different dataset. I hope it helps the community :)
While I was in there, I added some GeoPandas examples to:
It's not as graceful as the polygon/point support but at least it's in the docs now :)
I'll close this issue in favour of more specific proposals in the Plotly.py repo such as https://github.com/plotly/plotly.py/issues/2601
Hi,
Thanks for your amazing work: many custom functions can now be deprecated and lots of keystrokes are saved. If I may make a feature request, it would be to support the geopandas API for the geo plots.
If you are not familiar with this library, it inherits from pd.DataFrame and embeds a custom geometry column that stores the geo objects (Points, Polygons, Lines ...). It would be great if the plots could be done based on the geometry automatically, without casting points or specifying that you want polygons... Tell me if you want more details about this.