holoviz / geoviews

Simple, concise geographical visualization in Python
http://geoviews.org
BSD 3-Clause "New" or "Revised" License

Bugs/issues when creating choropleth map from shapefile #62

Closed pkmn99 closed 7 years ago

pkmn99 commented 7 years ago

I ran into some bugs/issues when creating maps from a shapefile. Here I use the UK referendum as an example to reproduce them.

(1) If there are missing values, the corresponding spatial units are not displayed on the map. It would be better to have an option to automatically fill the missing values so that all spatial units are displayed even though they don't have data. case2

(2) If the data type is integer, the map does not show colors correctly. case1

(3) When the data values contain NaN, the map looks good. But when these NaN values are filled with 0, the resulting map is totally incorrect. (Here I manually assign three NaN values and fill them with zero.) case3

(4) The latitude and longitude labels on the map also seem to be problematic.
case3

(5) It would be good to have a way to set the color range for these maps. I am affected mostly by the third issue, because in my study I have to fill some values; otherwise I run into the first issue.
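Regarding issues (3) and (5), one thing worth checking independently of any plotting bug is how a zero fill changes the value range that a colormap would be scaled to. The following is a minimal pandas sketch with made-up vote shares (not the referendum data); it is not a diagnosis of the bug itself.

```python
import numpy as np
import pandas as pd

# Made-up vote shares (percent) with two missing entries
share = pd.Series([38.5, np.nan, 52.1, 61.3, np.nan])

# pandas skips NaN when computing min/max, so the range reflects real values only
nan_range = (share.min(), share.max())

# After filling with 0 the minimum becomes 0, which stretches the range
filled = share.fillna(0)
filled_range = (filled.min(), filled.max())

print("range with NaN kept:   ", nan_range)
print("range after fillna(0): ", filled_range)
```

If the zero fill is only a placeholder, an explicit color range keeps the real values from being compressed into the upper part of the colormap.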

Hope these issues can be fixed or improved.

Code to reproduce these issues is included below, since I don't know how to upload a Jupyter notebook.

import pandas as pd
import numpy as np
import holoviews as hv
import geoviews as gv
import cartopy
from cartopy import crs as ccrs
%pylab

hv.notebook_extension('bokeh','matplotlib')

shapefile = 'boundaries/boundaries.shp'
shapes = cartopy.io.shapereader.Reader(shapefile)

referendum = pd.read_csv('referendum.csv')
referendum['value1'] = referendum['leaveVoteshare']

%%output backend='bokeh'
%%opts Shape (cmap='viridis') [xaxis=None yaxis=None tools=['hover'] width=400 height=500]

# Case 1: cast the referendum values to integers
referendum_modified = referendum.copy()
# np.int was removed in NumPy 1.24; astype(int) performs the same truncating cast
referendum_modified['value1'] = referendum_modified['value1'].astype(int)

data = hv.Dataset(referendum_modified)

gv.Shape.from_records(shapes.records(), data, on='code', value='value1',
                     index=['name', 'regionName'], crs=ccrs.PlateCarree())

%%output backend='bokeh'
%%opts Shape (cmap='viridis') [xaxis=None yaxis=None tools=['hover'] width=400 height=500]

# Case 2: referendum values are missing for some counties
referendum_modified = referendum.copy()

referendum_modified = referendum_modified.iloc[0:100,:]

data = hv.Dataset(referendum_modified)

gv.Shape.from_records(shapes.records(), data, on='code', value='value1',
                     index=['name', 'regionName'], crs=ccrs.PlateCarree())

%%output backend='bokeh'
%%opts Shape (cmap='viridis') [xaxis=None yaxis=None tools=['hover'] width=400 height=500]

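# Case 3: set the first three rows to NaN, then fill them with 0
referendum_modified = referendum.copy()
# Use .loc to avoid pandas chained assignment, which may not modify the frame
referendum_modified.loc[referendum_modified.index[:3], 'value1'] = np.nan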
referendum_modified['value1'] = referendum_modified['value1'].fillna(0)

data = hv.Dataset(referendum_modified)

gv.Shape.from_records(shapes.records(), data, on='code', value='value1',
                     index=['name', 'regionName'], crs=ccrs.PlateCarree())

philippjfr commented 7 years ago

Thanks for reporting this. I suspect (and hope) these issues have already been fixed. Could you please provide the output of:

print(gv.__version__)
print(hv.__version__)

pkmn99 commented 7 years ago

Here is the version info: hv: 1.6.2-1509-g1b94bfdc, gv: 1.1.0

philippjfr commented 7 years ago

Okay, so that's pretty recent. Thanks for the detailed bug report; I'm looking into it now.

philippjfr commented 7 years ago

Btw to set color ranges simply use redim.

choropleth = gv.Shape.from_records(shapes.records(), data, on='code', value='value1',
                     index=['name', 'regionName'], crs=ccrs.PlateCarree())
choropleth.redim(value1=dict(range=(0, 100)))

If you pull the latest master on holoviews you can even do choropleth.redim.range(value1=(0, 100)).

philippjfr commented 7 years ago

Your issue (4) should be addressed when https://github.com/ioam/geoviews/issues/36 is implemented, which seems in sight.

pkmn99 commented 7 years ago

Thanks for the redim method.
I am not sure if issue (4) is entirely due to bokeh, because if you display the UK shapefile directly using bokeh, without linking it to data values, you get correct latitude/longitude labels.

gv.Shape.from_records(shapes.records())

bokeh_plot

The labels go wrong when the shapefile is linked to data values.

philippjfr commented 7 years ago

Because if you display the UK shapefile directly using bokeh, without linking it to data values, you get correct latitude/longitude labels.

That's not caused by the data values but by this bit, crs=ccrs.PlateCarree(), which specifies the projection. If your data is already in lat/lons you don't really need to provide it. The coordinate reference system (crs) simply ensures the data can be projected to Mercator and therefore be overlaid onto a bokeh tilesource.
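To illustrate why the axis labels stop looking like degrees once the data is projected, here is a minimal sketch of the standard spherical Web Mercator formula, which is the projection web tile sources use. The function name and the sample coordinates are illustrative only; this is not how GeoViews performs the projection internally (it delegates to cartopy).

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, as used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates for London (illustrative values)
x, y = lonlat_to_web_mercator(-0.1275, 51.5072)
print(round(x), round(y))
```

The results are in metres rather than degrees, which is why axis ticks on a projected plot show large numbers instead of latitude/longitude values.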

pkmn99 commented 7 years ago

I would also like to try the latest 1.2.0dev version to see if these issues have been resolved, but unfortunately I can't get it installed correctly (conda install -c ioam geoviews=1.2.0dev1). So I have to stay with version 1.1.0.

philippjfr commented 7 years ago

I would also like to try the latest 1.2.0dev version to see if these issues have been resolved, but unfortunately I can't get it installed using: conda install -c ioam geoviews=1.2.0dev1. So I have to stay with version 1.1.0.

I'll try to cut a 1.2.0dev2 release tomorrow.

philippjfr commented 7 years ago

Okay, it appears some of this was already fixed, but I'm shortly going to make a PR with some further fixes. One of the main issues here is that our format for storing this kind of data along with value dimensions is fairly inflexible and inefficient. There are plans to improve how we store data like this, at which point this will become far easier (and faster). For now, to handle your condition (1), it has to add an additional Index dimension, because with missing values it has to be able to index items uniquely.
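The missing-value situation in condition (1) can be inspected with plain pandas before plotting. The sketch below uses hypothetical region codes and values as stand-ins for the shapefile records and the referendum table; it only shows how to find which shapes have no matching value and says nothing about how GeoViews stores the data internally.

```python
import pandas as pd

# Hypothetical stand-ins: codes present in the shapefile, and a value table
# that lacks an entry for one of them
shape_codes = pd.DataFrame({"code": ["E01", "E02", "E03"]})
values = pd.DataFrame({"code": ["E01", "E03"], "value1": [55.2, 48.9]})

# A left join keeps every shape; shapes without a value get NaN in value1
merged = shape_codes.merge(values, on="code", how="left", indicator=True)

# The _merge indicator column marks rows that exist only on the shapefile side
missing = merged.loc[merged["_merge"] == "left_only", "code"].tolist()
print(missing)
```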

Otherwise it all works now:

1. Integer values

screen shot 2017-04-19 at 12 07 58 am

2. Missing counties

screen shot 2017-04-19 at 12 08 06 am

3. NaN values

screen shot 2017-04-19 at 12 08 13 am

pkmn99 commented 7 years ago

Very glad to hear back from you with most of the issues fixed so quickly. You guys are awesome! I think issue 1 will not bother me too much as long as the nan issue has been fixed. Are you saying there will be a new version released tomorrow with these issues fixed? I am looking forward to it. Thanks!

philippjfr commented 7 years ago

You guys are awesome!

Glad you're enjoying the library.

Are you saying there will be a new version released tomorrow with these issues fixed?

A dev release yes, we're gearing up to a general holoviews 1.7.0 release this week and an official geoviews 1.2.0 release will follow shortly after.

philippjfr commented 7 years ago

Btw, I just realized there is a 1.2.0dev1 and you could try this now: conda install -c ioam/label/dev geoviews=1.2.0dev1. It might include some of the color fixes, but won't include my changes in https://github.com/ioam/geoviews/pull/63 yet.

pkmn99 commented 7 years ago

Your install command works! Now I can update geoviews. I will also try the next dev version. Thank you very much!

dtelliott79 commented 7 years ago

I had bookmarked this thread last week, since I was having problems with some of the same bugs. However, now that I have installed 1.2.0dev1, I'm getting an error:

AttributeError: 'GeoShapePlot' object has no attribute '_get_colormapper'

in code that was working before the update. How difficult would it be for me to uninstall the update? And more importantly, when might the next full releases fixing these bugs be out? Just so I know when to check back.

BTW: Thank you for the product. Without it, I don't think I could generate the choropleth with hover text that I need! So...awesome work!

philippjfr commented 7 years ago

How difficult would it be for me to uninstall the update? And more importantly, when might the next full releases fixing these bugs be out? Just so I know when to check back.

I suspect the issue is that you're still on HoloViews 1.6.2. We are releasing HoloViews 1.7 this very minute and that should fix your problems. Just wait another half an hour or so and then run:

conda install -c ioam holoviews

I'll also be releasing GeoViews 1.2.0 later this week with some further fixes.

philippjfr commented 7 years ago

Both HoloViews 1.7.0 and GeoViews 1.2.0 are now released, please update with:

conda install -c ioam -c conda-forge holoviews geoviews

I'll close this issue but please feel free to reopen it if you encounter further issues.

pkmn99 commented 7 years ago

Great job! I updated both holoviews and geoviews, and most issues have been resolved. However, there seems to be a new issue with NaN values in the new version. In the previous version we got a correct map even when the data contained NaN, but in the new version NaN values cause errors. For example, the error occurs when the following fillna line is commented out:

referendum_modified['value1'] = referendum_modified['value1'].fillna(0)

pkmn99 commented 7 years ago

(1) It is good that counties with missing values now show up on the map in version 1.2.0 (previous issue 1). Is there an option to turn off this automatic filling? I hope there could be different display modes. For example, "intersect": display the map using the intersection of the shapefile and the values provided (behavior before version 1.2.0). "shapefile": display the map based on the shapefile and fill missing values with NaN (default in version 1.2.0).

(2) Is there a way to set the extent of the map by lat/lon? For example, if we want the map to display the southern part of the UK instead of the entire UK. One way is to modify the shapefile to contain only a southern subset, but I think there should be a better way than that. Another way: if the automatic filling could be turned off, the extent could be controlled by the value data provided (only link values for southern UK counties, and counties without values would not be displayed unless you give them a fill value; this is what I did before version 1.2.0).
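One possible way to approximate this without editing the shapefile is to filter the records by bounding box before building the map. The sketch below uses a namedtuple as a stand-in for shapefile records, with rough, illustrative bounds; the helper name and the extent values are hypothetical. It assumes each record exposes a (min_lon, min_lat, max_lon, max_lat) tuple, which cartopy's shapereader records appear to provide as `bounds`.

```python
from collections import namedtuple

# Stand-in for a shapefile record: a name plus (min_lon, min_lat, max_lon, max_lat)
Record = namedtuple("Record", ["name", "bounds"])


def within_extent(record, extent):
    """Return True if the record's bounding box lies entirely inside extent."""
    min_lon, min_lat, max_lon, max_lat = record.bounds
    west, south, east, north = extent
    return west <= min_lon and max_lon <= east and south <= min_lat and max_lat <= north


# Rough, illustrative bounding boxes (not taken from a real shapefile)
records = [
    Record("Cornwall", (-5.75, 49.95, -4.15, 50.95)),
    Record("Highland", (-7.70, 56.50, -3.00, 58.70)),
]

southern_uk = (-6.5, 49.5, 2.0, 53.0)  # hypothetical extent: west, south, east, north
selected = [r.name for r in records if within_extent(r, southern_uk)]
print(selected)
```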

The practical problem is that I am using a shapefile for US counties that contains Alaska and Hawaii, which I don't want to show up.

Thanks!

Update: I ended up creating a subset shapefile without Alaska and Hawaii for this particular problem.
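An alternative to writing a subset shapefile is to filter the records in memory by an attribute. The sketch below uses stand-in records with an `attributes` dict; the field name "STATEFP" is an assumption (it depends on the shapefile), while "02" and "15" are the state FIPS codes for Alaska and Hawaii. Whether a filtered list can be passed to gv.Shape.from_records in place of shapes.records() would need to be verified against the installed version.

```python
from collections import namedtuple

# Stand-in for a shapefile record carrying an attribute dictionary
Record = namedtuple("Record", ["attributes"])

EXCLUDED_STATE_FIPS = {"02", "15"}  # Alaska, Hawaii


def drop_alaska_hawaii(records, state_field="STATEFP"):
    """Keep only records whose state FIPS code is not Alaska or Hawaii."""
    return [r for r in records if r.attributes.get(state_field) not in EXCLUDED_STATE_FIPS]


# Hypothetical county records; the attribute names are assumptions
counties = [
    Record({"STATEFP": "02", "NAME": "Anchorage"}),
    Record({"STATEFP": "15", "NAME": "Honolulu"}),
    Record({"STATEFP": "06", "NAME": "Alameda"}),
]

kept = [r.attributes["NAME"] for r in drop_alaska_hawaii(counties)]
print(kept)
```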