clhenrick closed 8 years ago
Good catch. I probably just need to pass the correct encoding to ogr2ogr. I figured I'd try using ogr since we're already using it elsewhere in the Makefile.
Ah that makes sense. I seem to always run into issues importing shp data into pg with ogr2ogr so now prefer to use shp2pgsql as it seems more reliable imo.
Can you confirm that this was working previously? I tried to switch to shp2pgsql and I'm getting the same errors.
I wonder if it's a corruption in the Natural Earth files? We had a weird error in the past where there was a strange encoding bug that we couldn't fix. We ended up having to manually edit the .dbf file to get it to have the correct encoding. See here, in our Stamen fork of Natural Earth: https://github.com/stamen/natural-earth-vector/commit/737ce368668f207ce23f30667cead69384d89b5d
There's also some related info here: https://github.com/CartoDB/cartodb/issues/1143
Huh, I definitely don't remember seeing it earlier when importing via shp2pgsql. Did you use the encoding flag when you imported it that way?
I saved a modified version of ne_10m_admin_1_states_provinces_scale_rank in our Stamen fork of Natural Earth: https://github.com/stamen/natural-earth-vector/pull/2
I didn't actually modify it at all, I merely opened the file in QGIS and saved it again.
Now with e8752ffc09543b04aeeaa62aa1ba5d4f8c3ef0f3 I can import the file just fine, and the encoding just works.
:+1:
Oops, spoke too soon: LATIN1 characters are working, but other characters are not:
And they're broken in the database, not just at the rendering step, so it's still an importing problem:
select name from ne_10m_admin_1_states_provinces_labels where name ILIKE '%ninh%';
name
------------
B?c Ninh
Ninh Bình
Ninh Thu?n
Qu?ng Ninh
Tây Ninh
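One way the pattern above can arise: if the names pass through a LATIN1 conversion somewhere during import, any character outside LATIN1 has no valid byte and gets replaced by "?". This is only a sketch of that hypothesis, not a confirmed diagnosis of this import; the helper name `through_latin1` is made up for illustration, and the correctly spelled province names are supplied here rather than taken from the shapefile.

```python
import unicodedata


def through_latin1(name: str) -> str:
    """Simulate a lossy conversion to LATIN1 (hypothetical helper).

    Characters that LATIN1 cannot represent are replaced with '?'.
    """
    # Normalize to NFC so each accented letter is a single code point.
    composed = unicodedata.normalize("NFC", name)
    return composed.encode("latin-1", errors="replace").decode("latin-1")


# Province names matching the query above, spelled with full diacritics.
names = ["Bắc Ninh", "Ninh Bình", "Ninh Thuận", "Quảng Ninh", "Tây Ninh"]

for name in names:
    print(through_latin1(name))
```

Letters such as "ì" and "â" exist in LATIN1 and survive the round trip, while Vietnamese letters such as "ắ", "ậ" and "ả" do not, which is consistent with only some rows showing "?".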
Huh, when you imported the Natural Earth data with shp2pgsql, did you try passing that weird Windows encoding with the -W flag? I think that's what I did when I imported the data, and I don't remember having this problem, but I could be wrong.
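For reference, a rough sketch of an import that passes the source encoding to shp2pgsql via -W and pipes the SQL into psql. The thread never names the exact encoding value, so `encoding` is left as a parameter; the function names, the shapefile path, the database name and the SRID of 4326 are all assumptions for illustration.

```python
import subprocess


def shp2pgsql_argv(shp_path, table, encoding, srid=4326):
    """Build the shp2pgsql command line (hypothetical helper).

    -W names the encoding of the shapefile's attribute data,
    -s sets the SRID (4326 is an assumption), -I adds a spatial index.
    """
    return ["shp2pgsql", "-W", encoding, "-s", str(srid), "-I", shp_path, table]


def import_shapefile(shp_path, table, encoding, dbname):
    """Pipe shp2pgsql output into psql; requires both tools on PATH."""
    dump = subprocess.Popen(
        shp2pgsql_argv(shp_path, table, encoding), stdout=subprocess.PIPE
    )
    subprocess.run(["psql", "-q", "-d", dbname], stdin=dump.stdout, check=True)
    dump.stdout.close()
    if dump.wait() != 0:
        raise RuntimeError("shp2pgsql exited with an error")


# Show the command without running it. The path and the encoding value
# are placeholders, not the values actually used in this thread.
print(" ".join(shp2pgsql_argv(
    "ne_10m_admin_1_states_provinces_scale_rank.shp",
    "ne_10m_admin_1_states_provinces_labels",
    "LATIN1",
)))
```

If -W names the wrong encoding, the import can still succeed while quietly mangling characters, so it is worth re-running the ILIKE query above after each attempt.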
Yup, I did use that flag. Actually, these characters are messed up even when I open them in QGIS... which the LATIN1 characters never were.
Very strange. Looking at the Natural Earth GitHub repo I see that the ne_10m_admin_1_states_provinces files are at version 3.0 while the admin 0 data looks like it's at v2.0 -- perhaps the encoding changed to utf-8 with the latest data updates?
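One quick check, sketched under the assumption that the download includes the optional .cpg sidecar file (shapefiles may carry one to declare the attribute encoding, but whether these particular files do is not confirmed in this thread). The helper name `declared_encoding` and the file path are made up for illustration.

```python
from pathlib import Path


def declared_encoding(shp_path):
    """Return the encoding named in the .cpg sidecar, or None if absent."""
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.exists():
        return None
    # The file normally holds just an encoding name such as "UTF-8".
    text = cpg.read_text(encoding="ascii", errors="replace").strip()
    return text or None


# Prints None when there is no .cpg file next to the shapefile.
print(declared_encoding("ne_10m_admin_1_states_provinces.shp"))
```

If the admin 1 files declare one encoding and the admin 0 files another (or none), that would explain why a single -W value cannot work for both.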
Happens with Polish, too:
@almccon I forget did we ever check with Nathaniel about this? I know Stamen has their own port of Natural Earth but can't remember why that is.
Stamen has its own fork so we can make changes and have them available for our map styles. When I make a fix that I'm sure is reliable, I issue it as a pull request: https://github.com/nvkelso/natural-earth-vector/pulls
I know that the fundamental Natural Earth sources are in a Geodatabase, so everything in github is really just derived from that. So even if we make changes to the shapefiles and issue pull requests, someone else (Nathaniel I think) has to make the real changes to the sources.
I also don't fully understand the versioning process with Natural Earth, why some things are on v2 while others are v3.
...but mostly it's just because I haven't found the time to fully understand it.
@alan seems like pulling the 10m admin1 scale ranks polygon file from natural earth's website & loading it into QGIS with encoding 'utf-8' fixes the issue:
@almccon
Do you have to do anything special when you save it from QGIS?
Something didn't go right for me when I re-made the natural earth data / imported to postgres. Maybe try using shp2pgsql instead of ogr2ogr?