sul-dlss / gis-robot-suite

Robots for GIS accessioning and delivery

Support for non-LATIN1/UTF-8 encodings in GeoServer #59

Open: drh-stanford opened this issue 9 years ago

drh-stanford commented 9 years ago

Example: ty249zp2774 has data in a Chinese character encoding, but GeoServer shows the inspection results decoded as UTF-8/LATIN1.

See https://github.com/sul-dlss/gis-robot-suite/blob/9b6179d4780ea63068cd40f8841bc7f6bb59b5c7/robots/gisDelivery/load-vector.rb#L82-L87

drh-stanford commented 9 years ago

For example:

https://earthworks.stanford.edu/catalog/stanford-ty249zp2774

The inspection view shows the wrong character encoding.

[Screenshot: 2015-07-08, 12:44 PM]

drh-stanford commented 9 years ago

Another example is yq395kh3847.

[Screenshot: 2015-07-09, 10:58 AM]

drh-stanford commented 6 years ago

Some shapefiles come with a .cpg file that declares the character encoding. The data already appear corrupted in the PostGIS database, so shp2pgsql is where we need to set the encoding. We currently try UTF-8 and then LATIN1.
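A minimal sketch of that ordering in Ruby (the language of load-vector.rb), assuming the .cpg file sits next to the .shp with the same basename and holds a single encoding name or code page number. CPG_ALIASES, normalize_cpg, encodings_to_try and shp2pgsql_command are hypothetical names, not code from the robot suite, and the alias table is illustrative. The only shp2pgsql option used is -W, which sets the encoding of the .dbf attribute data.

```ruby
# Hypothetical mapping from values seen in .cpg files (code page numbers or
# names) to encoding names that shp2pgsql -W (via iconv) should accept.
# Illustrative only; extend it as real .cpg files turn up.
CPG_ALIASES = {
  "UTF8" => "UTF-8",
  "65001" => "UTF-8",
  "1252" => "WINDOWS-1252",
  "936" => "GBK",
  "950" => "BIG5"
}.freeze

# The current behaviour described above: UTF-8 first, then LATIN1.
FALLBACK_ENCODINGS = %w[UTF-8 LATIN1].freeze

# Normalize the text of a .cpg file into a -W argument; unknown names pass through.
def normalize_cpg(text)
  name = text.strip.upcase
  CPG_ALIASES.fetch(name.delete("-"), name)
end

# Encodings to try, in order: the .cpg declaration (if present) before the fallbacks.
def encodings_to_try(shp_path)
  cpg_path = shp_path.sub(/\.shp\z/i, ".cpg")
  return FALLBACK_ENCODINGS.dup unless File.exist?(cpg_path)

  ([normalize_cpg(File.read(cpg_path))] + FALLBACK_ENCODINGS).uniq
end

# Build (but do not run) a shp2pgsql invocation with an explicit encoding.
def shp2pgsql_command(shp_path, table, encoding)
  ["shp2pgsql", "-W", encoding, shp_path, table]
end
```

A caller could run each command in turn (for example with Open3.capture3) and keep the first one that exits successfully. Note that LATIN1 assigns a character to every byte, so the last fallback never fails: data in an encoding such as BIG5 loads without an error, just garbled.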

drh-stanford commented 6 years ago

FYI: the yq395kh3847 example is encoded in BIG5. If you load it into QGIS with the BIG5 encoding, the characters display correctly.

[Screenshot: 2018-02-12, 3:49 PM]
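To show why the same bytes can look like garbage in one tool and read correctly in another, here is a small Ruby illustration: the Big5 bytes for the made-up value 中文 decoded once as LATIN1 (ISO-8859-1) and once as Big5. The sample string is illustrative, not data from yq395kh3847, and decode_as is a hypothetical helper.

```ruby
# Decode raw bytes as the given encoding and convert the result to UTF-8,
# roughly what a loader does once it has settled on an encoding.
def decode_as(bytes, encoding)
  bytes.dup.force_encoding(encoding).encode("UTF-8")
end

# Stand-in for a .dbf text value: "中文" stored as Big5 bytes (binary string).
big5_bytes = "中文".encode("Big5").b

# LATIN1 maps every byte to some character, so this succeeds but yields mojibake.
as_latin1 = decode_as(big5_bytes, "ISO-8859-1")
# Decoding with the encoding the data was actually written in recovers the text.
as_big5 = decode_as(big5_bytes, "Big5")

puts as_latin1
puts as_big5
```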

drh-stanford commented 6 years ago

Using shp2pgsql -W BIG5 fixes this particular layer. The problem is that, in general, we don't know the character encoding: the .dbf file is just binary data, and you have to try different encodings until the text looks right (visual inspection).

[Screenshot: 2018-02-12, 4:00 PM]
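Trying encodings can be partly automated. The sketch below assumes you already have the raw bytes of the .dbf character fields (extracting them with a DBF reader is not shown, since the header and numeric fields are binary). It keeps only the candidate encodings under which every value is a valid byte sequence. This is a heuristic pre-filter, not real encoding detection: several encodings (Big5 and GBK, for instance) can accept the same bytes, so the survivors still need the visual check. The candidate list, its order, and the helper names are assumptions for illustration.

```ruby
# Candidate encodings, most likely first. This list and its order are
# assumptions for illustration; they are not taken from the robot suite.
CANDIDATE_ENCODINGS = %w[UTF-8 Big5 GBK Shift_JIS].freeze

# Return every candidate under which all the given byte strings are valid.
# Validity is only a necessary condition: several encodings can accept the
# same bytes, so the result is a shortlist for visual inspection.
def plausible_encodings(byte_values, candidates = CANDIDATE_ENCODINGS)
  candidates.select do |name|
    byte_values.all? { |bytes| bytes.dup.force_encoding(name).valid_encoding? }
  end
end

# Pick the first plausible candidate, falling back to LATIN1 (ISO-8859-1),
# which accepts any byte sequence and therefore never signals a wrong guess.
def guess_encoding(byte_values, candidates = CANDIDATE_ENCODINGS)
  plausible_encodings(byte_values, candidates).first || "ISO-8859-1"
end
```

For example, guess_encoding(["中文".encode("Big5").b]) picks Big5 because those bytes are not valid UTF-8; calling plausible_encodings on the same input shows which other candidates would also have accepted them.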