Open nickpeihl opened 4 years ago
Hi @nickpeihl, thanks for posting the POC! I like where this is headed, and thanks for the read into related Mapshaper workflow options.
It's slightly more complicated for the admin0 "level" files because they are themselves derived from an OpenOffice file that includes a few field calculations for backfilling "unknowns" :\
Another thought I've had is now that QGIS supports editing GeoJSON... The current GeoJSON exports could become master and the SHP could instead derive from them. I've had good success in Who's On First doing line-delimited properties (there's a generic Python exportify script that could be used as a starting point) and then smooshing all the geometry into a single line. So it's very easy in GitHub to look at the property diff, and the geometry diff can be viewed using their visual diff tool. We'd need to have a think about whether the admin0 and admin0 properties could also be stored in GeoJSON or if the CSV approach you POC'd is better.
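To make the "one property per line, geometry on one line" layout concrete, here is a minimal Python sketch. It is not the Who's On First exportify script; the function name `format_feature` and the sample feature are hypothetical, and only the `type`, `properties`, and `geometry` members are written out.

```python
import json


def format_feature(feature):
    """Serialize a GeoJSON Feature with one property per line and the
    whole geometry on a single line, so property changes diff cleanly."""
    props = feature.get("properties") or {}
    # Sort keys so the output is stable regardless of input order.
    prop_lines = [
        f"    {json.dumps(key)}: {json.dumps(props[key], ensure_ascii=False)}"
        for key in sorted(props)
    ]
    # Compact separators keep the geometry on exactly one line.
    geometry = json.dumps(feature.get("geometry"), separators=(",", ":"))
    props_block = ("{\n" + ",\n".join(prop_lines) + "\n  }") if prop_lines else "{}"
    return (
        "{\n"
        '  "type": "Feature",\n'
        f'  "properties": {props_block},\n'
        f'  "geometry": {geometry}\n'
        "}\n"
    )


# Hypothetical sample feature, for illustration only.
sample_feature = {
    "type": "Feature",
    "properties": {"name": "Fiji", "adm0_a3": "FJI"},
    "geometry": {"type": "Point", "coordinates": [178.0, -18.0]},
}
formatted = format_feature(sample_feature)
print(formatted, end="")
```

With this layout, editing a single attribute changes a single line in the diff, while any geometry edit shows up as one changed line that can be inspected with a visual diff instead.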
I'm in the midst of some COVID changes so I likely won't have a chance to look at this in depth for another week.
This issue was sparked by this comment.
Currently, attributes for (all?) shapefiles are stored in git as binary DBF files in the `./housekeeping` directory. The attributes are joined to geometries to create shapefiles via mapshaper commands in the Makefile (example). Unfortunately, using binary files to store attributes makes it impossible to see diffs and thus much harder to QA pull requests.
So I propose storing the attribute data in CSV files which can be easily diffed and QA'd.
Unfortunately, unlike DBF files, CSV files do not store field types. So when mapshaper joins a CSV file, it has to guess each field type from the data, which may result in unwanted field types in the output shapefiles (e.g. Integer where String is appropriate).
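A small Python sketch of why guessing is risky. This is a deliberately naive inference routine, not mapshaper's actual detection logic, and the `iso_n3` sample rows are made up for illustration: a code column that looks numeric gets typed as a number, and the leading zeros are lost.

```python
import csv
import io


def guess_type(values):
    """Naive guess: 'num' if every non-empty value parses as a number, else 'str'."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "str"
    return "num"


# Hypothetical sample data: iso_n3 is a code, not a quantity.
sample = "iso_n3,name\n004,Afghanistan\n008,Albania\n"
rows = list(csv.DictReader(io.StringIO(sample)))
columns = {field: [row[field] for row in rows] for field in rows[0]}

guessed = {field: guess_type(vals) for field, vals in columns.items()}
print(guessed)

# Treating the code as a number drops the leading zeros.
print(columns["iso_n3"][0], "->", int(columns["iso_n3"][0]))
```

An explicit per-field type declaration avoids this class of error, which is what the options discussed next are trying to provide.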
Mapshaper does have `field-types` and `string-fields` parameters on `join`, but they only support two types of fields: `str` (String) and `num` (Number).
GDAL has a concept of `*.csvt` files, which contain the OGRFieldType for each field in a CSV file. But mapshaper does not support `csvt`. The CSVT file can be created from the DBF file using GDAL (example: `ogr2ogr -f CSV -lco CREATE_CSVT=YES ./housekeeping/ne_admin_0_details_level_5_disputed.csv ./housekeeping/ne_admin_0_details_level_5_disputed.dbf`).
One possible solution is to create an intermediary DBF file using `ogr2ogr` from the CSV file, then use that intermediary DBF file in the mapshaper join command.
I've created a proof of concept and I'm happy to discuss further or create a PR.
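For discussion, a rough Python sketch of the CSV plus CSVT half of that workflow. It is not the proof of concept itself. The helper names `write_csvt` and `dbf_command`, the file names, and the sample row are all hypothetical. The sketch writes a `.csvt` sidecar (a single line of quoted OGR type names, in column order) and only builds the `ogr2ogr` argument list without running it. That invoking the "ESRI Shapefile" driver with a `.dbf` output path yields an attribute-only DBF is an assumption to verify against the installed GDAL version.

```python
import csv
import tempfile
from pathlib import Path


def write_csvt(csv_path, field_types):
    """Write a .csvt sidecar next to csv_path, one OGR type per CSV column."""
    csv_path = Path(csv_path)
    with csv_path.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    missing = [name for name in header if name not in field_types]
    if missing:
        raise KeyError(f"no type declared for: {missing}")
    # GDAL looks for a file with the same base name and a .csvt extension.
    csvt_path = csv_path.with_suffix(".csvt")
    line = ",".join(f'"{field_types[name]}"' for name in header)
    csvt_path.write_text(line + "\n", encoding="utf-8")
    return csvt_path


def dbf_command(csv_path, dbf_path):
    """Build (but do not run) an ogr2ogr call converting the CSV to a DBF."""
    return ["ogr2ogr", "-f", "ESRI Shapefile", str(dbf_path), str(csv_path)]


# Demo with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "details.csv"
    csv_file.write_text("iso_n3,name,pop_est\n004,Example,1000\n", encoding="utf-8")
    types = {"iso_n3": "String(3)", "name": "String", "pop_est": "Integer"}
    csvt_file = write_csvt(csv_file, types)
    csvt_text = csvt_file.read_text(encoding="utf-8")
    command = dbf_command(csv_file, Path(tmp) / "details_intermediate.dbf")

print(csvt_text, end="")
print(" ".join(command))
```

Keeping the CSV and its CSVT side by side in git means both the values and the declared field types show up in plain-text diffs, while the DBF becomes a build artifact.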