tmcw / simpleopendata

simple guidelines for publishing open data in useful formats
https://simpleopendata.com/

GeoJSON and CRS clarification #13

Closed alesarrett closed 10 years ago

alesarrett commented 10 years ago

In the section on geographic information hints, the text says that GeoJSON requires WGS84. WGS84 is only the default, though: the spec allows other CRSs to be assigned.
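To make the point concrete, here is a minimal sketch, assuming the named `crs` member from the 2008 GeoJSON specification. The helper name `declared_crs` and both sample objects are hypothetical, and linked CRS objects are not handled. It returns the CRS name a document declares, or the WGS84 default when no `crs` member is present.

```python
import json

# Default under the 2008 GeoJSON spec when no "crs" member is given:
# WGS84 longitude/latitude in decimal degrees.
DEFAULT_CRS = "urn:ogc:def:crs:OGC:1.3:CRS84"


def declared_crs(geojson_text):
    """Return the named CRS a GeoJSON document declares, or the WGS84 default.

    Returns None when "crs" is explicitly null (no CRS can be assumed).
    """
    obj = json.loads(geojson_text)
    if "crs" not in obj:
        return DEFAULT_CRS
    crs = obj["crs"]
    if crs is None:
        return None
    if crs.get("type") == "name":
        return crs["properties"]["name"]
    # Linked CRS objects ("type": "link") are not resolved in this sketch.
    raise ValueError("unsupported crs type: %r" % crs.get("type"))


# Hypothetical sample: a point in projected coordinates (NAD83 / UTM zone 12N).
projected = json.dumps({
    "type": "Feature",
    "crs": {"type": "name",
            "properties": {"name": "urn:ogc:def:crs:EPSG::26912"}},
    "geometry": {"type": "Point", "coordinates": [500000.0, 4500000.0]},
    "properties": {},
})

# Hypothetical sample: a bare geometry with no "crs" member.
plain = json.dumps({"type": "Point", "coordinates": [-111.0, 40.6]})
```

Calling `declared_crs(projected)` yields the EPSG name, while `declared_crs(plain)` falls back to the default, which is the case most consumers assume.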

tmcw commented 10 years ago

The exact text is:

For small vector data, use GeoJSON or KML. These are simple, widely-adopted standards. Remember that these formats expect geographic coordinates in the WGS84 datum, which is easier to use for data consumers: so reproject before publishing.

I'm not sure how this should be rephrased, if it should be - despite CRS being in the spec, projected GeoJSON is very rare in the wild and in implementations, and is not something I'm going to recommend people generate.

scw commented 10 years ago

I'd probably change the sentence following the quote above to read "For larger vector data or projected data, publish as Shapefiles." Reprojection may add value to the web mapping world and basic data consumption, but it fundamentally alters what the data represents. That's probably fine for broad-scale mapping where the vertices are inexact, but inappropriate for things which have precision baked into the data (e.g. parcel data).

tmcw commented 10 years ago

I'd rather not equate projected data with accuracy, which is commonly done, until there's some real research and documentation on numerical stability of reprojection.

scw commented 10 years ago

There is a good amount of literature on when projections are necessary, particularly in the geodesy community. They often provide details on the forward / inverse projections of a dataset and their numerical robustness, but the specifics vary both by internal precision in the computations (as implemented in, say, PROJ.4) and in the specific projections, which have differing mathematical transformations depending on the nature of the origin projection (and some of them require numerical integration and other approximation techniques to estimate).
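One way to put a number on round-trip behaviour is to project a point forward, invert it, and measure how far it moved. The sketch below does that for spherical (Web) Mercator only, which has a closed-form inverse; it is an assumption chosen for simplicity, not one of the ellipsoidal projections that need series or numerical integration, and it says nothing about PROJ.4's actual implementation. The function names are hypothetical.

```python
import math

R = 6378137.0  # sphere radius in metres used by spherical (Web) Mercator


def forward(lon_deg, lat_deg):
    """Longitude/latitude in degrees -> spherical Mercator x, y in metres."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def inverse(x, y):
    """Spherical Mercator x, y in metres -> longitude/latitude in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def round_trip_error_m(lon_deg, lat_deg):
    """Approximate ground distance between a point and its round-tripped copy."""
    lon2, lat2 = inverse(*forward(lon_deg, lat_deg))
    dlat = math.radians(lat2 - lat_deg)
    dlon = math.radians(lon2 - lon_deg) * math.cos(math.radians(lat_deg))
    return R * math.hypot(dlat, dlon)


# Worst round-trip displacement over a coarse grid of test points.
worst = max(round_trip_error_m(lon, lat)
            for lon in range(-180, 181, 15)
            for lat in range(-85, 86, 5))
```

The same harness could be pointed at any forward/inverse pair; the interesting cases are the projections where the inverse is an approximation, which this sketch does not cover.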

talllguy commented 10 years ago

This brings up an interesting point. If there is not an exact mathematical transformation between the data's projection and EPSG:4326, then metadata governing the accuracy of that data will become invalid if it is reprojected. Depending on the data use, the effect of this may be irrelevant or major, especially if the source data is survey grade.

I just attended a conference where an engineer explained that the accuracy of data and its projection is very relevant, even if it isn't survey grade. For instance, benchmark data projection error adds hours of work to surveyors' jobs if the transformation is just a few meters off.

My advice is to recommend WGS84 but provide projected data with a .prj file (especially for ESRI projections) as well if accuracy is important.

tmcw commented 10 years ago

@scw are there any online resources like the ones you mention?

My feeling is that the best compromise would be basically:

Remember that these formats expect geographic coordinates in the WGS84 datum, which is easier to use for data consumers: so reproject before publishing.

becomes

Remember that these formats expect geographic coordinates in the WGS84 datum, which is easier to use for data consumers: so reproject before publishing. If you have high-accuracy projected data, provide a native-projection version for accuracy and a WGS84 version for ease of use.
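For the "reproject before publishing" step, a minimal sketch of applying a coordinate transform to every position in a GeoJSON geometry might look like this. It assumes the caller supplies the transform as a function `fn(x, y) -> (x, y)`, for example one built with a projection library; the helper names and the `shift` stand-in are hypothetical, and no real projection is performed here.

```python
def transform_coords(coords, fn):
    """Apply fn(x, y) -> (x, y) to every position in a nested coordinate array."""
    if coords and isinstance(coords[0], (int, float)):
        x, y = fn(coords[0], coords[1])
        return [x, y] + list(coords[2:])  # keep any elevation value untouched
    return [transform_coords(c, fn) for c in coords]


def reproject_geometry(geometry, fn):
    """Return a new GeoJSON geometry with fn applied; the input is not modified."""
    if geometry["type"] == "GeometryCollection":
        return {
            "type": "GeometryCollection",
            "geometries": [reproject_geometry(g, fn)
                           for g in geometry["geometries"]],
        }
    return {
        "type": geometry["type"],
        "coordinates": transform_coords(geometry["coordinates"], fn),
    }


def shift(x, y):
    """Stand-in transform for illustration only; not a real projection."""
    return x + 1.0, y - 1.0


line = {"type": "LineString", "coordinates": [[0.0, 0.0], [10.0, 5.0, 100.0]]}
shifted = reproject_geometry(line, shift)
```

Because the function returns a new object, the native-projection original stays available to publish alongside the WGS84 copy.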

scw commented 10 years ago

@tmcw I don't have a bibliography built out on this specific topic, but as an example, Charles Karney has developed a high-precision transform for transverse Mercator, which is implemented in GeographicLib and documented in a paper (PDF). It doesn't describe the general problem, but it does provide the mathematical foundations for doing high-precision work for one projection universally.

C. F. F. Karney, Transverse Mercator with an accuracy of a few nanometers, J. Geodesy 85(8), 475–485 (Aug. 2011)

I think having both available is a reasonable solution. In general, I think open data should be in part about making data available in a variety of formats; which one someone chooses depends on fitness for use and their own tools.

talllguy commented 10 years ago

@tmcw I like your words that add the bit about high accuracy projections!

@scw Great point about the variety of formats.