opengeospatial / wkt

A standalone reference describing the Well-known Text Representation of Geometry. (Work In Progress)

Scope of the Exercise #2

Open pramsey opened 3 years ago

pramsey commented 3 years ago

There are already two WKT specifications in print: the one embedded in the OGC SFSQL from ~1999 and the one in ISO SQL/MM 13249-3:2006.

SFSQL

ISO SQL/MM

The ISO WKT specification is a clean superset of the SFSQL one, so in many ways SFSQL can just be ignored.

That leaves EWKT (and EWKB) that extended SFSQL in the period after SFSQL and before ISO. Note that because ISO chose different mechanisms for extra dimensions it's not a clean superset of EWKT.

EWKT

So, to me an open question is, what is this for? To formalize EWKT and EWKB only? To document all the variations so it is clear what people mean when they say "WKT" or "EWKT"?

When asked to emit EWKT, PostGIS will emit an old-school EWKT string. When asked for the standard ST_AsText() it will emit ISO WKT.

select st_asewkt('SRID=4325;POINT Z (0 0 1)'::geometry);
       st_asewkt        
------------------------
 SRID=4325;POINT(0 0 1)

select st_astext('SRID=4325;POINT Z (0 0 1)'::geometry);
    st_astext    
-----------------
 POINT Z (0 0 1) 

Both have their drawbacks. The consistent dimensional indicator in ISO is nice, but dropping the SRID is not so nice.
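
To make the difference concrete, here is a minimal Python sketch of separating the EWKT SRID prefix from the geometry text, so the two parts can be handled independently. The helper name `split_ewkt` is hypothetical, and the leniency about case and surrounding whitespace is an assumption for illustration, not something taken from the PostGIS docs.

```python
import re

# Hypothetical helper, not part of any library.
# Assumption: the prefix looks like "SRID=<digits>;" and we tolerate
# upper/lower case and stray whitespace around it.
_SRID_PREFIX = re.compile(r"^\s*SRID\s*=\s*(\d+)\s*;\s*", re.IGNORECASE)


def split_ewkt(text):
    """Return (srid, wkt). srid is None when there is no SRID prefix."""
    match = _SRID_PREFIX.match(text)
    if match is None:
        return None, text.strip()
    return int(match.group(1)), text[match.end():].strip()


srid, wkt = split_ewkt("SRID=4325;POINT(0 0 1)")
print(srid, wkt)  # prints: 4325 POINT(0 0 1)
```

Note that this only peels off the SRID. It does not convert the old-school `POINT(0 0 1)` into the ISO `POINT Z (0 0 1)` form, which would need a real parser.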

pramsey commented 3 years ago

Anyways, the reason I mentioned Wikipedia, not entirely in jest, is because if the effort is to document the state of play and point people where relevant to the official documentation, the first place people land in a search, aka: wikipedia, would not be a bad place for that work.

cholmes commented 3 years ago

So for me the original point of the exercise is to extract 'WKT' to be its own referenceable, self-describing specification. I've been contemplating a notion of 'building blocks' in OGC: small spec-like things that fully explain a small thing, give examples, are easily referenceable, etc. They wouldn't try to 'be' the standard, but would make parts of standards more accessible, explaining all you need to make use of it, doing so in a bite-sized chunk that doesn't require understanding everything about OGC.

I had imagined the wkt was a lot simpler than it actually is though, so perhaps this effort is misguided. I am working with OGC staff to actually publish the simple features spec as html, so that we can at least deep link into the key sections. And maybe that's sufficient. Though it does feel worth having at least a 'guide' somewhere to help explain how the various pieces work together.

I got curious about EWKT, as Snowflake and Redshift are using it now, and it does seem like some attempt to standardize it could be useful, but I think anything outside of referring to EWKT and describing what it does is out of scope for this repository. I am generally interested in formalizing EWKT / EWKB / aligning towards a general, sensible spec that handles all the things, but that's much less of a priority for now.

I like the idea of treating wikipedia as the 'landing page' for WKT, especially since it's already well done. But as a user I'd like to be sure that I am getting something a bit more 'definitive', and also something with more examples. One thing that is definitely outside of scope but that I'd love to see is some sort of easy WKT 'validator', that tells you if you did it right, and also what flavor of WKT it is compliant with. Perhaps even renders it so you can see that it looks as you expect.

cholmes commented 3 years ago

3 draft types (polyhedralsurface, triangle, tin)

Why are these 'draft types' in EWKT? And is there a reason you left polyhedral surface and tin off from WKT? The SFA pdf spec has those two mentioned, as does wikipedia.

stevage commented 3 years ago

For me, there is value in an easy to read, semi-authoritative reference for the most salient parts of the specification. The place where you go if ever you're writing code to read or write WKT, or otherwise work with it directly.

Ideally (imho) this would look like a friendly domain (wellknowntext.org comes to mind) with a straight-to-the-point design that lays out the key bits of the spec, with pointers to further reading. Sort of like geojson.org used to do before someone removed all the useful content and instead linked to a very hard-to-read RFC.

Currently, Wikipedia does a fine job of summarising it, but I think it's better as a stand-alone site that isn't constrained by the policies and guidelines of Wikipedia, nor is at the whims of its editors. (No criticism of Wikipedia here, I have written plenty of articles over there, including Vector tiles and Web Mercator projection.)

A few design suggestions come to mind:

I'm happy to contribute some labour here. Examples of specs I have worked on in the past: Newline-delimited GeoJSON, Spatial Data Package, Open Council Data, CSV-geo-au, Fiscal Data Package, readable version of GeoJSON spec...

pramsey commented 3 years ago

WRT "draft", at the time we included them as PostGIS types, they existed only in draft discussion papers of either ISO or OGC.

pramsey commented 3 years ago

So, for scope, a "WKT spec" should I think reasonably actually be an "ISO SQL/MM WKT spec", since that covers all the old WKT and all the new features of dimensionality and extra types added by ISO.

That leaves the question of what to do about "extended" since it is not compatible with ISO, and there may be an annex or something at the end.

I just remembered a long-time WKT nit which I have to look up now: the representation of WKT MULTIPOINT is somewhat variable in terms of whether there are required parentheses around each point, or whether a straight array of coordinates is legal.

MULTIPOINT(0 0, 1 1)
or
MULTIPOINT((0 0), (1 1))

cholmes commented 3 years ago

I'm happy to contribute some labour here.

That'd be awesome! I'm juggling a few different things right now, while also taking some time off.

I fully agree with your goals, and your past work looks great, and more extensive than mine, which is mostly just https://github.com/radiantearth/stac-spec . With that one the 'spec' lives fully in GitHub, and has a complementary webpage that helps explain it: http://stacspec.org

But with this one the standard is already fully specified, just not accessible. So I agree with the goal of 'a friendly domain with a straight-to-the-point design that lays out the key bits of the spec'. So let's just evolve this repo to be a GitHub Pages site. I'll make you a committer and feel free to take it further. We can start with the github.io page and sort out a better URL in time.

cholmes commented 3 years ago

So, for scope, a "WKT spec" should I think reasonably actually be an "ISO SQL/MM WKT spec", since that covers all the old WKT and all the new features of dimensionality and extra types added by ISO.

Yeah, that's what I was starting to conclude after reading all you wrote and digging in.

That leaves the question of what to do about "extended" since it is not compatible with ISO, and there may be an annex or something at the end.

That makes sense. If we're going with a website approach then I think it'd make sense to have EWKT have its own page, that explains what it is / how it works / links out. Perhaps one page explaining it, and then another page that is just a very clean definition of it, that snowflake / redshift / etc can point to, extracting it from the PostGIS docs to just serve as a standalone reference.

I just remembered a long-time WKT nit which I have to look up now: the representation of WKT MULTIPOINT is somewhat variable in terms of whether there are required parentheses around each point, or whether a straight array of coordinates is legal.

Funny, I just noticed that a couple days ago - it's on the Wikipedia example.

I think the scope of this repo should be to just explain well what is there. But I could see pushing towards a rev of the spec to resolve little quirks like that, and to see if we can get SRID incorporated.

stevage commented 3 years ago

I think the scope of this repo should be to just explain well what is there.

Yep, along the lines of: "It is ambiguous in the spec [pxx] whether parentheses are required. These prominent pieces of software support either: ... These others require parentheses. These others require no parentheses..."
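
As an illustration of what "supports either" could mean in practice, here is a rough Python sketch of a reader that accepts both MULTIPOINT spellings. The function `parse_multipoint` is hypothetical and deliberately narrow: it assumes plain numeric coordinates and does not handle `EMPTY` or the Z/M dimension tags.

```python
import re

# Hypothetical, simplified reader: no EMPTY, no "Z"/"M" tags.
_MULTIPOINT = re.compile(r"^\s*MULTIPOINT\s*\((.*)\)\s*$", re.IGNORECASE | re.DOTALL)


def parse_multipoint(text):
    """Parse MULTIPOINT text, with or without parentheses around each point."""
    match = _MULTIPOINT.match(text)
    if match is None:
        raise ValueError("not a supported MULTIPOINT: %r" % text)
    points = []
    for item in match.group(1).split(","):
        item = item.strip()
        # Drop the optional per-point parentheses.
        if item.startswith("(") and item.endswith(")"):
            item = item[1:-1]
        points.append(tuple(float(value) for value in item.split()))
    return points


# Both spellings yield the same list of coordinate tuples.
print(parse_multipoint("MULTIPOINT(0 0, 1 1)") == parse_multipoint("MULTIPOINT((0 0), (1 1))"))
```

A writer still has to pick one form to emit, so a reference page would need to say which spelling each piece of software produces, not only which it accepts.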

edzer commented 3 years ago

Both have their drawbacks. The consistent dimensional indicator in ISO is nice, but dropping the SRID is not so nice.

In definitions like SRID=nnnn;POINT(0 1) the CRS can only be resolved if the table with SRIDs is available, which is true for a database but not true in general. nnnn worked back when this only meant EPSG:nnnn, but today there are more authorities, like ESRI:nnnn, OGC:nnnn and so on; also some don't have a numeric nnnn like OGC:CRS84. I can see this was useful, but wouldn't propagate it as current best practice, also given that

I wouldn't recommend using it in new text-based formats.
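
A small Python sketch of the point about authorities: an identifier of the form `AUTHORITY:CODE` carries its own namespace and allows non-numeric codes, whereas a bare `nnnn` does not. The function name `parse_crs_id` and the choice to reject bare codes are assumptions made for illustration only.

```python
def parse_crs_id(identifier):
    """Split 'AUTHORITY:CODE' into (authority, code). The code stays a string
    because some codes, such as CRS84, are not numeric."""
    authority, sep, code = identifier.strip().partition(":")
    if not sep or not authority or not code:
        # A bare "4326" gives no hint about which authority defines it.
        raise ValueError("expected AUTHORITY:CODE, got %r" % identifier)
    return authority.upper(), code


for crs in ("EPSG:4326", "ESRI:102100", "OGC:CRS84"):
    print(parse_crs_id(crs))
```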