Closed achapkowski closed 6 months ago
add support wkt or wkt2 formats
Which version? There are several (of each).
Do you mean any version? If so, then you've just imposed all of the (varied, inconsistent, and incompatible) history of WKT onto every implementer of the format.
The case for PROJJSON is very clear:
It is a huge deficiency that the geospatial standards community doesn't have a JSON-based CRS format. The impedance caused by the content of WKT not being expressed in any common grammar has been a huge gate-keeping industry deficiency for decades. The OGC CRS SWG is planning to start with PROJJSON to make a CRSJSON, but who knows what that will devolve into. PROJJSON, however, exists, can have its syntax validated with common tools, and can be conveniently parsed.
#wktrantoff
More discussion on PROJJSON was had in https://github.com/opengeospatial/geoparquet/discussions/90 and https://github.com/opengeospatial/geoparquet/pull/96
PROJJSON has no Java implementation or Java binding. This becomes a blocker for Apache Sedona and for any other big data system in the Java / Scala world, such as HBase, Trino, and Hive.
Currently, we have no way to parse or understand PROJJSON but we can understand CRS WKT using GeoTools.
> PROJJSON has no Java implementation or Java binding
If it is not already available there, it hopefully shouldn't be too hard to add to https://github.com/OSGeo/PROJ-JNI, which is a JNI binding of PROJ.
Otherwise https://github.com/rouault/projjson_to_wkt could be quickly ported to Java to convert PROJJSON to WKT2 (@m-mohr ported it to JavaScript), but I'm not sure GeoTools understands WKT2. There might be in-progress work on WKT2:2019 support in https://github.com/apache/sis.
> If it is not already available there, it hopefully shouldn't be too hard to add to https://github.com/OSGeo/PROJ-JNI, which is a JNI binding of PROJ.
Well, I was forgetting that you could also use the GDAL JNI bindings to convert PROJJSON to WKT1: use https://gdal.org/java/org/gdal/osr/SpatialReference.html#SetFromUserInput(java.lang.String) to import PROJJSON and https://gdal.org/java/org/gdal/osr/SpatialReference.html#ExportToWkt() to export to WKT, with PROJ underneath. Of course, that's a bit of a heavy dependency.
JNI is a non-starter for many Java libraries in the big data ecosystem, let alone PROJ via JNI. For PROJJSON to be a possibility in that ecosystem, somebody would probably need to step up and do the implementation work in Java (as Even noted, it might not be too difficult, and there is some readily available prior art to draw from).
In the absence of that, excluding an entire ecosystem seems worse than allowing a widely supported CRS representation into our metadata.
The conversion work from Python to JS was 1 hour of work with ChatGPT. It's likely not much more in Java. If that's too hard to do, then the ecosystem doesn't really seem to want it, I'd say?
Anyway, if we add other encodings, please only additive, not instead of PROJJSON. Otherwise you also exclude non-WKT2 supporting ecosystems again.
Also, can we clarify whether Java supports WKT1 or 2? That's quite a difference...
> Also, can we clarify whether Java supports WKT1 or 2? That's quite a difference...
AFAIK GeoTools supports WKT1 only: https://docs.geotools.org/stable/javadocs/org/geotools/api/referencing/doc-files/WKT.html. Apache SIS supports WKT2:2015 (and WKT1), with in-progress work to add WKT2:2019.
Thanks guys for the help. So I guess the solution for us is:
But this just solves the reading projjson problem. How about writing a WKT1 / WKT2 string to projjson?
- It converts projjson string <> WKT1/WKT2:2019
projjson_to_wkt has this important warning "Warning: while the export to WKT1 should be syntaxically correct, datum, projection method or parameter names will be the one of WKT2, and thus a number of implementations will in practice fail to understand such WKT1 strings."
Not everyone uses projjson or the associated tools. Many people are in the ArcGIS space.
> Many people are in the ArcGIS space.
https://www.esri.com/content/dam/esrisites/en-us/media/legal/open-source-acknowledgements/arcgis-pro-3-3-open-source-disclosure.zip contains an "ArcGIS Pro 3_3 Open Source Disclosure.xlsx" file that mentions a "proj_gdal_e.dll" file. Time to make active use of it ;-)
> Anyway, if we add other encodings, please only additive, not instead of PROJJSON.
Concretely, would this mean that certain geoparquet readers couldn't read certain geoparquet files, if the reader doesn't happen to implement projjson support? I'd worry about that causing an (IMO unnecessary) schism and confusing users and data providers.
Yeah, if the other encodings are not additive. That makes it more difficult for writers though, but I feel like ease of reading is more important than ease of writing?
Ideally everyone would support PROJJSON though.
> Not everyone uses projjson or the associated tools. Many people are in the ArcGIS space.
It is easy to install and use PROJ from an ArcPro Conda environment. It works quite well.
> Concretely, would this mean that certain geoparquet readers couldn't read certain geoparquet files, if the reader doesn't happen to implement projjson support?
If the specification allows multiple flavors of CRS, most writers will choose vanilla – raw EPSG codes. That means readers will have to go somewhere else to get the parameters those codes describe. Or they will always use the one code that everyone knows and can describe by heart, 4326 😄
The case against PROJJSON so far is:
What's missing here is these languages don't have a complete open source implementation of the data model that describes WKT2, which is published in ISO 19162 and OGC 18-010. They're missing because writing one is a ton of detailed, thankless work to implement a complex and necessarily complicated data model. PROJJSON is a very faithful expression of that model in JSON, and @rouault found many interpretation nits and bugs in the specification as he built PROJJSON because of its complexity.
Maybe a transpile of the full PROJ engine to WASM is within reach. Maybe Apache SIS has a full 19162 model ready to go but just needs the PROJJSON i/o built for it. I don't have the answers here, but it seems to me users in those software ecosystems need to strengthen their capabilities to meet the requirement regardless of whether or not geoparquet requires PROJJSON or allows every flavor of WKT to describe the coordinate system of data.
PROJJSON is advantageous because it can meet data readers half way – if users have a full interpretation engine they can use it. If they don't, they can pluck the keys and codes that they know about without writing a custom parser and interpretation engine.
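As a rough illustration of the "pluck the keys" approach described above, the sketch below reads a PROJJSON document with a stock JSON parser and pulls out the top-level `id` authority and code when a writer has included them. The sample document is abbreviated and hand-written for illustration (a real PROJJSON object also carries datum and coordinate system members), and `pluck_crs_id` is a hypothetical helper name, not part of any library.

```python
import json

# Abbreviated, hand-written PROJJSON for illustration only; real documents
# also carry datum and coordinate-system members.
SAMPLE_PROJJSON = """
{
  "type": "GeographicCRS",
  "name": "WGS 84",
  "id": {"authority": "EPSG", "code": 4326}
}
"""


def pluck_crs_id(projjson_text):
    """Return (authority, code) from the top-level "id" member, or None.

    Hypothetical helper: no CRS engine is involved, only a JSON parser.
    """
    doc = json.loads(projjson_text)
    crs_id = doc.get("id")
    if not isinstance(crs_id, dict):
        return None  # the writer did not include an identifier
    authority = crs_id.get("authority")
    code = crs_id.get("code")
    if authority is None or code is None:
        return None
    # Codes may be integers (4326) or strings ("CRS84"); normalise to str.
    return (str(authority), str(code))


print(pluck_crs_id(SAMPLE_PROJJSON))
```

A reader without a CRS engine could use the returned pair to accept the handful of codes it understands and reject or pass through everything else.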
> Not everyone uses projjson or the associated tools. Many people are in the ArcGIS space.
> It is easy to install and use PROJ from an ArcPro Conda environment. It works quite well.
You obviously never worked in closed secure environments. Not everyone can pip or conda install stuff.
> Not everyone uses projjson or the associated tools. Many people are in the ArcGIS space.
> It is easy to install and use PROJ from an ArcPro Conda environment. It works quite well.
> You obviously never worked in closed secure environments. Not everyone can pip or conda install stuff.
Given https://anaconda.org/esri/proj4, it seems like Esri is already explicitly supporting PROJ usage?
Anyway, I do not see "Esri doesn't support it (yet)" as a valid argument against it.
> PROJJSON is advantageous because it can meet data readers half way – if users have a full interpretation engine they can use it. If they don't, they can pluck the keys and codes that they know about without writing a custom parser and interpretation engine.
I think this is an important point that @hobu makes. We actually have an example of that in the spec specifically for OGC:CRS84 (https://github.com/opengeospatial/geoparquet/blob/v1.0.0/format-specs/geoparquet.md#ogccrs84-details), but I think that should apply more in general (with the only requirement that the files were created by a writer that includes those codes).
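A minimal sketch of what that check could look like for a reader that only handles longitude/latitude data and has no CRS engine. It assumes per-column metadata shaped like the entries of the `columns` object in GeoParquet's `geo` metadata, where a missing `crs` key means the OGC:CRS84 default and a `null` value means the CRS is unknown. The sample metadata is abbreviated and hand-written, and `is_crs84` is a hypothetical helper name; the linked spec section remains the authority on which identifiers a reader should accept.

```python
import json

# Hypothetical, abbreviated per-column metadata for illustration only.
COLUMN_DEFAULT = json.loads('{"encoding": "WKB"}')
COLUMN_CRS84 = json.loads(
    '{"encoding": "WKB", "crs": {"type": "GeographicCRS",'
    ' "name": "WGS 84 (CRS84)",'
    ' "id": {"authority": "OGC", "code": "CRS84"}}}'
)
COLUMN_UNKNOWN = json.loads('{"encoding": "WKB", "crs": null}')
COLUMN_UTM = json.loads(
    '{"encoding": "WKB", "crs": {"type": "ProjectedCRS",'
    ' "name": "WGS 84 / UTM zone 31N",'
    ' "id": {"authority": "EPSG", "code": 32631}}}'
)


def is_crs84(column_meta):
    """Return True when the column can be treated as OGC:CRS84.

    Hypothetical helper: it inspects only the "id" member and never
    interprets the CRS definition itself.
    """
    if "crs" not in column_meta:
        return True  # key absent: the spec default is OGC:CRS84
    crs = column_meta["crs"]
    if not isinstance(crs, dict):
        return False  # null (unknown CRS) or an unexpected value
    crs_id = crs.get("id")
    if not isinstance(crs_id, dict):
        return False  # no identifier written, so nothing to match on
    return (crs_id.get("authority"), str(crs_id.get("code"))) == ("OGC", "CRS84")


for label, meta in [
    ("default", COLUMN_DEFAULT),
    ("crs84", COLUMN_CRS84),
    ("unknown", COLUMN_UNKNOWN),
    ("utm", COLUMN_UTM),
]:
    print(label, is_crs84(meta))
```

The check only works when the writer included an `id`, which is the requirement noted above.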
PROJ supports multiple formats: https://proj.org/en/9.4/faq.html
I don't understand why people are being stubborn about the format.
> I don't understand why people are being stubborn about the format.
Because writing a specification that diverse implementation audiences can succeed with is very difficult. Most of the non-geo software world has no clue what WKT is or how to dereference an EPSG code into a coordinate system, and they don't ever care to. Geoparquet aspires to a much wider audience than the spatial-is-special crowd, and it needs implementation buy-in in these other communities to get traction beyond it. Larding up the specification with conveniences like allowing many different coordinate system description formats makes it harder to provide complete implementations and increases the interoperability leakage between those implementations.
I would argue that the spatial-is-special world's two most impactful specifications, Shapefile and GeoJSON, could attribute a lot of their market penetration to the fact they don't provide much guidance in regard to coordinate systems. By not imposing that complexity on implementers, they focused on the part of the interoperability that matters – the geometries. I argue the same thirst exists in the communities that would also implement geoparquet.
@achapkowski while you're active in the open source community, mind getting someone at ESRI to comment on https://github.com/OSGeo/gdal/pull/9980 ? Having an open driver for this format benefits everyone, ESRI included. 👍
Neither does Rust
@hobu We (the georust greater co-prosperity sphere) have good bindings to libproj if it can be used. And if it can't, we'll write a native implementation.
Great discussion everyone - I think I'm going to close this issue soon as we discussed extensively before 1.0, and I think we've gone over most of the points again. I think we can all acknowledge that our choice of PROJJSON was our most 'controversial' choice in the specification, but I don't think we'll revisit that until a '2.0' version of GeoParquet.
And having 'multiple' options (PROJJSON plus WKT2 for example) that impose higher requirements on readers, forcing them to understand both dialects if they want to read any possible GeoParquet format, is not something desired for GeoParquet. Philosophically this is not in line with the choices we've made for this format - we want to make it as easy as possible for implementations to be created without a deep stack of geospatial software behind it.
I do think we should continue to work to encourage and even find funding for software that does not yet understand PROJJSON, especially open source implementations. And I will state that we actively want ESRI to implement GeoParquet fully, and the stubbornness on this particular issue is in service of greater interoperability. But until it's fully implemented it seems fine to me for ESRI to just support lat/long, or to use 'most' of GeoParquet and do their own crs metadata that is WKT2 as a bridge.
Ping Apache SIS core developer @desruisseaux since he is much more knowledgeable than me on this 😁:
Is PROJJSON support on Apache SIS's roadmap?
Chris, please feel free to close the issue since this is off the topic :-)
Even is correct: Apache SIS supports WKT 1 and WKT 2:2015 (it was the first open source software to support WKT 2 after the ESRI prototype), with work on WKT 2:2019 in progress right now. It also supports GML, which is currently the only format capable of fully supporting the ISO 19111:2007 model. If I understood correctly, PROJJSON doesn't fully cover the ISO 19111 model yet, which is one reason why OGC wants to review it before approving a JSON format. If we want CRSJSON to be a replacement for GML, then it should be at least as capable as GML.
I plan to support OGC CRSJSON in Apache SIS once the specification is advanced enough. Whether SIS will support PROJJSON will depend on whether there are many differences. Note that the OGC CRS working group has explicitly stated in its charter that it will avoid any unnecessary differences from PROJJSON.
One correction to what has been said in a previous comment: WKT 2 is not a data model. The model is ISO 19111, and WKT is an encoding of that model. Libraries do not implement a WKT model. They implement ISO 19111, then establish a mapping from WKT elements to that model. This is what both Apache SIS and PROJ C++ API do. One reason for the WKT complexity is that its mapping to ISO 19111 is not straightforward, as WKT makes compromises in an attempt to be more compact and for backward compatibility. The consequence is that trying to understand WKT without prior knowledge of ISO 19111 is confusing. For understanding WKT, ISO 19111 must be read first. If a JSON encoding does a more direct mapping to ISO 19111 elements, it may help to reduce that confusion.
The CRS standardization effort at OGC is led mainly by Roger Lott. My experience from working with him for more than 10 years is that he is very reliable. When he says that he will do something, he really does it, and he is much, much better than me at following the roadmap.
> Maybe a transpile of the full PROJ engine to WASM is within reach
That would be great. There is already a version of GDAL, https://github.com/bugra9/gdal3.js, that includes PROJ, so it should be easy to "extract" only the needed PROJ part. The only missing part (but not completely mandatory) is the cURL integration to use the grid files from https://cdn.proj.org. Unfortunately this issue is not moving forward: https://github.com/emscripten-core/emscripten/issues/3270
Add support for WKT or WKT2 formats for the CRS, to provide more robustness for clients who get lots of varied data.