sofwerx / cdb2-concept

CDB modernization

Thoughts on Shapefiles in CDB #6

Closed christianmorrow closed 3 years ago

christianmorrow commented 3 years ago

High Level Objective: CDB 2.x should not use obsolete technology

The shapefile format, introduced in the early 1990s, is obsolete. A modern geospatial storage scheme should not rely on obsolete technology. There are a variety of suitable, modern alternatives to shapefiles.

Shapefile limits on file size (i.e. number of records), field name length, and data types are well known. Shapefiles' flat structure requires loading, indexing, and retrieval optimization that is external to the data.
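To make the field name limit concrete, here is a minimal sketch of what happens when long attribute names are forced into a shapefile's dBASE (.dbf) attribute table, where field names hold at most 10 characters. The attribute names and the `find_truncation_collisions` helper are hypothetical and exist only for illustration; real writers may rename or reject colliding fields in other ways.

```python
from collections import defaultdict

# dBASE (.dbf) field names hold at most 10 characters.
DBF_FIELD_NAME_MAX = 10


def find_truncation_collisions(names):
    """Group attribute names by their 10-character truncation.

    Returns only the groups in which two or more distinct names
    collapse to the same truncated form.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:DBF_FIELD_NAME_MAX]].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


# Hypothetical attribute names, chosen only to show the effect.
attributes = ["surface_material_type", "surface_material_code", "road_width"]
collisions = find_truncation_collisions(attributes)
print(collisions)
```

In this sketch the two `surface_material_*` names become indistinguishable after truncation, while `road_width` (exactly 10 characters) survives intact.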

A replacement for shapefiles should:

Perhaps these matters have already been settled in and among the CDB 2.x concept participants -- my input on the subject is simply this - shapefiles are obsolete.

ryanfranz commented 3 years ago

GeoPackage will be in CDB 1.2, so I don't think this is an issue. The thing that CDB 2.0 needs to resolve is how to merge vectors together into larger packages, and if we are going to maintain versioning across multiple CDB 2 instances, how do we do that.

christianmorrow commented 3 years ago

My comments pertain to the container itself rather than the derivative issues of tiling, editing, maintenance, etc. When contemplating a new, 21st century geospatial data storage scheme, however, can't we all agree that 'shapefile' is a thing of the past?

PresagisHermann commented 3 years ago

For vector encoding, I believe there is broad consensus to replace shapefile with a more modern format. Chris, your list of requirements hits all the important points in my view. There is convergence toward GeoPackage, which covers more than just vector encoding, so it is not a one-for-one replacement (as was experimented with in the CDB GeoPackage activity) but also covers LOD, tiling, etc.

tabinfl commented 3 years ago

Reached consensus during week-long workshop

cnreediii commented 3 years ago

Not sure if totally closed :-) Version 2.0 should not prescribe (mandate) that GeoPackage is the only allowed vector encoding/container. I believe that, when stating requirements and recommendations, CDB 2.0 should recommend the use of GeoPackage for vector data but allow other vector encodings. In CDB 1.2, the SWG approved and crafted wording for enumerating all the various encodings that are currently supported/used in a CDB data store. This approach is extensible and should be carried forward into CDB 2.0. FYI, in the OGC standards world, there is a document, "The OGC Specification Model - A Standard for Modular specifications" (08-131r3). This document very clearly defines what is meant by the concepts "requirement", "recommendation", "conformance", and much more. My post is more about clarification than about re-opening the issue.

PresagisHermann commented 3 years ago

Hi Carl, does this mean that CDB 2.0 will not be able to take advantage of new capabilities in GeoPackage, forcing all new CDB 2.0 concepts to be encodable in shapefile? One simple example is the length of attribute names, but there is so much more, as GeoPackage in CDB 2.0 will not be a one-for-one encoding replacement for shapefile. I am of the opinion that we should remove shapefile altogether because GeoPackage is not a one-for-one replacement. 3D models are different: if we stick to one-for-one replacement, it is more manageable to support both encodings.

Then again, it depends on whether we are talking about the conceptual model, the logical model, or the physical model!

cnreediii commented 3 years ago

I am thinking right now at the conceptual/logical modelling levels. I am also thinking:

  1. How to provide some level of backwards compatibility to enable a consistent and financially viable migration path for the existing CDB user/customer base. If this can be done with a CDB 2.0 profile, then Shapefiles may still need to be usable. This is the tension in evolving any technology while remaining cognizant of the investments in legacy systems, and of the willingness of legacy system users to move to the new architecture/platform.

  2. What if we wish to add other vector encodings - such as GeoJSON - or some future encoding? Or what if GeoPackage falls over at some point and does not meet various use cases or performance requirements?

  3. In the OGC over the years we have learned that "lock in" to one specific technology or approach ends up having serious drawbacks.

I may be totally wrong . . . but . . .

kevinbentley commented 3 years ago

This thread should probably be reopened...

I think that not utilizing the new capabilities in GeoPackage (and other vector databases) would be a shame. What I would like to see is for CDB to support any vector container that supports Part 2 of the SFA (https://www.iso.org/standard/40115.html). That would include GeoPackage and enterprise capabilities like PostGIS or SQL Server. If a user wanted to go from GeoPackage to SQL Server, it's relatively easy to move the data since they both support the same operations.

cnreediii commented 3 years ago

@kevinbentley Indeed! And most (all?) commercial DB software supports SFA (Microsoft, Oracle, IBM), as do R and many other application packages. Kevin's point underscores that having CDB 2.0 state that SFA is the conceptual/logical model for any vector data store is a solid idea that keeps the door open for future technologies, and more. And GeoPackage, GeoJSON (almost), and Shapefiles are all consistent with SFA. This approach would allow CDB 2.0 to abstract away from a specific technology (such as GeoPackage) and provide a solid foundation for the long-term evolution, extension, and profiling of CDB 2.0.

ryanfranz commented 3 years ago

I assume that basing CDB on SFA is for the conceptual/logical model (seems like a great idea). At some level (standard/profile/etc) I would assume that we want to recommend one or more concrete encodings. Am I correct on this assumption?

cnreediii commented 3 years ago

Yes, your assumption is correct.

kevinbentley commented 3 years ago

Just to clarify, I think we should require not just SFA, but part 2 of the SFA, which includes queries and geometric operations. GeoPackage would meet the part 2 requirements, but Shapefile and GeoJSON would not.

ryanfranz commented 3 years ago

Adding part 2 seems to be repeating some of the early mistakes in CDB where restrictions were put on how the data was "supposed" to be used. This requirement would say that your implementation must be able to do certain operations, which is primarily software and not format. If someone wanted to put a SQL API on top of shapefile, seems like that would fit the definition, but maybe not the spirit of SFA part 2. It also implies that one person can have a valid CDB with shapefiles, but another user sees that same CDB as invalid based on not having the same software packages.

kevinbentley commented 3 years ago

@ryanfranz I understand where you're coming from about software vs. format. The difference in my mind is that part 2 defines an interface that the software must implement, just like the rest of the SFA. If I want to write software that uses the benefits of a geospatial database (e.g. SQL queries), I need to implement all of the part 2 functionality myself, just in case I get CDB in a shapefile or some other container.

Maybe it's too much to ask for SFA part 2 as a CDB container requirement. It sounds like we might all be able to agree on core SFA, so that's progress. If I know that part 2 can be implemented as a software layer around all possible CDB formats because they support SFA objects, that's an improvement.
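The idea of a software layer that sits on top of any container exposing simple geometries can be sketched as follows. This is only an illustration under stated assumptions: `Feature`, `FeatureSource`, `InMemorySource`, and `query_bbox` are hypothetical names, not part of SFA, CDB, or any library, and the test shown is a plain envelope-overlap check rather than the full SFA Intersects predicate.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol, Tuple

Coordinate = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass(frozen=True)
class Feature:
    """A feature whose geometry is reduced to a coordinate sequence."""
    feature_id: str
    coordinates: Tuple[Coordinate, ...]


class FeatureSource(Protocol):
    """Hypothetical container interface: anything that can iterate features."""
    def features(self) -> Iterable[Feature]: ...


def envelope(feature: Feature) -> BBox:
    """Compute the axis-aligned bounding box of a feature's coordinates."""
    xs = [x for x, _ in feature.coordinates]
    ys = [y for _, y in feature.coordinates]
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_overlap(a: BBox, b: BBox) -> bool:
    """Return True when two bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query_bbox(source: FeatureSource, bbox: BBox) -> Iterator[Feature]:
    """Yield features whose envelope overlaps bbox, whatever the container."""
    for feature in source.features():
        if envelopes_overlap(envelope(feature), bbox):
            yield feature


class InMemorySource:
    """Stand-in for a reader of shapefiles, GeoPackage, GeoJSON, etc."""
    def __init__(self, features: Iterable[Feature]) -> None:
        self._features = list(features)

    def features(self) -> Iterable[Feature]:
        return iter(self._features)


source = InMemorySource([
    Feature("a", ((1.0, 1.0), (2.0, 2.0))),
    Feature("b", ((10.0, 10.0), (12.0, 11.0))),
])
hits = list(query_bbox(source, (0.0, 0.0, 5.0, 5.0)))
print([f.feature_id for f in hits])
```

The query function depends only on the `features()` method, so a reader for any container could be swapped in without touching the operation itself. A database-backed container could instead push the same query down to its own engine.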

cnreediii commented 3 years ago

@kevinbentley - I have not read everything closely yet, but I think we need to separate geometry from operations. These are two very different beasts. The Simple Features Geometry model is stand-alone. The operations and methods are also stand-alone and could be implemented against any data store. If one looks at all of the OGC API work, API - Features (for example) could be used quite effectively with a data store comprised of Shapefiles. Separation of form (geometry) and function (methods). API - Features will eventually define most if not all of the SFA Part 2 capabilities. I believe that you could do this already with WPS, which I think was done in a recent OGC Interop Initiative. I will check. WPS accessing a Shapefile store and transforming on the fly into a GeoPackage. I think I have that correct :-)