Open bosborn opened 4 years ago
This comment applies here as well.
The shadow view approach is complicated by the presence of multiple foreign keys that point to gpkg_contents. It will take some time to produce a viable example but I hope to do so during TB-16 work.
The shadow view w/ triggers approach as the new "standard" specification could result in a fair amount of headaches.
The above proposal, however, is to update the base specification gpkg_contents table with an id column, causing little to no backwards compatibility issues.
I've gotten strong feedback from other stakeholders that adding new columns is to be avoided or we would have done this all over the place for the last five years.
Understood, however this is a very special case due to the rowid alias already existing as explained above.
We discussed this during the SWG meeting today and there was general agreement that adding a column to this table would be resisted by the GeoPackage community because of risks to interoperability. However, we do want to know more about what is driving this requirement. What is your use case for needing a robust numerical ID for GeoPackage contents?
The Related Tables Extension requires integer ids for relationships. The NGA Contents Id Extension is a work around to this issue.
The only way to foreign key to a Contents through a non changing identifier is by name. GeoPackage's own Related Tables Extension requires integer id's as foreign keys and is not compatible with Contents level relationships.
String keys in database design are generally a bad idea in order to avoid primary keys that may need to change over the lifetime of the data. It is a little late for that now, but the suggested column is already a supported due to the SQLite rowid. So again, the risks are fairly low compared to adding a new column that is not already inherently supported.
What is the use case where a Related Tables Extension relationship needs to be made between GeoPackage contents and related content? The Metadata Extension was designed to support this case.
Any extension that wants to relate data on a Contents wide level in addition to a Contents data row level nature. The Metadata Extension suffers the same issue as the foreign key to an individual Contents would still be by table name, and not the metadata id. The Metadata extension is an added unnecessary layer when an extension can better define its' data and simply needs to reference / foreign key to a Contents. The Related Tables Extension is an already approved and supported way of performing this relationship, but requires integer ids.
The Metadata Extension suffers the same issue
This is false. gpkg_metadata_reference
has a reference_scope
column which may be "table", in which case table_name
is as expected and row_id_value
is null. There is actually less overhead to the Metadata Extension than the Related Tables Extension.
Any extension that wants to relate data on a Contents wide level in addition to a Contents data row level nature.
When does this happen? We (SWG members) are not aware of a real-world scenario that calls for this design.
The Metadata Extension suffers the same issue
As in the foreign key is still to a String table_name value and not an integer. This seems true and confirmed by your comment.
When does this happen? We (SWG members) are not aware of a real-world scenario that calls for this design.
In our Contents Id and Feature Style extensions. http://ngageoint.github.io/GeoPackage/docs/extensions/
Contents Id you and I discussed last year. Styles minimally over twitter which uses Contents Id.
July 24th, 2019
Jeff Yutzler 1:14 PM Regarding http://ngageoint.github.io/GeoPackage/docs/extensions/contents-id.html IIUC, you don't need this because you can always use the invisible rowid column if the table lacks an integer primary key
Brian Osborn 1:19 PM oh wow nice, i didn’t even know about that i can probably phase that one out then
October 18th, 2019
Brian Osborn 1:28 PM was looking into replacing our contents id extension. it doesn’t look like the rowid can be relied on “If the rowid is not aliased by INTEGER PRIMARY KEY then it is not persistent and might change. In particular the VACUUM command will change rowids for tables that do not declare an INTEGER PRIMARY KEY. Therefore, applications should not normally access the rowid directly, but instead use an INTEGER PRIMARY KEY.” https://www.sqlite.org/rowidtable.html October 19th, 2019
Jeff Yutzler 6:36 PM grrr... October 21st, 2019
Jeff Yutzler 8:04 AM I've thought about this a bit and it strikes me as a bit of an edge case. These rowids will only change when rows are removed (how often does that happen to gpkg_contents?) and the database is vacuumed. This problem would be caught by an executable test and is quickly repaired. Do what you want, but I would recommend not worrying about it.
Brian Osborn 8:06 AM About to fly, but yeah I thought about maybe even adding extra vacuum behaviors to handle it. But I know other groups including sofwerx that raw vacuum, and then the database is invalid. We do vacuums as well. Sqlite databases get bloated over time and it's the only way to reclaim memory. And that documentation really states to not rely on it, as if there could be other things either now or in the future. In hindsight the contents would have an actual ID column. Which would have made renaming things and foreign relationships easier to maintain If that info was lost in a vacuum there would be no way to fix those relationships
Jeff Yutzler 9:38 AM I do regret that the initial implementation did not have an actual ID column. It must have seemed like a good idea at the time...
I don't get your point regarding the Metadata Extension. There is no inherent flaw with the Metadata extension's use of table_name as a key. These keys cannot break like rowids can.
Portrayal is a much more complicated case. Work in the Vector Tiles Pilot and other activities has demonstrated that a simple style extension like you proposed is insufficient for broad use. There are too many holes. However, the VTP2 ER does not consider the potential brittleness of rowids because the test case did not feature arbitrary adding and deleting of gpkg_contents rows combined with vacuuming. (It is hard to image this happening in the real world.) Nevertheless, this is something we can look into during Testbed-16.
Do you have any other potential cases where you require a robust integer primary key in gpkg_contents?
We can move past the Metadata Extension as that went off topic. There is nothing wrong with it. You had mentioned it, and I was simply pointing out that it also requires a table name foreign key (similar to contents). Usage of that Metadata Extension does not always make the most sense for some extensions.
VTP2 appears to foreign key to the table names as well, so row ids should not be an issue there.
I can assure you that deleting contents in a GeoPackage combined with vacuuming is a very real world operation, more so than assuming it would not happen. I realize that it becomes hard for each individual or organization to think broadly outside of their own use cases. A specification working group should try to keep all possibilities and use cases in mind for a global specification.
Whether our Styles extension is deemed insufficient by OGC for broad use should have no sway in core specification design. Our Styles extension is authored by our organization and has proven highly successful in operational environments. The years without an official GeoPackage style spec forced that need.
I currently don't have additional potential use cases, but I imagine some may arise in the future. In hindsight, the GeoPackage specification should have included an integer id column, a table name column, and a display name column. Only the integer id column should have been used as a foreign key throughout the specification. A display name and/or table rename would have been possible through a single contents row column update. Good database design principles encourage primary keys that have no chance of changing. The GeoPackage spec instead has made table names / data layer names fairly permanent without modifying throughout the entire database.
Getting back on track... this proposal is to provide a mechanism to add a proper reliable contents integer identifier in a backwards and forwards compatible manner. The need can justifiably be argued on proper database and specification design principles. If the SWG's response is no, then this issue can be closed.
In Summary:
I agree with the opinion of @bosborn. Reliable ID integer primary keys that identify the rows as he recommends are useful to increase the number of potential use cases of the GeoPackage specification. For example, we can imagine a versioning extension that documents the evolution of the database schema (e.g. renaming a feature, adding attribute tables, evolving from version 1 to version 2). With an ID in Contents it is easy to develop it: you may track the changes in the row with such ID, or if this ID is a primary key with AUTOINCREMENT you can save the contents of the old row and restore it when required. Without the ID, the development of such extension is more difficult as you should develop it considering corner cases related to the recycling of table names.
We're in violent agreement that mistakes were made 6-8 years ago. The question is twofold:
Either way, this proposal is not on the table for 1.3. You will have to stick with your join table for the time being. This is not the worst thing in the world.
Sounds good. For number 2, take a look at our Contents Id extension in case OGC would like to adopt it as a solution. It does nothing more than associate an integer id to a contents table name. http://ngageoint.github.io/GeoPackage/docs/extensions/contents-id.html
@bosborn May just adding a last id INTEGER UNIQUE
column to gpkg_contents
provide an equivalent behaviour without risking interoperability?
https://sqlite.org/faq.html#q26 explains that uniqueness check in SQLite is performed only against non-null values. So, adding this may solve this issue without breaking existing systems. Legacy software can use it as code does not require changes because the new column is filled with the default value, i.e. null
. Nevertheless, I wonder if some software can complaint of having an additional id
column in the result of select * from gpkg_contents
, but, IMHO, this should not break a system. New software can use this id
column to provide values to be used for example in RTE. This provides same functionality as Contents Id but the design is similar to the WKT extension.
create table t1(name string primary key not null, id integer unique);
insert into t1(name) values ("feature_1");
insert into t1(name) values ("feature_2");
insert into t1(name, id) values ("feature_3", 1);
insert into t1(name, id) values ("feature_4", 2);
-- insert into t1(name) values ("feature_4");
-- Error: UNIQUE constraint failed: t1.name
-- insert into t1(name, id) values ("feature_5", 2);
-- Error: UNIQUE constraint failed: t1.id
select * from t1;
name | id |
---|---|
feature_1 | null |
feature_2 | null |
feature_3 | 1 |
feature_4 | 2 |
Yes, that would be an acceptable solution as well.
I think the SWG's goal is to try and avoid all new columns (from comment above and #230), so I'm not sure it can be done.
I do understand that dealing with optional core spec columns is annoying after personally experiencing it with the WKT extension (#254 & #255). It is tough trying to improve the spec while maintaining compatibility. One potential solution could be migration scripts, which is similar to how I have maintained operational databases in the past. It could preserve backwards compatibility in the same way you listed while also allowing maintained implementations to upgrade GeoPackages.
@fjlopez it is not legal to use ALTER TABLE ADD COLUMN to add a row with a primary key or unique constraint so a migration script would be needed either way. https://sqlite.org/lang_altertable.html
A change in the schema of a database always require a migration script even if the script only contains only one DDL statement. Migration scripts such as the included in the proposal of @bosborn https://github.com/opengeospatial/geopackage/issues/516#issue-605879696 are usual in the database world and there are tooling to support them (e.g. Flyway).
However, I understand your concern. SQL migration scripts are a headache for many users, specially users that are GIS end users.
I have a worry which is a bit off-topic but I would like to know your opinion as I think that it is very related to this. I start to believe that a root cause of the stiff opposition to schema evolution, even when the issues are clear, is the lack of a command line tool in our ecosystem able to automatise versions upgrades and other schema management issues, and really easy to use by the GIS community (similar to GDAL command line programs). E.g., a tool something like:
$ geopackage info test.gpkg
Database: test.gpkg
Current version: GeoPackage 1.1.0
Extensions: Geometry Type Triggers
$ geopackage upgrade 1.3.0 test.gpkg
Validate migration
Extension Geometry Type Triggers not supported in 1.3.0
Upgrade cannot be performed, use upgrade --force to upgrade and remove
$ geopackage upgrade --force 1.3.0 test.gpkg
Validate migration
Successfully migration to GeoPackage 1.3.0
$ geopackage info test.gpkg
Database: test.gpkg
Current version: GeoPackage 1.3.0
Is there such a command line tool? I.e., something like the command line version of https://github.com/ngageoint/geopackage-java but with steroids.
@fjlopez There is just the Executable Test Suite which can be run on the command line. However, the ETS doesn't do a fraction of what we want it to do. It doesn't even support warnings at this time.
I don't have any answer for this. The software you are talking about will not fall from the sky or grow on trees. OGC does not have the resources to fund this development. Software organizations (you mention GDAL but there are others) have their own tools but they aren't vetted to be used in an official capacity. I don't see this changing unless a deep-pocketed organization decides it is necessary.
Regarding migration script, what I meant is that it would require a script to rename the table, create the new table, and copy the data over, not just add a new column with a single line like you could in most RDBMSs.
Tagging this as Tile Matrix Set
because the shadow view approach will be considered as part of the TMS support development as shadow views will be required to handle the other aspects of that extension.
This is a proposal (with low breaking risk to existing GeoPackages) to update the gpkg_contents table with an integer id primary key column.
The SQLite rowid column is not a reliable foreign key:
VACUUM is a common function call used to reduce a GeoPackage memory footprint after modifications. Any foreign key to the contents
rowid
is not a guarantee.The Related Tables Extension requires integer ids for relationships. The NGA Contents Id Extension is a work around to this issue.
Proposal:
Add an INTEGER id primary key to the
gpkg_contents
table and change thetable_name
column to be unique.Justification:
rowid
column, or migrate GeoPackages to new contents tableid
column can be migrated to the new contents table with preservedrowid
/id
valuesgpkg_geometry_columns
,gpkg_tile_matrix_set
, andgpkg_tile_matrix
tables continue to foreign key to thetable_name
column, avoiding a breaking change.Migration SQL:
See ALTER TABLE: Making Other Kinds Of Table Schema Changes