rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0

writeVector(insert, overwrite) has useful, but undocumented, GeoPackage behavior #1573

Closed twest820 closed 2 days ago

twest820 commented 3 months ago

writeVector()'s documentation for its arguments states

insert      logical. If TRUE, a new layer is inserted into the file, if the format allows it (e.g. GPKG allows that). See vector_layers to remove a layer
overwrite   logical. If TRUE, filename is overwritten

With terra 1.7-78, I use insert to create a multi-layer GeoPackage:

writeVector(layer1data, "bunchOlayers.gpkg", layer = "layer1")
writeVector(layer2data, "bunchOlayers.gpkg", layer = "layer2", insert = TRUE)
[...]

Updating an individual layer additionally requires overwrite = TRUE:

# updates layer1, does not change rest of bunchOlayers.gpkg
writeVector(layer1data, "bunchOlayers.gpkg", layer = "layer1", insert = TRUE, overwrite = TRUE)
# updates layer2, does not change rest of bunchOlayers.gpkg
writeVector(layer2data, "bunchOlayers.gpkg", layer = "layer2", insert = TRUE, overwrite = TRUE)
[...]

Here the overwrite behavior is consistent with GeoPackage layer updating in other GDAL-based tools such as QGIS. However, because each layer update leaves the other layers in the package intact, it is reasonably interpreted as inconsistent with terra's documentation, which says that filename is overwritten. IMO this is a feature, as it's pretty handy to be able to update one layer without having to buffer a GeoPackage's entire contents in memory and then insert all the other layers back as well. That said, the usual practice in defensive programming is to avoid relying on undocumented behavior, as it's not part of the API contract and is thus subject to change.
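The per-layer update described above can be checked with a small self-contained script. This is only a sketch: the point coordinates, the `id` attribute, and the temporary file are made up for illustration, and the outcome reflects the behavior reported here for terra 1.7-78, not a documented guarantee.

```r
library(terra)

# Two small point layers; coordinates and attribute values are invented.
layer1data <- vect(cbind(c(0, 1), c(0, 1)), type = "points", crs = "EPSG:4326")
layer1data$id <- 1:2
layer2data <- vect(cbind(c(2, 3, 4), c(2, 3, 4)), type = "points", crs = "EPSG:4326")
layer2data$id <- 1:3

# tempfile() only returns a path; the file itself is created by writeVector().
gpkg <- tempfile(fileext = ".gpkg")

# Create the file with layer1, then insert layer2 alongside it.
writeVector(layer1data, gpkg, layer = "layer1")
writeVector(layer2data, gpkg, layer = "layer2", insert = TRUE)
layers_before <- vector_layers(gpkg)

# Replace layer1 with a one-feature version; layer2 should be left alone.
writeVector(layer1data[1, ], gpkg, layer = "layer1", insert = TRUE, overwrite = TRUE)
layers_after <- vector_layers(gpkg)

print(layers_before)
print(layers_after)
print(nrow(vect(gpkg, layer = "layer1")))  # feature count of the updated layer
print(nrow(vect(gpkg, layer = "layer2")))  # feature count of the untouched layer
```

Comparing the layer lists and feature counts before and after the update shows whether the other layers survive on a given terra/GDAL combination.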

An apparent corollary is that specifying just overwrite = TRUE doesn't behave as documented:

writeVector(layer1data, "bunchOlayers.gpkg", layer = "layer1", overwrite = TRUE)
Error: [writeVector] Creation of output dataset failed
In addition: Warning message:
A file system object called 'bunchOlayers.gpkg' already exists. (GDAL error 1) 

I'm thus wondering if terra's documentation should be amended to something like

insert      logical. If TRUE, a new layer is inserted or an existing layer overwritten if the file format allows it (e.g. GPKG allows these), depending on the value of overwrite. See vector_layers to remove a layer
overwrite   logical. If TRUE and insert is FALSE, filename will be overwritten if the file format and layer structure permit. If TRUE and insert is TRUE, only the target layer is overwritten when the format allows (e.g. GPKG).

with then perhaps some remarks about behavior with GeoPackages and other common formats. This would make the ability to update individual layers in GeoPackages a (formally) supported behavior.
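Until the behavior is documented, one way to keep the dependence on it in a single place is a small wrapper. The sketch below is an assumption-laden illustration: `write_gpkg_layer` is a hypothetical helper, not part of terra, and it relies on the insert/overwrite combination behaving as reported in this issue.

```r
library(terra)

# Hypothetical helper (not a terra function): create the file, add a new
# layer, or replace an existing layer, depending on what is already on disk.
write_gpkg_layer <- function(x, filename, layer) {
  if (!file.exists(filename)) {
    # First layer: a plain write creates the GeoPackage.
    writeVector(x, filename, layer = layer)
  } else {
    # File exists: always insert, and overwrite only if the layer is present.
    layer_exists <- layer %in% vector_layers(filename)
    writeVector(x, filename, layer = layer,
                insert = TRUE, overwrite = layer_exists)
  }
  invisible(vector_layers(filename))
}

# Usage with invented data: create, add, then update a layer.
pts <- vect(cbind(c(0, 1, 2), c(0, 1, 2)), type = "points", crs = "EPSG:4326")
gpkg <- tempfile(fileext = ".gpkg")
write_gpkg_layer(pts, gpkg, "layer1")
write_gpkg_layer(pts, gpkg, "layer2")
layers <- write_gpkg_layer(pts[1:2, ], gpkg, "layer1")
print(layers)
```

If the semantics of insert and overwrite change in a later terra release, only this one function needs to be revisited.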

rhijmans commented 2 days ago

Thank you very much for the thorough analysis and clear suggestions; that is very helpful.