What is the bug or the crash?
Merging multiple GeoPackages with native:mergevectorlayers and saving the result to a memory layer can cause FID collisions later on, when that layer is passed to another processing tool that saves to a new GeoPackage. The issue is that the fid column of the memory layer may then contain duplicate values, and processing tools use this column by default as the unique ID when saving to the GeoPackage format.
Note that not all processing tools react the same way when saving a memory layer with a non-unique fid column to GeoPackage:
Merge vector layers: creates the GPKG with the fid column renumbered from 1.
Reproject layer: creates the GPKG but discards features with a duplicate fid value. No warning message is emitted.
Buffer: the tool fails with the error "Could not write feature into OUTPUT".
In all these cases, I think the correct behavior would be to create the new GPKG with the fid column renumbered from 1 and to emit a warning saying that the IDs have been reset due to collisions. This would apply only when an actual fid collision is detected. When there is no collision, the original IDs would be kept, as is already the case.
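The proposed behavior can be illustrated with a minimal pure-Python sketch. It is not QGIS code: `resolve_fids` is a hypothetical helper that works on a plain list of fid values, and the warning text is only an example of what such a message could say.

```python
import warnings
from collections import Counter


def resolve_fids(fids):
    """Keep fids if they are unique; otherwise renumber from 1 and warn.

    Hypothetical helper, only meant to illustrate the behavior proposed above.
    """
    duplicates = sorted(v for v, n in Counter(fids).items() if n > 1)
    if not duplicates:
        # No collision: the original IDs are kept.
        return list(fids)
    # Collision detected: tell the user, then renumber from 1.
    warnings.warn(f"fid values reset due to collisions on: {duplicates}")
    return list(range(1, len(fids) + 1))


print(resolve_fids([3, 80]))         # unique values, kept as-is
print(resolve_fids([3, 80, 4, 80]))  # duplicate 80, renumbered with a warning
```

The second call uses the fid values that the merged memory layer ends up with in the reproduction steps.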
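Steps to reproduce the issue
# Run from the QGIS Python console
import os
import tempfile
import processing
from qgis.core import QgsCoordinateTransformContext, QgsVectorFileWriter, QgsVectorLayer
tempdir = tempfile.gettempdir()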
# Source layers (download link above)
layer1 = "C:/temp/lyr1.gpkg"
layer2 = "C:/temp/lyr2.gpkg"
# Display the IDs of the entities in the source layers
# Note the upcoming collision: both layers contain an entity with fid 80
[(f.id(), f["fid"]) for f in QgsVectorLayer(layer1).getFeatures()] # IDs: [(3, 3), (80, 80)]
[(f.id(), f["fid"]) for f in QgsVectorLayer(layer2).getFeatures()] # IDs: [(4, 4), (80, 80)]
# Merging both source layers directly into a GeoPackage
# See that no entities are lost and that new IDs are generated
outpath_merge = os.path.join(tempdir, "lyr_merge.gpkg")
processing.run("native:mergevectorlayers", {
    'LAYERS': [layer1, layer2],
    'OUTPUT': outpath_merge
})
[(f.id(), f["fid"]) for f in QgsVectorLayer(outpath_merge).getFeatures()] # IDs: [(1, 1), (2, 2), (3, 3), (4, 4)]
# Merging both source layers into a memory layer
# No entities are lost and new feature IDs are generated, but the fid
# attribute keeps its original values, including the duplicate 80
lyr_merged = processing.run("native:mergevectorlayers", {
    'LAYERS': [layer1, layer2],
    'OUTPUT': 'TEMPORARY_OUTPUT'
})["OUTPUT"]
[(f.id(), f["fid"]) for f in lyr_merged.getFeatures()] # IDs: [(1, 3), (2, 80), (3, 4), (4, 80)]
# Saving the merged memory layer into a GeoPackage using QgsVectorFileWriter
# One entity is lost, QgsVectorFileWriter returns code 7 (ErrFeatureWriteFailed)
# with an OGR error message about ID collision
outpath_fileWriter = os.path.join(tempdir, "lyr_fileWriter.gpkg")
save_options = QgsVectorFileWriter.SaveVectorOptions()
QgsVectorFileWriter.writeAsVectorFormatV3(lyr_merged, outpath_fileWriter, QgsCoordinateTransformContext(), save_options)
[(f.id(), f["fid"]) for f in QgsVectorLayer(outpath_fileWriter).getFeatures()] # IDs: [(3, 3), (4, 4), (80, 80)]
# Saving the merged memory layer into a GeoPackage after running
# another processing tool on it (here the Reproject tool)
# One entity is lost, but the user gets no message about the ID collision
outpath_reproject = os.path.join(tempdir, "lyr_reproject.gpkg")
processing.run("native:reprojectlayer", {'INPUT':lyr_merged, 'TARGET_CRS':lyr_merged.crs(), 'OUTPUT':outpath_reproject})
[(f.id(), f["fid"]) for f in QgsVectorLayer(outpath_reproject).getFeatures()] # IDs: [(3, 3), (4, 4), (80, 80)]
# Saving the merged memory layer into a GeoPackage after running
# another processing tool on it (here the Buffer tool)
# The processing crashes with an error ("Could not write feature into OUTPUT")
outpath_buffer = os.path.join(tempdir, "lyr_buffer.gpkg")
processing.run("native:buffer", {'INPUT':lyr_merged,'DISTANCE':10,'OUTPUT':outpath_buffer})
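Until the behavior is harmonized, one possible workaround is to drop the fid attribute from the memory layer before passing it on, so that the GeoPackage writer assigns fresh IDs. The sketch below assumes that the duplicates come only from the fid attribute. `drop_fid_params` is a hypothetical helper that builds the parameters for the native:deletecolumn algorithm, and the workaround has not been verified against the versions listed in this report.

```python
def drop_fid_params(layer, output="TEMPORARY_OUTPUT", fid_field="fid"):
    """Build the parameter dict for native:deletecolumn (hypothetical helper)."""
    return {"INPUT": layer, "COLUMN": [fid_field], "OUTPUT": output}


# In the QGIS Python console, with lyr_merged and outpath_buffer from the steps above:
# lyr_clean = processing.run("native:deletecolumn", drop_fid_params(lyr_merged))["OUTPUT"]
# processing.run("native:buffer", {"INPUT": lyr_clean, "DISTANCE": 10, "OUTPUT": outpath_buffer})
```

The drawback is that the original fid values are lost even when they would not have collided, which is why the detection-based behavior proposed in this report seems preferable.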
Versions
QGIS 3.28.15 and 3.34.3
GDAL 3.8.3
PROJ 9.3.1
GEOS 3.12.1-CAPI-1.18.1
Windows 10 Entreprise 22H2
Supported QGIS version
[X] I'm running a supported QGIS version according to the roadmap.
New profile
Additional context
No response