Do not encode "new" status of features into their FID

strk commented 8 years ago

Right now code expects to be able to tell if a feature is NEW or not by only looking at its FID. The current FID is simply a 64bit integer, which means there's no way to directly map an integer database field into a FID unless it can be ensured that no field value ever gets the value (or special range of values) used to identify a NEW feature.

This ticket is to rely on teh whole feature to tell if it's new or not, rather than on its FID, which could then be more capable.

Beside allowing direct mapping to signed integer identifiers in databases, moving the "NEW" flag from FID to feature would also allow identifying more than 4 billion features (ok, probably not really needed).

If such direct mapping is not allowed (as is the case up to 2.14) the database providers can only maintain their own mapping between a syntetic feature identifier and the database field value(s) -- which is needed anyway for multi-field keys, btw.

See http://hub.qgis.org/issues/14262 for real world case.

m-kuhn commented 8 years ago

How would fids for new features be chosen/generated instead?

strk commented 8 years ago

What are FIDs needed for, when the feature only exists in QGIS memory ? Could fid-using code reference features directly by pointer, instead ? (I hadn't digged enough in code to answer this question)

m-kuhn commented 8 years ago

What are FIDs needed for, when the feature only exists in QGIS memory ?

To have (a potentially big list of) references to features in a generic way, committed or not. See QgsFeatureRequest::FilterFid and QgsFeatureRequest::FilterFids.

Could fid-using code reference features directly by pointer, instead ?

Not all features exist in memory. So the fid would need to be replaced with a datastructure which has an extra flag to define if the id is to be treated as provider fid or as pointer. I doubt that it's feasible to rewrite all code to this new datastructure.

strk commented 8 years ago

Maybe there's an option not to rewrite any calling code. This is C++, after all, so a new datatype could really look like an integer, just exposing an additional "isNew()" method for those who care...

m-kuhn commented 8 years ago

so a new datatype could really look like an integer

And I think we are back to the question like which integer a not-yet-committed feature id should look to be sure it's unique.

strk commented 8 years ago

Why are we back to "an integer" ? Do we have to pass integers around somewhere or aren't we always passing a FID type ? If we pass the type we can then use any integer as long as it is unique across the new features (isn't this already solved with the current mechanism of using negative integers ? or how are those negative integer values decided ?)

m-kuhn commented 8 years ago

Why are we back to "an integer" ?

Because you literally used "an integer" yourself in the quote above 😉

I guess if it should "look like an integer" it should also look like a unique integer across all features.

Do we have to pass integers around somewhere or aren't we always passing a FID type ?

Not sure what relies on it but at least the python bindings will pass around an integer so far.

Basically we need to either use 1 bit for us and leave only 63 to the provider or leave 64 to the provider and get the extra bit somewhere else.

As far as I can see:

The upside of using a data structure is that

we can use 64 bit provider side keys without modification

The downside of a more complicated structure is

some side-effects which we don't know in detail (in our code or 3rd party code)
some extra overhead (no simple int comparison)
increased memory usage

strk commented 8 years ago

Looking like a unique index would be taken care of by comparison operators, so that (for example, pseudo-code):

 (new:1,id:3) != (new:0,id:3)

Not sure what relies on it but at least the python bindings will pass around an integer so far.

Ok this would be a problem as it means it's part of the API

Basically we need to either use 1 bit for us and leave only 63 to the provider or leave 64 to the provider and get the extra bit somewhere else.

Right.

As far as I can see:

The upside of using a data structure is that

we can use 64 bit provider side keys without modification

The downside of a more complicated structure is

some side-effects which we don't know in detail (in our code or 3rd party code)

some extra overhead (no simple int comparison)

increased memory usage

"increaded memory usage" and "overhead" would not be true when the alternative for a 64bit integer would be to use a Map structure. At least, this is the concern you raised about always using a FidMap...

The unknown side-effects (API change) is a real concern.

m-kuhn commented 8 years ago

"increaded memory usage" and "overhead" would not be true when the alternative for a 64bit integer would be to use a Map structure. At least, this is the concern you raised about always using a FidMap...

In both cases I have been referring to the estimated 98% of the cases where people are using a positive integer smaller than 9223372036854775807 on postgres and all the other providers which are also affected by a change like this.

strk commented 8 years ago

On Mon, May 23, 2016 at 03:11:16AM -0700, Matthias Kuhn wrote:

"increaded memory usage" and "overhead" would not be true when the alternative for a 64bit integer would be to use a Map structure. At least, this is the concern you raised about always using a FidMap...

In both cases I have been referring to the estimated 98% of the cases where people are using a positive integer smaller than 9223372036854775807.

It's not about the actually used variable, but the datatype capacity. Granted that a "serial" type becomes a 32bit one, so your 98% statement may not be too off.

qgis / qgis4.0_api

Do not encode "new" status of features into their FID #40