marcoarment / FCModel

An alternative to Core Data for people who like having direct SQL access.
MIT License
1.65k stars 173 forks source link

Proposal: remove AUTOINCREMENT support #33

Closed marcoarment closed 10 years ago

marcoarment commented 10 years ago

The concept of a model instance with a not-yet-set primary key is weird, and requires a lot of hacky special handling. Right now, to handle AUTOINCREMENT, models may have unset primary keys from new until a successful save. In your app's various functions, there's no guarantee that any model instance actually has a primary key at the time you're operating on it (unless you check every time, which is tedious and error-prone).

SQLite's AUTOINCREMENT documentation shows that its automatic rowid pseudo-column is already fulfilling many of the same duties for reads, which is available in any query's ORDER BY clause in FCModel already (and I could make it a property without a ton of effort).

I've also found that I've never actually used AUTOINCREMENT. It's not well-suited to a world with sync and concurrency. In every case so far, random 64-bit integers or GUID strings have been the better choice. It's better to remove a bad option than to let people shoot themselves in the foot with it.

I'd like to remove AUTOINCREMENT support and replace it with optional randomly generated 64-bit signed-integer primary keys. (FCModel subclasses could do their own thing, like GUID strings, if they wanted different key-generation behavior.) This would clean up the FCModel code a bit and prevent some of those weird potential bugs in usage.

Models would no longer be permitted to instantiate without any primary key value — you'd still use instanceWithPrimaryKey: or instanceWithPrimaryKey:createIfNonexistent: normally, but if you simply called new, a random key value would be assigned that's unique among all existing values in the table and all unsaved values currently in memory. Modifying any model's primary-key value after instantiation would raise an exception.

What do you think?

jasonsilberman commented 10 years ago

+1. I agree, I have never used AUTOINCREMENT in real world use.

ahti commented 10 years ago

:thumbsup:

FCModel subclasses could do their own thing, like GUID strings, if they wanted different key-generation behavior.

This would be pretty nice to have, I could imagine something like a class method +primaryKeyForNewInstance that subclasses override to accomplish this. It would certainly make my code cleaner in some places (using UUIDs).

brentsimmons commented 10 years ago

Though I'm using my own thing instead, I agree with the design. In my system ids are 64-bit integers or GUID strings, precisely for the same reason (sync).

beccadax commented 10 years ago

My first instinct was to say this was completely insane, but if FCModel is willing to guarantee the keys won't collide, I guess that'd be fine.

collindonnell commented 10 years ago

Makes sense to me. If it makes the code cleaner, it seems like a win.

klaas commented 10 years ago

Marco, whatever you think is best!

For my usage 64-bit random ids are fine.

ahmadnassri commented 10 years ago

never use AUTOINCREMENT however agree on principal.

ashleymills commented 10 years ago

Your correct Marco. Trying to coordinate a sync between multiple devices using autoincrementing keys is a nightmare. A guaranteed unique I'd is the way forward.

NZKoz commented 10 years ago

Removing the functionality will leave people whose applications are already in the wild with a hairy migration to take care of, so I'd suggest doing it carefully or warning people not to upgrade.

Also I see people referring to randomly assigned 64bit ids as if they're unique, to quote from someone smarter than me:

The "birthday paradox" states that in order to avoid collisions you need to select random values from twice the bit-size of the number of values you will be selecting.

So unless your global ID space would fit in 32 bits, or you're saying 'unique' when you mean 'unique per user', then you'll need bigger ids than 64 bit ints if you're selecting them randomly.

mlilback commented 10 years ago

Autoincrement is problematic in FCModel because you can't make use of it inside the schemaBuilder: block of openDatabaseAtPath:. I'm all for random ids as long as FCModel worries about conflicts. It would be worth my time to strip out use of autoincrement.

peternlewis commented 10 years ago

@NZKoz you're right to worry about the birthday paradox (which basically says that the chance of a collision goes up as the square of the number of entries). So you need to think about the maximum potential total number of colidable objects and then square it (ie, double the number of bits). A 64 bit number would work well for uniqueness among a user created set of items. A user is never going to create 2^32 objects, even though potentially you could have 2^32 total objects among all users (on a sync server, the real primary key would be a combination of the user ID and the 64 bit number). A user's tweet, for example, could be a 64 bit ID, but the globally visible key would be userid+64bitnumber.

If you really need a globally visible unique ID (eg, you want to create objects which could be "owned" by more than one user), then you might want a larger primary key than 64 bits, probably a UUID (which on the Mac is generally a version 4 UUID, which is a 122 bit random number). http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_.28random.29

If you're expecting more than 2^64 objects, then I think you should probably readjust your goals - if every WhatsApp user created a million messages a year, you'd be ok for about a hundred years…

marcoarment commented 10 years ago

@NZKoz @peternlewis I also wouldn't just blindly generate an int64_t and go with it — I'd lock the database and check it and the live-instances cache for an existing entry, and generate new random IDs until I found one that didn't conflict. This is likely to generate and check only one ID the vast majority of the time, and for any conflicts, it would only need to generate one additional ID for the vast majority of those times. (I'd actually do this for GUIDs, too, since it's so easy and relatively cost-free.)

@NZKoz Your point about applications in the wild is definitely not something I'm taking lightly — that's why I started and publicized this discussion. So far, I'm not aware of anyone who has shipped an app with FCModel using AUTOINCREMENT, and part of the reason I made this thread is to see if I can find any and see if there's a workable migration.

marcoarment commented 10 years ago

@peternlewis Relevant to the discussion, from SQLite Limits:

Maximum Number Of Rows In A Table
The theoretical maximum number of rows in a table is 264 (18446744073709551616 or about 1.8e+19). This limit is unreachable since the maximum database size of 140 terabytes will be reached first. A 140 terabytes database can hold no more than approximately 1e+13 rows, and then only if there are no indices and if each row contains very little data.

The 264 limit is presumably because the internal rowid on tables is a 64-bit integer. (Although I thought all of their 64-bit integers were signed, making 263 the real limit, assuming they don't use negative rowids.) Anyway, the fact that SQLite can only really hold a full table of ~43-bit IDs, limited long before that by likely storage limits, should give some idea of how much headroom there is for most tables to just use the positive side of random 64-bit signed integers after checking for uniqueness, and for special cases to selectively use GUIDs instead, for their primary keys.

peternlewis commented 10 years ago

@marcoarment checking for conflicts when generating a UUID would only help if you have access to all UUIDs, which for something that is intended to be globally unique would not be the case (at least not on the client).

marcoarment commented 10 years ago

@peternlewis Sure, but that's mostly an implementation detail. If I'm asking the subclass for a custom response to e.g. newPrimaryKeyValue, I'll probably be checking all responses to that for uniqueness against the table and unsaved in-memory instances, regardless of whether it's FCModel's default implementation (random int64_ts) or a subclass's override (which might be a GUID or anything else).

johncblandii commented 10 years ago

We are generating our own PK values and our data is 100% from the server [no creation in the app] so as long as the changes don't interfere w/ my custom PK – sounds like it wouldn't – we'll be good to go.

For those who launched a public app with an alpha library: bold

jbmorley commented 10 years ago

I'm currently using AUTOINCREMENT in a couple of – as yet unshipped – applications. I think I'd agree with the general feeling here that so long as there's a mechanism for generating unique keys it doesn't really present a problem. Certainly having that key available prior to calling save would make some of the code to set up relationships significantly easier.

claaslange commented 10 years ago

:+1:

marcoarment commented 10 years ago

@jbmorley Assuming a random-integer key was assigned in new, would the change impact you at all? Is there any point in the code that depends on the key values being sequential and chronological?

jbmorley commented 10 years ago

@marcoarment That would work perfectly as I'm not relying on sequential keys for any functionality. I suspect it would be an all-but seamless transition.

On 12 Mar 2014, at 17:56, marcoarment notifications@github.com wrote:

@jbmorley Assuming a random-integer key was assigned in new, would the change impact you at all? Is there any point in the code that depends on the key values being sequential and chronological?

— Reply to this email directly or view it on GitHub.

marcoarment commented 10 years ago

I've just committed 9cac3d9 to the remove-autoincrement branch with the proposed changes. (I think this is all that's necessary, actually.)

I also took this opportunity to replace FMDatabaseQueue with a new FCModelDatabaseQueue with a few differences:

Take a look, try your code, etc. Thanks.

beccadax commented 10 years ago

Since this change will break old schemas, it might be nice if FCModel checked the schema after building it and threw an exception if the primary key field was AUTOINCREMENT (maybe only on debug builds?). That might be too much of a hassle, though.

marcoarment commented 10 years ago

@brentdax Is there a good way to tell? This method seems too hacky to be worthwhile, especially since it's only accurate if the table has ever had any rows.

This change also doesn't break all AUTOINCREMENT schemas. It still generates IDs automatically — they're just not going to be consecutive or chronologically ascending anymore. I've left AUTOINCREMENT in the test project to demonstrate that it still "works".

(But concerns like this are why I'm doing this change much more carefully than previous ones.)

marcoarment commented 10 years ago

I could also ship an AUTOINCREMENT-migration-helper function that you could call from primaryKeyValueForNewInstance to just always return sequential numbers greater than the largest known one for that table, with the caveat that every number may not be saved to the table (there would be gaps if you create +new instances but don't save them), and the save order may not be exactly chronological (if you create +new A then +new B, but save B before A, the ID-to-INSERT-order chronology will be wrong).

But I suspect this will benefit almost nobody. Let me know if I'm wrong.

beccadax commented 10 years ago

@marcoarment I would like some kind of warning or error because, even if AUTOINCREMENT is harmless since FCModel will generate IDs anyway, developers who use it apparently don't understand how FCModel assigns primary keys. If they expect to be able to sort by the primary key, they should probably be disabused of that notion quickly.

You could scan the sql column of the sqlite_master table for use of the AUTOINCREMENT keyword. I've confirmed that ALTER TABLE updates this column, so it should always reflect the current schema.

Alternatively, it appears that the sqlite_sequence table is created the first time you add an AUTOINCREMENT column, though it doesn't have entries for never-inserted tables. That could be used for a warning, although it's probably not reliable enough for a hard error, since it would still exist even if you wrote a migration that removed all AUTOINCREMENT columns (and it can't be dropped).

marcoarment commented 10 years ago

@brentdax Check out new commit 7b213f6 on the branch. Detection by dumb string matching on the schema SQL, logged warnings, and emulation.

timbroder commented 10 years ago

This makes sense to me and won't effect the App I'm using FCModel with.

marcoarment commented 10 years ago

Thanks, everyone. I think the current version of the branch, with warnings on launch and AUTOINCREMENT emulation for legacy tables, is safe enough to merge into master.