zombiezen / go-sqlite

Low-level Go interface to SQLite 3
https://pkg.go.dev/zombiezen.com/go/sqlite
ISC License
699 stars 16 forks source link

BlockOnBusy backoff period is long #75

Closed wdullaer closed 4 months ago

wdullaer commented 4 months ago

I'm in the process of porting a codebase to use this library for its sqlite interactions.

We have a very highly concurrent workload and some tests that assert that the application doesn't start returning busy errors under load.

When I ported these tests over, they started taking over 3 mins to complete, while in the current implementation they take 1s at most.

I managed to trace down the difference to the backoff times used in the BlockOnBusy handler: https://github.com/zombiezen/go-sqlite/blob/5c555a3aec7ab6cb427d275fbd6fe18494a74c43/sqlite.go#L338-L351 The shortest wait period is 1s, and it increases very rapidly from there.

If we compare this with the backoff used in the default BusyTimeout handler: https://gitlab.com/cznic/sqlite/-/blob/master/lib/sqlite_darwin_amd64.go?ref_type=heads#L121226 This one starts 1ms and waits up to 100ms at most.

I think there's too much difference between these two: a backoff of a few ms will suffice for most well behaved applications to resolve the lock. In any case, it makes the current default behaviour of this library unusable in my project, it will tank throughput. I can revert to using a busy timeout, but I like the BlockOnBusy design: it removes a hard to guess tunable from the equation.

If you are not willing to change these backof timeouts, would you consider making them tunable with a config option?

zombiezen commented 4 months ago

Seems reasonable. I think I copied these values from somewhere in the SQLite source without much thought.

ncruces commented 4 months ago

Just FYI.

The SQLite implementation does use those numbers, but they're milliseconds, not seconds: https://github.com/sqlite/sqlite/blob/3a32690a5519f7927947076bf17f64f4243c4396/src/main.c#L1713-L1753

sqlite3OsSleep takes microseconds, so delay in sqlite3OsSleep(db->pVfs, delay*1000) is in milliseconds; just below, sqlite3OsSleep(db->pVfs, 1000000) is sleeping a hole second.

So SQLite (by default, in all modern systems) never sleeps in increments of more than 100ms. Even when it's sleeping for 60s, it'll poll the lock every 100ms.

zombiezen commented 4 months ago

Whoops, then that was definitely an error on my part. Thanks all!

wdullaer commented 4 months ago

Thanks for the quick turnaround! I would have just stuck with the delay list of sqlite myself, there's something to be said for keeping this kind of behaviour equivalent, but it should be really hard to reach the 100ms+ delay in practice. Again, thanks for the swift reply and all the work on the project. It's been a joy to use.

zombiezen commented 4 months ago

I would have just stuck with the delay list of sqlite myself, there's something to be said for keeping this kind of behaviour equivalent, but it should be really hard to reach the 100ms+ delay in practice.

You and @ncruces make a fair point. I've removed the values above 100ms. That will be incorporated in the next release.

Again, thanks for the swift reply and all the work on the project. It's been a joy to use.

You're very welcome! I am glad that is your experience. 😊