Open ioquatix opened 4 years ago
execute_batch2
calls back in to Ruby for each row.
We would have to unlock the GVL before calling sqlite3_exec
, then re-acquire the lock each time we process a row. AFAIK releasing / acquiring the lock isn't free, so this could slow down overall throughput (especially if done for every row). On top of that, I don't think anyone actually uses this API, let alone in a multi-threaded application. Another challenge is that SQLite3 is only optionally thread safe, so we would have to add conditionals around whether or not the release the GVL (and those conditionals would have to exactly match SQLite3 which lets you pick thread safetyness at compile and/or runtime). Finally, if someone were actually using this method in a threaded environment, and we made a mistake with the GVL lock / unlock we could corrupt their data.
For the above reasons, (high risk, low reward) I'm not particularly motivated to fix this. So if someone wants to try tackling it, that would be great. Otherwise I'm going to wait for user demand.
I think what you proposed is the correct approach as locking the VM is definitely the wrong approach. The overhead of the per-row lock is to be worn by the user of the interface and is a natural consequence of the GVL.
Sqlite3 should not be used on multiple threads IMHO, it should be a separate connection/instance per thread. Does that work? Otherwise, a per-connection mutex would make sense to me. What happens if the row callback tries to execute another query?
I think what you proposed is the correct approach as locking the VM is definitely the wrong approach.
I don't understand? My proposal is to do nothing unless there is demand for the feature by people that use it.
I'm also open to deprecating and deleting the method too. But AFAIK it's not hurting any users the way it's implemented today
We would have to unlock the GVL before calling sqlite3_exec, then re-acquire the lock each time we process a row.
This implementation is the correct one. However, I agree it's inefficient.
If sqlite3 is not thread safe, you should protect it with a mutex, which would also prevent using the same connection for a 2nd query while enumerating the results of a different query.
Do any of the methods in this code base release the GVL?
Greetings! I'm working on an project at https://github.com/fly-apps/railsite that would make Sqlite a real option for smaller production Rails deployments, which would ideally work with a threaded Puma server.
Since it's been a while, here's what the results look like in ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
with sqlite3 1.5.4
.
.........>.<.........
This confirms this is still an issue with the latest versions of Ruby and the Gem.
My question is whether or not the flag at https://github.com/sparklemotion/sqlite3-ruby/blob/beaa1425383970d4fd3abce8a9798a6a4e615f5a/lib/sqlite3.rb#L13-L14 would solve the thread safety problems discussed above. If so, then this may be a documentation issue? I'm guessing there's more work that needs to be done around implementing a Mutex in Rubyland.
AFAICT this is not only an issue with execute_batch
but any execution using this gem. This effectively prevents any kind of Thread / Fiber based concurrency from working as expected.
Now that "SQLite in production" is trending, this is going to pose obvious performance issues for anyone using thread / fiber based servers / job processors like puma / falcon / sidekiq etc for their apps.
The extralite gem is an alternative SQLite client which releases the GVL during blocking, see note on concurrency here: https://github.com/digital-fabric/extralite?tab=readme-ov-file#concurrency. It is both significantly faster than this gem in general and doesn't have concurrency issues. Maybe that could be a source of inspiration for how to solve this? (I'd love to help out myself, but alas I have no experience programming in C)
I'm currently messing with a POC for allowing extralite
to be used with ActiveRecord's SQLite adapter: https://github.com/fractaledmind/activerecord-enhancedsqlite3-adapter/pull/2. The implementation hacky right now, but it works and solves the concurrency issues. Would be nice if this gem could solve this as well, to avoid the rigamarole of migrating the backing client for ActiveRecord.
It is both significantly faster than this gem in general and doesn't have concurrency issues. Maybe that could be a source of inspiration for how to solve this? (I'd love to help out myself, but alas I have no experience programming in C)
I took a quick look and unfortunately, no, that does not look like an implementation to look up to. It seems like by default they release the GVL for 1 in 1000 rows returned which is essentially just an overly complex way to call Thread.pass
while actually holding the GVL for 99.9+% of work.
Gives
Expected:
Or something to that effect.
cc @ko1 @tenderlove @jeremyevans