sparklemotion / sqlite3-ruby

Ruby bindings for the SQLite3 embedded database
https://github.com/sparklemotion/sqlite3-ruby/
BSD 3-Clause "New" or "Revised" License
761 stars 200 forks source link

Use `sqlite3_close` instead of `sqlite3_close_v2`? #564

Closed tenderlove closed 1 hour ago

tenderlove commented 20 hours ago

I didn't fully realize the consequences of switching to sqlite3_close_v2 when I reviewed #558. Since sqlite3_close_v2 doesn't return a status code, it means that we can have SQLite3::Statement objects that point at a database which has been closed by someone.

Since the database could have been closed, we have to check on every call to step whether or not the db is "ok", and unfortunately the step method is extremely hot. It's called for every row of the statement.

For example, this benchmark:

require 'sqlite3'
require 'benchmark/ips'

file = "/tmp/test.sqlite3"

File.unlink file rescue nil

db = SQLite3::Database.new file
db.execute "create table foo (id int)"

1000.times do |i|
  db.execute "insert into foo (id) values (#{i})"
end

Benchmark.ips do |x|
  x.report("selecting") {
    db.execute("select * from foo").to_a
  }
end

db.close

At commit 480f3e7d, the results are like this:

$ ruby -I lib test.rb
ruby 3.3.4 (2024-07-16 revision 425e468d25) [arm64-darwin23]
Warming up --------------------------------------
           selecting   855.000 i/100ms
Calculating -------------------------------------
           selecting      8.495k (± 0.4%) i/s -     42.750k in   5.032394s

At commit 81ea485f680c54f604ec1d0f40a00d999a149dc0, the results are like this:

$ ruby -I lib test.rb
ruby 3.3.4 (2024-07-16 revision 425e468d25) [arm64-darwin23]
Warming up --------------------------------------
           selecting   582.000 i/100ms
Calculating -------------------------------------
           selecting      5.810k (± 0.5%) i/s -     29.100k in   5.008952s

This simple benchmark is substantially slower after switching to sqlite3_close_v2.

Before da908657099a20f2796e49e6ed42c102a9aa7a80, the following code would raise an exception:

require 'sqlite3'
require 'benchmark/ips'

file = "/tmp/test.sqlite3"

File.unlink file rescue nil

db = SQLite3::Database.new file
db.execute "create table foo (id int)"

1000.times do |i|
  db.execute "insert into foo (id) values (#{i})"
end

stmt = db.prepare "select * from foo"
stmt.step

db.close # used to raise an exception

This meant that statements couldn't have "dangling references" to a database connection. Since this doesn't raise an exception anymore, it is possible and we must therefore check the validity of the db connection before fetching the next row.

Would it be possible to switch back to using sqlite3_close and raising an exception?

I know the SQLite documentation says we should call sqlite3_close_v2 in GC languages, but I think it only makes sense to call that function from the free function called by the GC. If a user calls close, we should use sqlite3_close and raise an exception.

Statements keep a reference to the database connection, so we shouldn't have to fear the db connection being closed via GC until the user is done using the statement. I've read #558 a few times, and it's not clear to me why sqlite3_close_v2 is necessary for fork safety, but I could definitely be missing something.

Thanks.

tenderlove commented 19 hours ago

I've read #558 a few times, and it's not clear to me why sqlite3_close_v2 is necessary for fork safety, but I could definitely be missing something.

I get it now. When we fork, we need to make the database and any associated statements unusable. We need a way of marking the database as "unusable" whether that be via directly asking the db object for something, or indirectly asking a statement to do something (even if that statement is just calling step).

I wonder if there's a way to at least make the check faster.

flavorjones commented 19 hours ago

Yes. I will make this better.

flavorjones commented 15 hours ago

See https://github.com/sparklemotion/sqlite3-ruby/pull/565

flavorjones commented 1 hour ago

For posterity: #558 did not introduce sqlite3_close_v2, it was introduced by #557 and is orthogonal to fork-safety.

The problem being described in this ticket is a valid performance regression but is not related to which close method is being used. #565 addresses the performance issue by avoiding instance variable access in a hot C method.