jeremyevans / sequel

Sequel: The Database Toolkit for Ruby
http://sequel.jeremyevans.net
Other
5k stars 1.07k forks source link

Improve performance of regexp matching in the sqlite adapter #2108

Closed paddor closed 11 months ago

paddor commented 11 months ago

Nothing much, just what the title says. Theoretically, if an infinite amount of different regexp are used in SQL queries, it could of course fill up RAM, but I think that's uncommon.

In my case, this brought the runtime from ~300ms down to ~70ms for a query that uses one single regexp in a table with around ~5700 rows.

paddor commented 11 months ago

I've changed the implementation. It now only caches Regexp objects if setup_regexp_function => :cached was given.

paddor commented 11 months ago

Thinking about it a little longer, maybe the option setup_regexp_function should just support taking a block. That way, the user is able to use modern features like Regexp#match? and Regexp.new(x, timeout: y) or just periodically clear the cache hash to avoid memleaks.

paddor commented 11 months ago

This is how I plan to use it now:

regexp_cache = Hash.new { |h, k| h[k] = Regexp.new(k, timeout: 0.005) }
opts = {
  setup_regexp_function: proc { |re_str, str| regexp_cache[re_str].match? str }
}
Sequel.connect('sqlite:///path/db.sqlite3', opts)

Let me know what you think. I left the :cached functionality in, with appropriate changes to the documentation. Let me know if I should remove it.

jeremyevans commented 11 months ago

Squash merged at c8248241ddf1a8d5d06f296aab5dca40b777c7b8

paddor commented 11 months ago

Thanks!