Closed MLpranav closed 2 weeks ago
sqlite3 on macOS is taking 1000-35000x more time to create a new table when the table name begins with a number.
The sqlite3
extension module does not create the table; it only passes the query as it is on to the SQLite library, which then executes the query. As such, I don't see how this could be a problem with the sqlite3
extension module at all.
Since the sqlite3
extension module does not interfere with the query that is being passed to the SQLite library, there is nothing we can do to mitigate your problem. Suggesting to close this as not-a-bug.
It's possible that this is a bug in how we built sqlite3
for macOS, but we'll need rather more information (how Python was installed, how libsqlite3
was sourced, exact versions of each, etc.) before that could be confirmed, much less fixed.
I think that's because sqlite3 needs to check if the number preceding the text is part of a numerical expression, a reserved keyword, or an error and decide what is true, and to do that it needs to perform an exhausting verification which consumes more resources from the machine, that's why in various languages you can't declare a variable name beggining with a number (the majority of them will just prompt an error, python by example, returns a SyntaxError exception) and given that your using a MacOs, the operational system might be more strict about allowing resources to processes or might have a different escalation system so it just adds with the syntax check. In general, i strongly recommend you using number after letters, not only because of syntax check, but also, for readbility. If you really need using numbers before letters in this case, then i'm sorry, i don't know what to recommend, maybe reinstalling sqlite3, or formatting the whole OS, but i don't think these will solve the issue.
The
sqlite3
extension module does not create the table; it only passes the query as it is on to the SQLite library
This delay is not present in the official sqlite shell. So it is highly probable that the issue is on Python's end.
Some preprocessing is being done in the extension module before sending the queries to the official sqlite
library, this might be the culprit.
I am trying to build Python with a modified pysqlite_cursor_execute_impl
function located in cpython/Modules/_sqlite/cursor.c
to print timestamps every few lines so that I can isolate the exact part of code causing the delay.
In general, i strongly recommend you using number after letters, not only because of syntax check, but also, for readbility.
I am using YYYYMMDD table names to store time series data. Using a letter before the numbers here might not be ideal.
Thanks for the responses. I hope that we can get to the bottom of this soon.
The
sqlite3
extension module does not create the table; it only passes the query as it is on to the SQLite libraryThis delay is not present in the official sqlite shell. So it is highly probable that the issue is on Python's end.
How did you install Python (homebrew, python.org installer, Apple's devtools, ...)? Did you check that the official sqlite3 shell is linked to the same sqlite library as the _sqlite3 extension module?
Anyway, the code below ends up using a very small amount of time for the first table (0.000085 seconds) and a significant amount of time for the second table (1.397151 seconds) when using Python 3.12 installed using the python.org installer.
import sqlite3, datetime, time, contextlib
@contextlib.contextmanager
def timed():
before = time.time()
yield
after = time.time()
print("took %fs"%(after-before))
sqlite3.enable_callback_tracebacks(True)
conn = sqlite3.connect('test.db')
for i in range(200):
print(f"TAB {i}")
conn.execute(f'CREATE TABLE tab{i} ("value" INT)')
for j in range(100_000):
conn.execute(f"INSERT INTO tab{i} (value) VALUES ({j})")
conn.set_trace_callback(print)
conn.execute("PRAGMA journal_mode=DELETE")
with timed():
conn.execute('CREATE TABLE "t09" ("time" TEXT);')
with timed():
conn.execute('CREATE TABLE "09" ("time" TEXT);')
I am using Python 3.11.6 installed via homebrew.
Did you check that the official sqlite3 shell is linked to the same sqlite library as the _sqlite3 extension module?
Most probably they both include their source for the same library and the exact version may not be the same. Will look into this.
Thanks for the repro, Ronald. Some observations:
The delay is only observed iff the complete test database is created from scratch during the same session as the "offending queries". That is, if you pre-create the database and only perform the "offending queries" you will not be able to observe the slowdown.
The delay is tied to the query itself; if you swap the "offending queries", the delay is now present on the first query, not the latter.
You can use a :memory:
database and 10 tables with approx. 10k rows in order to get a faster but still deterministic repro.
The delay is observable on all Python versions I have installed, from Python 3.8 through 3.13.
Some preprocessing is being done in the extension module before sending the queries to the official sqlite library, this might be the culprit.
No preprocessing is done; we only detect the type of query. The code is here:
FTR, I'm trying to reproduce using the SQLite CLI. It takes some time though, because as I already know, using a pre-created database will not work; you'll have to first create an SQL script that creates the database, then .read
that into the SQLite shell, and then execute the "offending queries", all in the same CLI session.
I get this:
% time /path/to/bin/sqlite3 test.db 'CREATE TABLE "011" ("time" TEXT);'
/path/to/bin/sqlite3 test.db 0.89s user 0.07s system 99% cpu 0.970 total
% time /path/to/cpython-dev/build_deps/bin/sqlite3 test.db 'CREATE TABLE "t011" ("time" TEXT);'
/path/to/build_deps/bin/sqlite3 test.db 0.00s user 0.00s system 68% cpu 0.010 total
This is with the "test.db" created by my reproducer, using sqlite 3.44.2 build in a similar way as used by the CPython installer. /usr/bin/sqlite3
is fast for both variants (3.43.2 as shipped on macOS 14.4.1).
A possibly relevant news entry for SQLite 3.44.0 (item 6 in their list:
Many errors associated with CREATE TABLE are now raised when the CREATE TABLE statement itself is run, rather than being deferred until the first time the table is actually used.
This is with the "test.db" created by my reproducer, using sqlite 3.44.2 build in a similar way as used by the CPython installer.
/usr/bin/sqlite3
is fast for both variants (3.43.2 as shipped on macOS 14.4.1).
Yeah, but see https://github.com/python/cpython/issues/118448#issuecomment-2088188643:
[...] if you pre-create the database and only perform the "offending queries" you will not be able to observe the slowdown.
[...] if you pre-create the database and only perform the "offending queries" you will not be able to observe the slowdown.
I have noticed the same using the repro code given above.
But in my initial scenario, connecting to an existing database also has the same delay.
Maybe the test database has to be large enough for the slowdown to occur in a separate session.
A possibly relevant news entry for SQLite 3.44.0 (item 6 in their list:
Many errors associated with CREATE TABLE are now raised when the CREATE TABLE statement itself is run, rather than being deferred until the first time the table is actually used.
I tried with both 3.44.0 and 3.45.3, but it has no effect; I still observe the slowdown (if the database is created in the same session).
But in my initial scenario, connecting to an existing database also has the same delay.
Maybe the test database has to be large enough for the slowdown to occur in a separate session.
Please try to reproduce this so we're 100% sure about this observation.
Please try to reproduce this so we're 100% sure about this observation.
No need, I reproduced it myself.
sqlite3
-> v3.43.2 preinstalled on macOS 14.4.1.
sqlite_new
-> alias for v3.45.3 installed from homebrew.
time sqlite3 test.db 'CREATE TABLE "111" ("time" TEXT);'
time sqlite3 test.db 'CREATE TABLE "a111" ("time" TEXT);'
time sqlite_new test.db 'CREATE TABLE "211" ("time" TEXT);'
time sqlite_new test.db 'CREATE TABLE "a211" ("time" TEXT);'
sqlite3 test.db 'CREATE TABLE "111" ("time" TEXT);' 0.00s user 0.01s system 48% cpu 0.029 total
sqlite3 test.db 'CREATE TABLE "a111" ("time" TEXT);' 0.00s user 0.00s system 60% cpu 0.009 total
/opt/homebrew/opt/sqlite/bin/sqlite3 test.db 0.42s user 0.43s system 45% cpu 1.874 total
/opt/homebrew/opt/sqlite/bin/sqlite3 test.db 0.00s user 0.00s system 69% cpu 0.007 total
Seems like that it is confirmed that the problem lies with sqlite3
itself and not Python.
Should I close this issue and raise it with the sqlite3
team?
time sqlite3 testbig.db 'CREATE TABLE "211" ("time" TEXT);'
time sqlite_new testbig.db 'CREATE TABLE "212" ("time" TEXT);'
sqlite3 testbig.db 'CREATE TABLE "211" ("time" TEXT);' 0.02s user 0.01s system 54% cpu 0.049 total
/opt/homebrew/opt/sqlite/bin/sqlite3 testbig.db 15.41s user 13.68s system 44% cpu 1:05.64 total
The delay is a lot more pronounced on a large, ~15 GB database.
The slowdown can be pinpointed to the sqlite3_step
C API call. This implies:
sqlite3
extension module creates the statement ("preprocessing", etc.)Seems like that it is confirmed that the problem lies with sqlite3 itself and not Python.
I assume you mean SQLite, and not sqlite3
.
Should I close this issue and raise it with the sqlite3 team?
Sounds good to me. Note that they require accurate, deterministic reproducers for any kind of bug report. Good luck!
Just FYI: the cause of the problem has been found upstream and a fix is under investigation, In short: on 2023-10-13, internals were added which run pragma integrity_check($TABLENAME)
on each new table. When $TABLENAME
looks like a number, it is interpreted as such and causes integrity_check()
to check the whole db rather than just the new table. Thus, the larger the db, the more prominent the delay is. The forum post:
Thanks a lot for the heads-up, Stephan!
Bug report
Bug description:
Output when test.db is ~15 GB with ~3500 tables with 100k rows each:
Output when test.db is ~1 GB with ~1000 tables with 100k rows each:
sqlite3
on macOS is taking 1000-35000x more time to create a new table when the table name begins with a number.This issue is not present on Windows using the exact same code and versions.
This issue is not present in the sqlite CLI on the same macOS machine, so it is a
sqlite3
module issue.The delay is a lot more prominent on large databases, but it exists on small databases as well.
CPython versions tested on:
3.11
Operating systems tested on:
macOS, Windows