Closed abdelouahabb closed 10 months ago
APSW doesn't change SQLite's behaviour when it comes to executing SQL.
TypeError: filename should be a string
That is the error you are getting. You should print out what the value of filename actually is, as well as the flags. augmented tracebacks will also contain the same information. My guess is SQLite is spilling from memory to disk.
If you want further help you'll need to include the APSW version, as well as enough for me to reproduce the problem.
APSW doesn't change SQLite's behaviour when it comes to executing SQL.
TypeError: filename should be a string
That is the error you are getting. You should print out what the value of filename actually is, as well as the flags. augmented tracebacks will also contain the same information. My guess is SQLite is spilling from memory to disk.
If you want further help you'll need to include the APSW version, as well as enough for me to reproduce the problem.
Sorry, here is what I got :
File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 1008, in get
cur.execute('''
contenu = 'Année Passée'
exc = TypeError('filename should be a string')
fin = 500
nombre = (8099,)
now = datetime.datetime(2024, 1, 5, 18, 16, 43, 611, tzinfo=zoneinfo.ZoneInfo(key='Africa/Algiers'))
pagination = 16
self = <__main__.Recette object at 0x000001EB7FED2390>
start = 6000
url = '/admin/recette?q=l'
what = 'l&skip'
File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 57, in xOpen
return ObfuscatedVFSFile(self.base_vfs, name, flags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
flags = [1054, 0]
name = None
self = <__main__.ObfuscatedVFS object at 0x000001EB7F8932A0>
File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 63, in __init__
apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
filename = None
flags = [1054, 0]
inheritfromvfsname = ''
self = <__main__.ObfuscatedVFSFile object at 0x000001EB104A7F70>
TypeError: filename should be a string
Windows 10 64bits,
APSW 3.44.2.0,
Python Python 3.11.7
Seems it loses the filename argument after a certain cursor limit ?
This is a bug in APSW.
The open flags are SQLITE_OPEN_CREATE | SQLITE_OPEN_DELETEONCLOSE | SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_TRANSIENT_DB
The SQLite doc does say NULL is allowed. I vaguely recall in the past that a zero length string was passed.
This exact combination only occurs for OP_OpenAutoindex
/ OP_OpenEphemeral
in the source
The reason you are hitting this is because as you add larger offsets to your LIMIT
, more data has to be kept in memory to be sorted until memory limits are hit and disk is used instead.
Once I have a reproducer I'll let you know of a workaround.
This is a bug in APSW.
The open flags are
SQLITE_OPEN_CREATE | SQLITE_OPEN_DELETEONCLOSE | SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_TRANSIENT_DB
The SQLite doc does say NULL is allowed. I vaguely recall in the past that a zero length string was passed.
This exact combination only occurs for
OP_OpenAutoindex
/OP_OpenEphemeral
in the sourceThe reason you are hitting this is because as you add larger offsets to your
LIMIT
, more data has to be kept in memory to be sorted until memory limits are hit and disk is used instead.Once I have a reproducer I'll let you know of a workaround.
Thank you very much ^^
Reproduced it with the code :
import apsw
import uuid
import time
import sys
from apsw.ext import print_augmented_traceback
def obfuscate(data):
if not data: return data
return bytes([x ^ 0xcc for x in data])
class ObfuscatedVFS(apsw.VFS):
def __init__(self, vfsname="obfuscated", basevfs=""):
self.vfs_name = vfsname
self.base_vfs = basevfs
apsw.VFS.__init__(self, self.vfs_name, self.base_vfs)
def xOpen(self, name, flags: int):
return ObfuscatedVFSFile(self.base_vfs, name, flags)
class ObfuscatedVFSFile(apsw.VFSFile):
def __init__(self, inheritfromvfsname, filename, flags):
apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
def xRead(self, amount, offset):
return obfuscate(super().xRead(amount, offset))
def xWrite(self, data, offset):
super().xWrite(obfuscate(data), offset)
obfuvfs = ObfuscatedVFS()
open_flags = apsw.SQLITE_OPEN_READWRITE | apsw.SQLITE_OPEN_CREATE
open_flags |= apsw.SQLITE_OPEN_URI
conn = apsw.Connection("my.db", vfs=obfuvfs.vfs_name)
cur = conn.cursor()
cur.execute('''
CREATE TABLE IF NOT EXISTS "hello" (
"a" CLOB NOT NULL,
"b" CLOB NOT NULL,
"c" CLOB NOT NULL,
"d" CLOB NOT NULL,
"e" CLOB NOT NULL
)''')
with conn:
for i in range(100000):
cur.execute('''
insert into hello (`a`, `b`, `c`, `d`, `e`) VALUES (?, ?, ?, ?, ?)
''', (str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4())))
for i in range(9000):
try:
print("from {0} to {1}".format( i*500, i*500+500) )
print("reached {} characters".format((i+1)*500*36*5))
cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
print(cur.fetchall())
time.sleep(1)
except Exception as exc:
print_augmented_traceback(*sys.exc_info())
break
It stops at :
from 4000 to 4500
reached 810000 characters
Even if I have 32 GB of ram on 64bits OS ?
Traceback (most recent call last):
File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 64, in <module>
cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
ObfuscatedVFS = <class '__main__.ObfuscatedVFS'>
ObfuscatedVFSFile = <class '__main__.ObfuscatedVFSFile'>
__annotations__ = {}
__builtins__ = <module 'builtins' (built-in)>
__cached__ = None
__doc__ = None
__file__ = 'c:\\Users\\abdelouahab\\Desktop\\New folder\\db_tester.py'
__loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x00000232AD8DBBF0>
__name__ = '__main__'
__package__ = None
__spec__ = None
apsw = <module 'apsw' from 'C:\\Users\\abdelouahab\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\apsw\\__init__.cp312-win_amd64.pyd'>
conn = <apsw.Connection object at 0x00000232ADBB5F30>
cur = <apsw.Cursor object at 0x00000232AE028040>
exc = TypeError('filename should be a string')
i = 8
obfuscate = <function obfuscate at 0x00000232AD8CA3E0>
obfuvfs = <__main__.ObfuscatedVFS object at 0x00000232ADBE74D0>
open_flags = 70
print_augmented_traceback = <function print_augmented_traceback at 0x00000232AE0142C0>
sys = <module 'sys' (built-in)>
time = <module 'time' (built-in)>
uuid = <module 'uuid' from 'C:\\Users\\abdelouahab\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\uuid.py'>
File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 19, in xOpen
return ObfuscatedVFSFile(self.base_vfs, name, flags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
flags = [1054, 0]
name = None
self = <__main__.ObfuscatedVFS object at 0x00000232ADBE74D0>
File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 25, in __init__
apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
filename = None
flags = [1054, 0]
inheritfromvfsname = ''
self = <__main__.ObfuscatedVFSFile object at 0x00000232ADB83E30>
TypeError: filename should be a string
Edit : tried it on Linux (PopOS 64bits) same error at the same iteration
I was unable to find a workaround such as increasing the cache size. I am about to commit the fix. It will be part of the next release which is soon.
It worked now, performance is a hit (takes more time after that limitation, but at least it works ! ) Here is the result on Linux
I measured performances with this :
for i in range(90):
try:
t1 = time.time()
print("from {0} to {1}".format( i*500, i*500+500) )
cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
# print(cur.fetchall())
print("reached {} characters".format((i+1)*500*36*5))
perfor.append(time.time() - t1)
print("performance is {}".format(perfor))
time.sleep(1)
except Exception as exc:
print_augmented_traceback(*sys.exc_info())
break
And it seems it's getting bigger and bigger !
You are measuring one thing:
select * from hello order by a limit ?, ?
To do the order by
SQLite has to do a sort. The sort time is proportional to the number and size of items being sorted as you have found.
Turn on logging and SQLite will tell you if it is creating an automatic index.
Use EXPLAIN QUERY PLAN to see how SQLite executes your query. There is good documentation on the principles behind query execution.
You can do this to get a shell in the right place that knows how to format the query plan:
apsw.shell.Shell(db=conn).process_sql("explain query plan select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
With your code above I get this query plan:
id parent detail
9 0 SCAN hello
24 0 USE TEMP B-TREE FOR ORDER BY
SCAN
means it is going through the whole table start to finish. The TEMP B-TREE
is SQLite having to sort everything on demand.
I did this:
create index hello_a on hello(a)
Then the query plan becomes:
id parent detail
10 0 SCAN hello USING INDEX hello_a
It is instant because SQLite just has to seek to the right location in the index to produce the results and doesn't need to do a sort.
You also don't need that separate cursor and it isn't doing anything for you other than providing a potential source of bugs in the future when you use it while using it. All the cur.execute
can and should be replaced with conn.execute
.
You are measuring one thing:
select * from hello order by a limit ?, ?
To do the
order by
SQLite has to do a sort. The sort time is proportional to the number and size of items being sorted as you have found.Turn on logging and SQLite will tell you if it is creating an automatic index.
Use EXPLAIN QUERY PLAN to see how SQLite executes your query. There is good documentation on the principles behind query execution.
You can do this to get a shell in the right place that knows how to format the query plan:
apsw.shell.Shell(db=conn).process_sql("explain query plan select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
With your code above I get this query plan:
id parent detail 9 0 SCAN hello 24 0 USE TEMP B-TREE FOR ORDER BY
SCAN
means it is going through the whole table start to finish. TheTEMP B-TREE
is SQLite having to sort everything on demand.I did this:
create index hello_a on hello(a)
Then the query plan becomes:
id parent detail 10 0 SCAN hello USING INDEX hello_a
It is instant because SQLite just has to seek to the right location in the index to produce the results and doesn't need to do a sort.
You also don't need that separate cursor and it isn't doing anything for you other than providing a potential source of bugs in the future when you use it while using it. All the
cur.execute
can and should be replaced withconn.execute
.
Wow ! this just amazing ! Indexing the ordering reduced not only the execute time, but even the bug disapeared without the patch, the patch will improve even better ! Thank you so much !
I try to retrieve data using
SELECT something from product LIMIT ?,?
from an obfuscated db, with 500 pagination skip, the problem I get an error when I reach 5500 (skip=6000) : Is there any limit should be raised for the cursor ?The error :