utelle / SQLite3MultipleCiphers

SQLite3 encryption extension with support for multiple ciphers
https://utelle.github.io/SQLite3MultipleCiphers/
MIT License
390 stars 73 forks source link

CorruptError if SQLITE_CONFIG_MMAP_SIZE used #156

Closed rogerbinns closed 4 months ago

rogerbinns commented 4 months ago

The following program causes CorruptError:

import apsw

# calls sqlite3_config
apsw.config(apsw.SQLITE_CONFIG_MMAP_SIZE,  8448, - 1)

db = apsw.Connection("testdb")
db.pragma("hexkey", b"hello world".hex())
db.execute("create table x(y); insert into x values(zeroblob(78000))")
db.execute("select * from x").get

CorruptError is returned executing the select. Leaving out the hexkey also does not result in error.

If the mmap value is 8447 or less then the error does not occur. I was setting it to 2 **63 -1.

utelle commented 4 months ago

I have to admit that I have never experimented with the SQLITE_CONFIG_MMAP_SIZE config parameter.

CorruptError is returned executing the select. Leaving out the hexkey also does not result in error.

Leaving out calling PRAGMA hexkey results in an unencrypted database file. That is, the encryption code is not called.

What happens, if you use PRAGMA key instead? Same error? Most likely yes.

At the moment I have no idea, what may cause the error. I'll have to try to reproduce the error in my development environment to find out what is going wrong.

utelle commented 4 months ago

In the meantime I made a few experiments using the SQLite shell coming with the SQLite3 Multiple Ciphers releases.

The parameter MMAP_SIZE can also be set via PRAGMA mmap_size.

First, I used an older version of the shell I had at hand from other tests. If I used a value of 8192 I got the error "database disk image is malformed". With smaller values the error went away, although I did not reopen the database, but just reissued the SELECT command.

Then I used the shell from the latest release. But I was unable to reproduce the error. I could set mmap_size to any value.

This makes it extremely difficult to track down the cause of the error.

rogerbinns commented 4 months ago

It doesn't matter whether key or hexkey is used. If I catch the CorruptError, turn mmap back off, and do a read again then all is fine. I also double checked the database is encrypted, and that calling the vfs xRead method does give the right results.

The issue in your code is almost certainly how VFS xFetch is handled.

I put a break point where I can get the sqlite3_io_methods and it does look correct:

(gdb) p *fp.pMethods 
$6 = {iVersion = 3, xClose = 0x7fffe9833de0 <mcIoClose>, 
  xRead = 0x7fffe9845070 <mcIoRead>, 
  xWrite = 0x7fffe9844dc0 <mcIoWrite>, 
  xTruncate = 0x7fffe9826bb0 <mcIoTruncate>, 
  xSync = 0x7fffe9826bc0 <mcIoSync>, 
  xFileSize = 0x7fffe9826bd0 <mcIoFileSize>, 
  xLock = 0x7fffe9826be0 <mcIoLock>, 
  xUnlock = 0x7fffe9826bf0 <mcIoUnlock>, 
  xCheckReservedLock = 0x7fffe9826c00 <mcIoCheckReservedLock>, 
  xFileControl = 0x7fffe98c0970 <mcIoFileControl>, 
  xSectorSize = 0x7fffe9826c10 <mcIoSectorSize>, 
  xDeviceCharacteristics = 0x7fffe9826c40 <mcIoDeviceCharacteristics>, 
  xShmMap = 0x7fffe9826c50 <mcIoShmMap>, 
  xShmLock = 0x7fffe9826c60 <mcIoShmLock>, 
  xShmBarrier = 0x7fffe9826c70 <mcIoShmBarrier>, 
  xShmUnmap = 0x7fffe9826c80 <mcIoShmUnmap>, 
  xFetch = 0x7fffe9826ca0 <mcIoFetch>, 
  xUnfetch = 0x7fffe9826cc0 <mcIoUnfetch>}
rogerbinns commented 4 months ago

The changes APSW makes to SQLite defaults are in src/apsw.c and only if statically compiling in the amalgamation which I am doing in this case.

#define SQLITE_OMIT_DEPRECATED

#ifndef SQLITE_MAX_ATTACHED
#define SQLITE_MAX_ATTACHED 125
#endif

#ifndef SQLITE_MAX_MMAP_SIZE
#define SQLITE_MAX_MMAP_SIZE 0x1000000000000LL
#endif

#ifndef APSW_NO_NDEBUG
#define SQLITE_API static
#define SQLITE_EXTERN static
#endif

#ifdef SQLITE_DEBUG
#define SQLITE_ENABLE_API_ARMOR 1
#endif

Builds for PyPI also add SQLITE_ENABLE_COLUMN_METADATA configured in tools/setup-pypi.cfg

utelle commented 4 months ago

It doesn't matter whether key or hexkey is used.

Any other answer would have surprised me.

If I catch the CorruptError, turn mmap back off, and do a read again then all is fine. I also double checked the database is encrypted, and that calling the vfs xRead method does give the right results.

Maybe the SQLite compile time options used to compile APSW differ from those I typically use for the SQLite shell.

The issue in your code is almost certainly how VFS xFetch is handled.

xFetch is simply forwarded to the xFetch method of the underlying real VFS.

rogerbinns commented 4 months ago

xFetch is simply forwarded to the xFetch method of the underlying real VFS.

That can't be right for mcIoFetch. mcIoRead calls mcReadMainDb which does decryption. So mcIoFetch also needs to do decryption as does mcIoUnfetch need to do encryption. I'll claim the result of fetch being used because of memory mapping means that the raw encrypted data is coming back, hence the corruption error.

A simple fix is to make your interposer sqlite3_io_methods cap iVersion at 2.

utelle commented 4 months ago

xFetch is simply forwarded to the xFetch method of the underlying real VFS.

That can't be right for mcIoFetch. mcIoRead calls mcReadMainDb which does decryption. So mcIoFetch also needs to do decryption as does mcIoUnfetch need to do encryption. I'll claim the result of fetch being used because of memory mapping means that the raw encrypted data is coming back, hence the corruption error.

I will further analyze what SQLite does when using xFetch/xUnFetch. It is a bit irritating that writing encrypted content seems to work properly, in contrast to reading.

A simple fix is to make your interposer sqlite3_io_methods cap iVersion at 2.

As a quick fix that would probably work.

rogerbinns commented 4 months ago

I did try setting the xFetch entry in mcIoMethodsGlobal3 to 0 but then there is a SEGV in sqlite3OsFetch which isn't checking for a non-NULL pointer.

Setting iVersion to 2 in the same structure did make the issue go away.

utelle commented 4 months ago

I did try setting the xFetch entry in mcIoMethodsGlobal3 to 0 but then there is a SEGV in sqlite3OsFetch which isn't checking for a non-NULL pointer.

Setting method pointer to 0 is not a good idea. SQLite doesn't check for null pointer at some critical places (as I found out when I tried to add SQLite3 Multiple Ciphers to the WASM module).

Setting iVersion to 2 in the same structure did make the issue go away.

Setting iVersion to 2 will prevent that xFetch and xUnfetch will be called.

That is, your analysis is absolutely right that those 2 methods do not what they should, if encryption is enabled.

utelle commented 4 months ago

Ok, in the meantime I inspected the SQLite code. IMHO the methods xFetch/xUnfetch must not be called, if the database is encrypted. xFetch returns pointers to memory managed by the OS, and this means the content has to be the same as it is on disk. That is, unencrypted for a plain database or encrypted for an encrypted database. Decrypting the page content in xFetch would most likely cause database corruption.

The real question is, why method xFetch is called at all? The internal SQLite function setGetterMethod has been patched to check whether the database is encrypted or not. If memory mapping is supported and the database is not encrypted, the getter method getPageMMap will be selected; if the database is encrypted, the getter method getPageNormal will be selected. Thus, memory mapping should not be used, and xFetch should not be called. Why does it happen in the APSW environment?

@rogerbinns: Could you please find out which page getter method is set and whether it is correctly detected that the database is encrypted? The call stack when xFetch is entered could possibly be useful, too.

rogerbinns commented 4 months ago

There are wiki instructions which is all I am doing to run the code at the top. This is with the updated sqlite3.c in that repo putting a breakpoint ion mcIoFetch

#0  mcIoFetch (pFile=0xc94b20, iOfst=4096, iAmt=4096, pp=0x7fffffffcc60)
    at /space/mc/sqlite3/sqlite3.c:307830
#1  0x00007fffe9949a7d in sqlite3OsFetch (pp=0x7fffffffcc60, 
    iAmt=<optimised out>, iOff=<optimised out>, id=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:26372
#2  getPageMMap (pPager=0xc94998, pgno=2, ppPage=0x7fffffffccb0, flags=2)
    at /space/mc/sqlite3/sqlite3.c:62913
#3  0x00007fffe9925225 in sqlite3PagerGet (flags=2, 
    ppPage=0x7fffffffccb0, pgno=2, pPager=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:62975
#4  getAndInitPage (pBt=0xc0fa08, pgno=2, ppPage=ppPage@entry=0xc9e978, 
    bReadOnly=2) at /space/mc/sqlite3/sqlite3.c:73078
#5  0x00007fffe9925a1c in moveToRoot (pCur=pCur@entry=0xc9e8f0)
    at /space/mc/sqlite3/sqlite3.c:76207
#6  0x00007fffe9925d5f in sqlite3BtreeFirst (pCur=0xc9e8f0, 
    pRes=0x7fffffffce60) at /space/mc/sqlite3/sqlite3.c:76314
#7  0x00007fffe99bb67b in sqlite3VdbeExec (p=p@entry=0xc9f678)
    at /space/mc/sqlite3/sqlite3.c:99562
#8  0x00007fffe99cd440 in sqlite3Step (p=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:91371
#9  sqlite3_step (pStmt=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:91432
#10 sqlite3_step (pStmt=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:91421
#11 0x00007fffe99ddf94 in APSWCursor_step (
    self=self@entry=0x7fffe9d85bc0) at src/cursor.c:736

Hopefully this is enough.

rogerbinns commented 4 months ago

pPager in frame 2

(gdb) p *pPager
$3 = {pVfs = 0xc77738, exclusiveMode = 0 '\000', journalMode = 0 '\000', 
  useJournal = 1 '\001', noSync = 0 '\000', fullSync = 1 '\001', 
  extraSync = 0 '\000', syncFlags = 2 '\002', walSyncFlags = 10 '\n', 
  tempFile = 0 '\000', noLock = 0 '\000', readOnly = 0 '\000', 
  memDb = 0 '\000', memVfs = 0 '\000', eState = 1 '\001', 
  eLock = 1 '\001', changeCountDone = 0 '\000', setSuper = 0 '\000', 
  doNotSpill = 0 '\000', subjInMemory = 0 '\000', bUseFetch = 1 '\001', 
  hasHeldSharedLock = 1 '\001', dbSize = 21, dbOrigSize = 2, 
  dbFileSize = 21, dbHintSize = 21, errCode = 0, nRec = 0, 
  cksumInit = 4228306223, nSubRec = 0, pInJournal = 0x0, fd = 0xc94b20, 
  jfd = 0xc94cb0, sjfd = 0xc94be8, journalOff = 0, journalHdr = 0, 
  pBackup = 0x0, aSavepoint = 0x0, nSavepoint = 0, iDataVersion = 4, 
  dbFileVers = "\000\000\000\002\000\000\000\025\000\000\000\000\000\000\000", nMmapOut = 0, szMmap = 281474976710656, pMmapFreelist = 0x0, 
  nExtra = 136, nReserve = 32, vfsFlags = 33554694, sectorSize = 512, 
  mxPgno = 4294967294, lckPgno = 262145, pageSize = 4096, 
  journalSizeLimit = -1, zFilename = 0xc94d84 "/space/mc/testdb", 
  zJournal = 0xc94d96 "/space/mc/testdb-journal", 
  xBusyHandler = 0x7fffe987d680 <btreeInvokeBusyHandler>, 
  pBusyHandlerArg = 0xc0fa08, aStat = {9, 0, 23, 0}, 
  xReiniter = 0x7fffe9925130 <pageReinit>, 
  xGet = 0x7fffe99498a0 <getPageMMap>, pTmpSpace = 0xc94dd8 "", 
  pPCache = 0xc94ad0, pWal = 0x0, zWal = 0xc94daf "/space/mc/testdb-wal"}
(gdb) p *pPager->pVfs
$5 = {iVersion = 3, szOsFile = 200, mxPathname = 512, 
  pNext = 0x7fffe9ae61a0 <aVfs.2599>, 
  zName = 0xc777f0 "multipleciphers-unix", 
  pAppData = 0x7fffe9ae61a0 <aVfs.2599>, 
  xOpen = 0x7fffe984ee50 <mcVfsOpen>, 
  xDelete = 0x7fffe982b190 <mcVfsDelete>, 
  xAccess = 0x7fffe982b1a0 <mcVfsAccess>, 
  xFullPathname = 0x7fffe982b1b0 <mcVfsFullPathname>, 
  xDlOpen = 0x7fffe982b1c0 <mcVfsDlOpen>, 
  xDlError = 0x7fffe982b1d0 <mcVfsDlError>, 
  xDlSym = 0x7fffe982b1e0 <mcVfsDlSym>, 
  xDlClose = 0x7fffe982b1f0 <mcVfsDlClose>, 
  xRandomness = 0x7fffe982b200 <mcVfsRandomness>, 
  xSleep = 0x7fffe982b210 <mcVfsSleep>, 
  xCurrentTime = 0x7fffe982b220 <mcVfsCurrentTime>, 
  xGetLastError = 0x7fffe982b230 <mcVfsGetLastError>, 
  xCurrentTimeInt64 = 0x7fffe982b240 <mcVfsCurrentTimeInt64>, 
  xSetSystemCall = 0x7fffe982b250 <mcVfsSetSystemCall>, 
  xGetSystemCall = 0x7fffe982b260 <mcVfsGetSystemCall>, 
  xNextSystemCall = 0x7fffe982b270 <mcVfsNextSystemCall>}
utelle commented 4 months ago

There are wiki instructions which is all I am doing to run the code at the top.

I will set up the environment in my Linux environment and will try to further investigate the issue.

This is with the updated sqlite3.c in that repo putting a breakpoint ion mcIoFetch

#0  mcIoFetch (pFile=0xc94b20, iOfst=4096, iAmt=4096, pp=0x7fffffffcc60)
    at /space/mc/sqlite3/sqlite3.c:307830
#1  0x00007fffe9949a7d in sqlite3OsFetch (pp=0x7fffffffcc60, 
    iAmt=<optimised out>, iOff=<optimised out>, id=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:26372
#2  getPageMMap (pPager=0xc94998, pgno=2, ppPage=0x7fffffffccb0, flags=2)
    at /space/mc/sqlite3/sqlite3.c:62913
#3  0x00007fffe9925225 in sqlite3PagerGet (flags=2, 
    ppPage=0x7fffffffccb0, pgno=2, pPager=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:62975
#4  getAndInitPage (pBt=0xc0fa08, pgno=2, ppPage=ppPage@entry=0xc9e978, 
    bReadOnly=2) at /space/mc/sqlite3/sqlite3.c:73078
#5  0x00007fffe9925a1c in moveToRoot (pCur=pCur@entry=0xc9e8f0)
    at /space/mc/sqlite3/sqlite3.c:76207
#6  0x00007fffe9925d5f in sqlite3BtreeFirst (pCur=0xc9e8f0, 
    pRes=0x7fffffffce60) at /space/mc/sqlite3/sqlite3.c:76314
#7  0x00007fffe99bb67b in sqlite3VdbeExec (p=p@entry=0xc9f678)
    at /space/mc/sqlite3/sqlite3.c:99562
#8  0x00007fffe99cd440 in sqlite3Step (p=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:91371
#9  sqlite3_step (pStmt=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:91432
#10 sqlite3_step (pStmt=<optimised out>)
    at /space/mc/sqlite3/sqlite3.c:91421
#11 0x00007fffe99ddf94 in APSWCursor_step (
    self=self@entry=0x7fffe9d85bc0) at src/cursor.c:736

As can be seen the page getter method getPageMMap is called (# 3). This should not happen for an encrypted database. No idea, why it happens nevertheless.

utelle commented 4 months ago

Ok, I tested under Linux based on the wiki instructions. Here are the test results:

APSW debug build: missing sys.apsw_fault_inject_control
                Python  /home/ulrich/Development/GitHub/apsw-sqlite3mc/.venv/bin/python3 sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0) 64bit ELF
Testing with APSW file  /home/ulrich/Development/GitHub/apsw-sqlite3mc/apsw/__init__.cpython-310-x86_64-linux-gnu.so
          APSW version  3.46.0.0
    SQLite lib version  3.46.0
SQLite headers version  3046000
    Using amalgamation  True
.............................................................A message due to RecursionError is possible, and what is being tested
.....................................................................
----------------------------------------------------------------------
Ran 130 tests in 31.732s

OK

No CorruptError. Hm???

rogerbinns commented 4 months ago

You only need the 6 lines of code in this issue report at the top to reproduce. The wiki has instructions at the bottom on how to set a breakpoint in code of your choice.

In commit https://github.com/utelle/apsw-sqlite3mc/commit/2c623e865b26a44751fa12254675912c3dd81ae2 I temporarily worked around the issue which is why it doesn't show up now. That was to allow a build for test pypi to complete.

I've now removed the workaround and added another test that explicitly checks memory mapping.

utelle commented 4 months ago

You only need the 6 lines of code in this issue report at the top to reproduce. The wiki has instructions at the bottom on how to set a breakpoint in code of your choice.

Thanks. In the meantime I have set up the project in my Linux environment, so that I can continue to test there.

In commit utelle/apsw-sqlite3mc@2c623e8 I temporarily worked around the issue which is why it doesn't show up now. That was to allow a build for test pypi to complete.

I've now removed the workaround and added another test that explicitly checks memory mapping.

I see now the CorruptError and I will try to track down, what's causing the problem. This may take some time.

utelle commented 4 months ago

Actually, it was only a small glitch. The codec pointer was set in the SQLite file structure a bit too late, so that the SQLite pager got the impression that no codec is in place and selected the page getter method for memory-mapped database files.

Commit c95bd21f4fa20328b521a33a57017607ad17f073 should fix the issue.

I committed an updated amalgamation to apsw-sqlite3mc.

utelle commented 4 months ago

I could run all tests without problems. Therefore closing this issue. Reopen if necessary.