zodb / relstorage

A backend for ZODB that stores pickles in a relational database.
Other
53 stars 46 forks source link

Fix deleteObject in history-preserving mode to conform to ZODB interface #484

Closed navytux closed 1 year ago

navytux commented 2 years ago

ZODB specifies deleteObject to create new revision that indicates object removal:

def deleteObject(oid, serial, transaction):
    """Mark an object as deleted

    This method marks an object as deleted VIA A NEW OBJECT
    REVISION.  Subsequent attempts to load current data for the
    object will fail with a POSKeyError, but loads for
    non-current data will succeed if there are previous
    non-delete records.  The object will be removed from the
    storage when all not-delete records are removed.

https://github.com/zopefoundation/ZODB/blob/bc13ca74/src/ZODB/interfaces.py#L1292-L1307 (emphasis mine)

However currently for history-preserving mode, as explained in https://github.com/zopefoundation/ZODB/issues/318#issuecomment-657683419, RelStorage purges latest object revision instead of creating new one with whiteout indication. This goes against deleteObject specification and, as demonstrated by attached test program, against particular FileStorage behaviour.

-> Fix it.

P.S. I'm complete RelStorage newbie and looked only briefly. It could be that my patch is e.g. incomplete, or not optimal. However it demonstrates a real problem, and it fixes both adjusted testcase and failure of attached tdelete.py

P.P.S. Tested only with SQLite backend.

---- 8< ---- (tdelete.py)
#!/usr/bin/env python
"""tdelete.py demonstrates that deleteObject should create new whiteout
record, and that older data records should be still accessible.

e.g. with FileStorage:

   $ ./tdelete.py file://1.db
   @03e40964a0766f33 (= 280359404597309235) obj<0000000000000001>  ->  int(0)
   @03e40964a0790944 (= 280359404597479748) obj<0000000000000001>  ->  int(1)

   --------

   @03e40964a0766f33  obj<0000000000000001>  ->  int(0)    # must be int(0)
   @03e40964a0790944  obj<0000000000000001>  ->  int(1)    # must be int(1)

However it currently fails with RelStorage, because deleteObject does not
create new whiteout revision and instead purges already committed data:

    $ rm x/*; ./tdelete.py sqlite://?data_dir=`pwd`/x
    @03e40972d5408022 (= 280359465612509218) obj<0000000000000001>  ->  int(0)
    @03e40972d541ddee (= 280359465612598766) obj<0000000000000001>  ->  int(1)

    --------

    @03e40972d5408022  obj<0000000000000001>  ->  int(0)    # must be int(0)
    Traceback (most recent call last):
      File "./tdelete.py", line 84, in <module>
        main()
      File "./tdelete.py", line 80, in main
        dumpObjAt(at1, "must be int(1)")
      File "./tdelete.py", line 75, in dumpObjAt
        obj = conn.get(oid)
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/Connection.py", line 238, in get
        obj = self._reader.getGhost(p)
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/serialize.py", line 598, in getGhost
        unpickler = self._get_unpickler(pickle)
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/serialize.py", line 478, in _get_unpickler
        file = BytesIO(pickle)
    TypeError: StringIO() argument 1 must be string or buffer, not None
"""

from __future__ import print_function

import zodburi
from persistent import Persistent
from ZODB.DB import DB
from ZODB.Connection import TransactionMetaData
from ZODB.utils import u64
import transaction
import sys

class PInt(Persistent):
    def __init__(self, i):
        self.i = i
    def __str__(self):
        return "int(%d)" % self.i

def h(tid):
    return tid.encode('hex')

def dump(obj):
    print("@%s (= %d) obj<%s>  ->  %s" % (h(obj._p_serial), u64(obj._p_serial), h(obj._p_oid), obj))

def main():
    zurl = sys.argv[1]
    zstoropen, dbkw = zodburi.resolve_uri(zurl)

    stor = zstoropen()
    db = DB(stor, **dbkw)

    conn = db.open()
    root = conn.root()

    root['X'] = obj = PInt(0)
    transaction.commit()
    dump(obj)
    at0 = obj._p_serial
    oid = obj._p_oid

    obj.i += 1
    transaction.commit()
    dump(obj)
    at1 = obj._p_serial

    txn_meta = TransactionMetaData()
    stor.tpc_begin(txn_meta)
    stor.deleteObject(oid, at1, txn_meta)
    stor.tpc_vote(txn_meta)
    stor.tpc_finish(txn_meta)

    print('\n--------\n')

    def dumpObjAt(at, comment):
        conn = db.open(at=at)
        obj = conn.get(oid)
        print("@%s  obj<%s>  ->  %s\t# %s" % (h(at), h(oid), obj, comment))
        conn.close()

    dumpObjAt(at0, "must be int(0)")
    dumpObjAt(at1, "must be int(1)")

if __name__ == '__main__':
    main()

P.P.P.S. SQLite URI resolver is currently broken after 08259fa909b6 (Finer control over sqlite storage locking, oid allocation and stats). I've used the following local patch as a workaround:

--- a/src/relstorage/zodburi_resolver.py
+++ b/src/relstorage/zodburi_resolver.py
@@ -121,14 +121,14 @@ def factory(options):
         return factory, unused

 class SqliteAdapterHelper(Resolver):
-    _string_args = ('path',)
+    _string_args = ('data_dir',)

     def __call__(self, parsed_uri, kw):
         kw, unused = self.interpret_kwargs(kw)

         def factory(options):
             from relstorage.adapters.sqlite.adapter import Sqlite3Adapter
-            return Sqlite3Adapter(options=options, **kw)
+            return Sqlite3Adapter(options=options, pragmas={}, **kw)
         return factory, unused

 # The relstorage support is inspired by django-zodb.
navytux commented 2 years ago

@jamadden, all AppVeyor tests passed. This pull-request is no longer work-in-progress.