zopefoundation / ZEO

ZODB Client-Server framework
Other
42 stars 19 forks source link

Another data corruption due to concurrency bug (ZEO5 regression; no external invalidations) #166

Closed navytux closed 3 years ago

navytux commented 3 years ago

( moving out from https://github.com/zopefoundation/ZEO/issues/155#issuecomment-811984478 since this seems to be a different problem )

Something really wrong is going on with ZEO5.

In addition to the test program provided in https://github.com/zopefoundation/ZEO/issues/155, where there are 8 simultaneous workers each of which opens its own storage-level connection, I've tried to run the test program from https://github.com/zopefoundation/ZODB/issues/290 with minimal edit to use ZEO instead of MappingStorage, and it fails too.

In that test program there is only one storage-level connection and two simultaneous users that are using that single storage-level connection. In other words there are no external invalidations involved at all.

https://lab.nexedi.com/kirr/wendelin.core/blob/e2dc8651/lib/tests/testprog/zopenrace.py#L102-203

This failure indicates data corruption. Quoting https://github.com/zopefoundation/ZODB/issues/290:

---- 8< ---- That in turn can lead to loose of Consistency of the database if mix of current and old data is used to process a transaction. A classic example would be bank accounts A, B and C with A<-B and A<-C transfer transactions. If transaction that handles A<-C sees stale data for A when starting its processing, it results in A loosing what it should have received from B.

...

The program simulates two clients: one (T2) constantly modifies two integer objects preserving invariant that their values stay equal. The other client (T1) constantly opens the database and verifies the invariant. T1 forces access to one of the object to always go through loading from the database, and this way if live cache becomes stale the bug is observed as invariant breakage. ---- 8< ----

Here is how it fails with ZEO:

.../wendelin.core$ WENDELIN_CORE_TEST_DB="<zeo>" py.test -vs -k openrace lib/tests/test_zodb.py
============================== test session starts ==============================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.9.0, pluggy-0.13.1 -- /home/kirr/src/wendelin/venv/z-dev/bin/python2
cachedir: .pytest_cache
rootdir: /home/kirr/src/wendelin/wendelin.core
collected 8 items / 6 deselected / 2 selected

lib/tests/test_zodb.py::test_zodb_zopenrace_basic OK
PASSED
lib/tests/test_zodb.py::test_zodb_zopenrace FAILED

=================================== FAILURES ====================================
______________________________ test_zodb_zopenrace ______________________________

    def test_zodb_zopenrace():
        # exercises ZODB.Connection + particular storage implementation
>       zopenrace.main()

lib/tests/test_zodb.py:382:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<decorator-gen-2>:2: in main
    ???
../../tools/go/pygolang/golang/__init__.py:103: in _
    return f(*argv, **kw)
lib/tests/testprog/zopenrace.py:115: in main
    test(zstor)
<decorator-gen-3>:2: in test
    ???
../../tools/go/pygolang/golang/__init__.py:103: in _
    return f(*argv, **kw)
lib/tests/testprog/zopenrace.py:201: in test
    wg.wait()
golang/_sync.pyx:246: in golang._sync.PyWorkGroup.wait
    ???
golang/_sync.pyx:226: in golang._sync.PyWorkGroup.go.pyrunf
    ???
lib/tests/testprog/zopenrace.py:165: in T1
    t1()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def t1():
        transaction.begin()
        zconn = db.open()

        root = zconn.root()
        obj1 = root['obj1']
        obj2 = root['obj2']

        # obj1 - reload it from zstor
        # obj2 - get it from zconn cache
        obj1._p_invalidate()

        # both objects must have the same values
        i1 = obj1.i
        i2 = obj2.i
        if i1 != i2:
>           raise AssertionError("T1: obj1.i (%d)  !=  obj2.i (%d)" % (i1, i2))
E           AssertionError: T1: obj1.i (16)  !=  obj2.i (15)

lib/tests/testprog/zopenrace.py:156: AssertionError

P.S. Suggested patch from https://github.com/zopefoundation/ZEO/issues/155#issuecomment-581046248 does not help here - even though it seemingly fixes https://github.com/zopefoundation/ZEO/issues/155, it does not help to fix https://github.com/zopefoundation/ZODB/issues/290 with ZEO5.

P.P.S. ZEO 5.2.2-1-g3d90ed42, ZODB 5.6.0-18-g75c703fcf, CPython 2.7.18

navytux commented 3 years ago

/cc @jimfulton

d-maurer commented 3 years ago

Kirill Smelkov wrote at 2021-4-1 21:59 -0700:

( moving out from https://github.com/zopefoundation/ZEO/issues/155#issuecomment-811984478 since this seems to be a different problem )

Something really wrong is going on with ZEO5.

I tried to reproduce your observation but failed: ZEO==5.2.2; ZODB==5.6.0; CPython==3.6.9 I used your zopenrace.py with the following changes:


...
from ZEO import server, DB
...
def main():
    address, stop = server()
    db = DB(address)
    ...
    N = 10000
    t1 = go(T1, N)
    t2 = go(T2, N)
    t1.join()
    t2.join()
    stop()

Apparently, the problem depends on further details.
navytux commented 3 years ago

@d-maurer, thanks for feedback.

I checked a bit more from scratch and, contrary to the problem from https://github.com/zopefoundation/ZEO/issues/155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius.

To verify I used zoperace-zeo.py which is zopenrace.py from https://github.com/zopefoundation/ZODB/issues/290 plus the following change that you suggested:

--- a/zopenrace.py.kirr0
+++ b/zopenrace.py
@@ -87,8 +87,7 @@
 """

 from __future__ import print_function
-from ZODB import DB
-from ZODB.MappingStorage import MappingStorage
+from ZEO import server, DB
 import transaction
 from persistent import Persistent

@@ -110,8 +109,8 @@ def __init__(self, i):

 def main():
-    zstor = MappingStorage()
-    db = DB(zstor)
+    address, stop = server()
+    db = DB(address)

     # init initializes the database with two integer objects - obj1/obj2 that are set to 0.
@@ -194,6 +193,7 @@ def t2():
     t2 = go(T2, N)
     t1.join()
     t2.join()
+    stop()

     if not all(okv):
         raise AssertionError('FAIL')

For me it fails with CPython 2.7.18 on two different machines. For completeness I attach full log of one such verification.

Please note, that I hit this ZEO bug only accidentally, while trying to find a place for my ZODB test programs in wendelin.core test suite, so that those test programs are run wrt all ZODB versions we care about (ZODB4, ZODB5 and ZODB4 with MVCC backport from ZODB5 (see https://lab.nexedi.com/nexedi/ZODB/commit/8e7eab33?expanded=1), and wrt all storages (FileStorage, ZEO and NEO) as explained in this patch.

It is possible that the bug is indeed in trollius. It is also possible that the bug is in ZEO5 and only being more likely to hit with trollius. For now I did not looked into ZEO5 to verify it for concurrency-correctness at all.

Thanks again for feedback, Kirill

Log of verification with CPython 2.7.18 (+ ZODB-5.6.0, ZEO-5.2.2) ```console kirr@teco:~/tmp/trashme/Z$ virtualenv -p python2 py2.venv created virtual environment CPython2.7.18.final.0-64 in 713ms creator CPython2Posix(dest=/home/kirr/tmp/trashme/Z/py2.venv, clear=False, no_vcs_ignore=False, global=False) seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/kirr/.local/share/virtualenv) added seed packages: pip==20.3.4, pkg_resources==0.0.0, setuptools==44.1.1, wheel==0.34.2 activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator kirr@teco:~/tmp/trashme/Z$ . py2.venv/bin/activate (py2.venv) kirr@teco:~/tmp/trashme/Z$ python Python 2.7.18 (default, Mar 9 2021, 11:09:26) [GCC 10.2.1 20210110] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> (py2.venv) kirr@teco:~/tmp/trashme/Z$ pip install ZODB ZEO DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality. Collecting ZODB Downloading ZODB-5.6.0-py2.py3-none-any.whl (413 kB) |████████████████████████████████| 413 kB 2.3 MB/s Collecting ZEO Downloading ZEO-5.2.2-py2.py3-none-any.whl (288 kB) |████████████████████████████████| 288 kB 13.7 MB/s Collecting zc.lockfile Downloading zc.lockfile-2.0-py2.py3-none-any.whl (9.7 kB) Collecting persistent>=4.4.0 Downloading persistent-4.6.4-cp27-cp27mu-manylinux2010_x86_64.whl (240 kB) |████████████████████████████████| 240 kB 15.2 MB/s Collecting zope.interface Downloading zope.interface-5.3.0-cp27-cp27mu-manylinux2010_x86_64.whl (243 kB) |████████████████████████████████| 243 kB 14.6 MB/s Collecting transaction>=2.4 Downloading transaction-3.0.1-py2.py3-none-any.whl (47 kB) |████████████████████████████████| 47 kB 4.4 MB/s Collecting BTrees>=4.2.0 Downloading BTrees-4.7.2-cp27-cp27mu-manylinux2010_x86_64.whl (2.9 MB) |████████████████████████████████| 2.9 MB 15.5 MB/s Collecting ZConfig Downloading ZConfig-3.5.0-py2.py3-none-any.whl (134 kB) |████████████████████████████████| 134 kB 22.8 MB/s Collecting zodbpickle>=1.0.1 Downloading zodbpickle-2.0.0-cp27-cp27mu-manylinux2010_x86_64.whl (309 kB) |████████████████████████████████| 309 kB 22.2 MB/s Collecting six Downloading six-1.15.0-py2.py3-none-any.whl (10 kB) Processing /home/kirr/.cache/pip/wheels/94/b0/5b/1448d37a25d9b08c1495306d43197e935f1740da4076849e15/zdaemon-4.3-cp27-none-any.whl Collecting trollius; python_version == "2.7" Downloading trollius-2.2.post1.tar.gz (274 kB) |████████████████████████████████| 274 kB 16.5 MB/s Collecting futures; python_version == "2.7" Downloading futures-3.3.0-py2-none-any.whl (16 kB) Requirement already satisfied: setuptools in ./py2.venv/lib/python2.7/site-packages (from zc.lockfile->ZODB) (44.1.1) Collecting cffi; platform_python_implementation == "CPython" Downloading cffi-1.14.5-cp27-cp27mu-manylinux1_x86_64.whl (389 kB) |████████████████████████████████| 389 kB 25.9 MB/s Collecting pycparser Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB) |████████████████████████████████| 112 kB 21.0 MB/s Building wheels for collected packages: trollius Building wheel for trollius (setup.py) ... done Created wheel for trollius: filename=trollius-2.2.post1-py2-none-any.whl size=122121 sha256=a442356ce1ee9f0165d6c2ee4678e52d5c489d8e4f00e6945e18d62c017e6e37 Stored in directory: /home/kirr/.cache/pip/wheels/d7/3a/73/2eedf02f638d8dc7dbcac4dd151f26a331d1a1053d49a85544 Successfully built trollius Installing collected packages: zc.lockfile, zope.interface, pycparser, cffi, persistent, transaction, BTrees, ZConfig, zodbpickle, six, ZODB, zdaemon, futures, trollius, ZEO Successfully installed BTrees-4.7.2 ZConfig-3.5.0 ZEO-5.2.2 ZODB-5.6.0 cffi-1.14.5 futures-3.3.0 persistent-4.6.4 pycparser-2.20 six-1.15.0 transaction-3.0.1 trollius-2.2.post1 zc.lockfile-2.0 zdaemon-4.3 zodbpickle-2.0.0 zope.interface-5.3.0 (py2.venv) kirr@teco:~/tmp/trashme/Z$ wget https://lab.nexedi.com/kirr/misc/raw/b9092e3/zodb/zopenrace-zeo.py --2021-04-05 10:51:33-- https://lab.nexedi.com/kirr/misc/raw/b9092e3/zodb/zopenrace-zeo.py Распознаётся lab.nexedi.com (lab.nexedi.com)… 176.31.129.213, 161.97.166.226 Подключение к lab.nexedi.com (lab.nexedi.com)|176.31.129.213|:443... соединение установлено. HTTP-запрос отправлен. Ожидание ответа… 200 OK Длина: 7100 (6,9K) [text/plain] Сохранение в: «zopenrace-zeo.py» zopenrace-zeo.py 100%[=======================================================>] 6,93K --.-KB/s за 0s 2021-04-05 10:51:33 (17,1 MB/s) - «zopenrace-zeo.py» сохранён [7100/7100] (py2.venv) kirr@teco:~/tmp/trashme/Z$ ll итого 12 drwxr-xr-x 5 kirr kirr 4096 апр 5 10:51 py2.venv -rw-r--r-- 1 kirr kirr 7100 апр 5 10:51 zopenrace-zeo.py (py2.venv) kirr@teco:~/tmp/trashme/Z$ chmod a+x zopenrace-zeo.py (py2.venv) kirr@teco:~/tmp/trashme/Z$ ./zopenrace-zeo.py Exception in thread Thread-3: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "./zopenrace-zeo.py", line 160, in T1 t1() File "./zopenrace-zeo.py", line 153, in t1 raise AssertionError("T1: obj1.i (%d) != obj2.i (%d)" % (i1, i2)) AssertionError: T1: obj1.i (21) != obj2.i (20) Traceback (most recent call last): File "./zopenrace-zeo.py", line 205, in main() File "./zopenrace-zeo.py", line 200, in main raise AssertionError('FAIL') AssertionError: FAIL ```
d-maurer commented 3 years ago

Kirill Smelkov wrote at 2021-4-5 01:50 -0700: @.***, thanks for feedback.

I checked a bit more from scratch and, contrary to the problem from https://github.com/zopefoundation/ZEO/issues/155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius.

I could reproduce the problem here with Python 2 -- investigating.

d-maurer commented 3 years ago

Dieter Maurer wrote at 2021-4-6 07:22 +0200:

Kirill Smelkov wrote at 2021-4-5 01:50 -0700: @.***, thanks for feedback.

I checked a bit more from scratch and, contrary to the problem from https://github.com/zopefoundation/ZEO/issues/155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius.

I could reproduce the problem here with Python 2 -- investigating.

The problem is obviously associated with a race condition: the failure happens after different rounds, rarely not all.

To understand what happens, I have activated logging (address, stop = server(log="/tmp/zeo.log")) and instrumented some methods in ZEO.ClientStorage and ZODB.mvccadapter:

from decorator import decorator
@decorator
def trace(f, *args, **kw):
    logger.debug("%r %r %r", f, args, kw)
    return f(*args, **kw)

I also stop the test as soon as the first failure is observed. One run has shown the error already for the first modification. Below, I show the corresponding log file. The _invalidate_finish comes from the modifying client, the loadBefore from the verifying client. The serial in the last loadBefore shows that it already sees the new transaction. A possible explanation could be that the new transaction was observable before _invalidate_finish has been completed.

2021-04-06T09:46:48 INFO ZEO.runzeo (3694) opening storage '1' using MappingStorage
------
2021-04-06T09:46:48 INFO ZEO.StorageServer StorageServer created RW with storages: 1:RW:Mapping Storage
------
2021-04-06T09:46:48 DEBUG trollius Using selector: EpollSelector
------
2021-04-06T09:46:48 INFO ZEO.asyncio.server listening on (u'127.0.0.1', 0)
------
2021-04-06T09:46:48 DEBUG ZEO.tests.forker SERVER CREATED
------
2021-04-06T09:46:48 DEBUG ZEO.tests.forker ADDRESS SENT
------
2021-04-06T09:46:48 INFO ZEO.ClientStorage ('127.0.0.1', 37011) ClientStorage (pid=3694) created RW/normal for storage: '1'
------
2021-04-06T09:46:48 INFO ZEO.cache created temporary cache file '<fdopen>'
------
2021-04-06T09:46:48 DEBUG trollius Using selector: EpollSelector
------
2021-04-06T09:46:48 DEBUG ZEO.asyncio.client disconnected <ZEO.asyncio.client.Client object at 0x7f5503226f90> None
------
2021-04-06T09:46:48 DEBUG ZEO.asyncio.client try_connecting
------
2021-04-06T09:46:48 DEBUG ZEO.asyncio.server Accepted connection
------
2021-04-06T09:46:48 INFO ZEO.asyncio.base Connected server protocol
------
2021-04-06T09:46:48 INFO ZEO.asyncio.base Connected Protocol(('127.0.0.1', 37011), '1', 0)
------
2021-04-06T09:46:48 INFO ZEO.asyncio.server received handshake 'Z5'
------
2021-04-06T09:46:48 INFO ZEO.ClientStorage ('127.0.0.1', 37011) Connected to storage: ('localhost', 37011)
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x01') {}
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit <Connection at 7f55027b3210>
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') lock: transactions waiting: 0
------
2021-04-06T09:46:48 BLATHER ZEO.StorageServer (127.0.0.1:37011) Preparing to commit transaction: 1 objects, 96 bytes
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') unlock: transactions waiting: 0
------
2021-04-06T09:46:48 DEBUG ZODB.mvccadapter <function _invalidate_finish at 0x7f550c9ea8d0> (<ZODB.mvccadapter.MVCCAdapter object at 0x7f55027b31d0>, '\x03\xdf1\x92\xd0\x88\x02\xdd', set(['\x00\x00\x00\x00\x00\x00\x00\x00']), <ZODB.mvccadapter.MVCCAdapterInstance object at 0x7f55027b32d0>) {}
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 new transaction
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit <Connection at 7f55027b3210>
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') lock: transactions waiting: 0
------
2021-04-06T09:46:48 BLATHER ZEO.StorageServer (127.0.0.1:37011) Preparing to commit transaction: 3 objects, 291 bytes
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') unlock: transactions waiting: 0
------
2021-04-06T09:46:48 DEBUG ZODB.mvccadapter <function _invalidate_finish at 0x7f550c9ea8d0> (<ZODB.mvccadapter.MVCCAdapter object at 0x7f55027b31d0>, '\x03\xdf1\x92\xd0\x8c\xaf\xaa', set(['\x00\x00\x00\x00\x00\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\x00\x02']), <ZODB.mvccadapter.MVCCAdapterInstance object at 0x7f55027b32d0>) {}
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG txn.140003071555328 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x00', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x00', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003071555328 commit <Connection at 7f55027b3dd0>
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') lock: transactions waiting: 0
------
2021-04-06T09:46:48 BLATHER ZEO.StorageServer (127.0.0.1:37011) Preparing to commit transaction: 2 objects, 194 bytes
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') unlock: transactions waiting: 0
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZODB.mvccadapter <function _invalidate_finish at 0x7f550c9ea8d0> (<ZODB.mvccadapter.MVCCAdapter object at 0x7f55027b31d0>, '\x03\xdf1\x92\xd0\x93\xdc\xbb', set(['\x00\x00\x00\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\x00\x02']), <ZODB.mvccadapter.MVCCAdapterInstance object at 0x7f55027b3e10>) {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x93\xdc\xbc') {}
------
2021-04-06T09:46:48 DEBUG txn.140003071555328 commit
d-maurer commented 3 years ago

Dieter Maurer wrote at 2021-4-6 10:10 +0200:

Dieter Maurer wrote at 2021-4-6 07:22 +0200:

Kirill Smelkov wrote at 2021-4-5 01:50 -0700: @.***, thanks for feedback.

I checked a bit more from scratch and, contrary to the problem from https://github.com/zopefoundation/ZEO/issues/155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius.

I could reproduce the problem here with Python 2 -- investigating.

The problem is understood and related to PR #307: during the development of this PR, it was recognized that it is vital when the value returned by lastTransaction changes. The lastTransaction interface now specifies that lastTransaction must return the last transaction for which processing of invalidations is complete.

@jmuchemb I am not sure that the requirement above is already sufficient: at various interfaces, the tid and the associated invalidations are tightly coupled. The requirement wants lastTransaction updated after the processing of the invalidations. What happens when between this update invalidations for a new transaction are processed?

For ZEO.ClientStorage.ClientStorage, the lastTransaction value actually comes from its cache. Its LastTid is regularly set before invalidation processing. Thus, ClientStorage does not satisfy the requirement above.

navytux commented 3 years ago

@d-maurer, thanks a lot for investigating.

@jmuchemb I am not sure that the requirement above is already sufficient: at various interfaces, the tid and the associated invalidations are tightly coupled. The requirement wants lastTransaction updated after the processing of the invalidations. What happens when between this update invalidations for a new transaction are processed?

Since it was me to originally push for this interface change, let me say that you are right. If the follow-up invalidation is processed in between we are getting into scenario that is explained in https://github.com/zopefoundation/ZODB/issues/290 and leads to corrupt view of data when updating to next transaction.

The interface should state not only that "process invalidations" is ordered with lastTransaction via "happens before" (Inv_x < L_y for x ≤ y), but also that non of further invalidations are processed yet (L_y < Inv_z for y < z). This should remove the possibility of future invalidation getting in between invalidate and lastTransaction.

By default I let @jmuchemb comment and handle interface amendment since it was his version of text to be merged into interfaces instead of https://github.com/zopefoundation/ZODB/pull/313.


Looking at https://github.com/zopefoundation/ZEO/pull/167 I'm also sorry to say that I missed that ZEO5 does not confirm to requested interface amendment. I verified it for ZEO4 (https://github.com/zopefoundation/ZODB/pull/307#discussion_r434145034), and from https://github.com/zopefoundation/ZODB/pull/307#discussion_r434166238 got the impression it is also the case for ZEO5. But I never looked inside to verifiy that by myself. Let me appologize...

Regarding https://github.com/zopefoundation/ZEO/pull/167 changes I would also suggest to move RaceConditionTests to ZODB itself so that all storages are excercised for this bug, similarly to e.g. test in PackableStorage:

https://github.com/zopefoundation/ZODB/blob/5.6.0-19-gdad778016/src/ZODB/tests/PackableStorage.py#L103 https://github.com/zopefoundation/ZODB/blob/5.6.0-19-gdad778016/src/ZODB/tests/testFileStorage.py#L41-L57 https://github.com/zopefoundation/ZODB/blob/5.6.0-19-gdad778016/src/ZODB/tests/testDemoStorage.py#L45-L56

(see also https://github.com/zopefoundation/ZODB/issues/290#issuecomment-811998279 for similar call)

Thanks again for looking into this issue, Kirill