zopefoundation / ZODB

Python object-oriented database
https://zodb-docs.readthedocs.io/
Other
681 stars 92 forks source link

Fix behavior of ``FileStorage.restore`` when provided wrong ``prev_txn`` #397

Open perrinjerome opened 8 months ago

perrinjerome commented 8 months ago

See discussions from https://github.com/zopefoundation/ZODB/pull/395#discussion_r1474127300 about removing + 8 from tend

perrinjerome commented 8 months ago

I'm not sure what should be the behavior if restore is called with a prev_txn which does not contain oid.

The docs are :

https://github.com/zopefoundation/ZODB/blob/6f8878124fa8bf1cc1932a8f67da8d15b653897c/src/ZODB/interfaces.py#L978-L981

https://github.com/zopefoundation/ZODB/blob/6f8878124fa8bf1cc1932a8f67da8d15b653897c/src/ZODB/FileStorage/FileStorage.py#L715-L722

If prev_txn does not exist, then it can be ignored. This was not the case so I made a first commit to change this.

But if prev_txn exists but does not contain oid it might be an error case, in this version it's also ignored.

d-maurer commented 8 months ago

Jérome Perrin wrote at 2024-2-14 07:20 -0800:

I'm not sure what should be the behavior if restore is called with a prev_txn which does not contain oid.

I have looked at the code: The only purpose of prev_txn is to determine prev_pos. If this succeeds, we know that we find data at prev_pos and therefore we can write a backpointer to prev_pos instead of writing data ourself. This means, prev_txn is there only as an optimization. If it does not contain the correct data, prep_pos will be 0 and the optimazation will not be taken.

navytux commented 8 months ago

Jérome, thanks for doing this PR and adding tests for +8 and other problems.

I agree that prev_txn semantically is just a hint for copy detection on storage side. It is optimization that, unfortunately, is exposed as public API. An UNDO is the same store operation, but usually it stores content that is already there in the database. Git detects this duplication via sha1 - all internally. But in ZODB world the API was setup to help storages handle copies. In simple cases that helps, but generally I think it hurts and constraints storages unneccessaryly.

Anyway about this:

But if prev_txn exists but does not contain oid it might be an error case, in this version it's also ignored.

I agree it is an inconsistency if prev_txn exists but does not contain an oid. I think it is better report an error here, or at least log a warning. But given that frequently nobody read logs raising an error might be better. I say this because this inconsistency indicate a really potential corruption of data, and such events should not pass by silently.

For the record ZODB/go exposes the hint as

https://lab.nexedi.com/kirr/neo/-/blob/f7776fc1/go/zodb/zodb.go#L249-258

and it exposes it only because this prev_txn is already semantically there in IStorage and tools like zodb dump need it to be able to dump storage with preserving full information, so that bit-to-bit zodb restore is possible:

https://lab.nexedi.com/nexedi/zodbtools/-/blob/513c296c/zodbtools/zodbdump.py#L49 https://lab.nexedi.com/nexedi/zodbtools/-/blob/513c296c/zodbtools/zodbdump.py#L144-146 https://lab.nexedi.com/nexedi/zodbtools/-/blob/513c296c/zodbtools/zodbdump.py#L541-549 https://lab.nexedi.com/nexedi/zodbtools/-/blob/513c296c/zodbtools/zodbdump.py#L425-432 https://lab.nexedi.com/nexedi/zodbtools/-/blob/513c296c/zodbtools/zodbcommit.py#L91-100 https://lab.nexedi.com/nexedi/zodbtools/-/blob/513c296c/zodbtools/zodbcommit.py#L120-121 https://lab.nexedi.com/kirr/neo/-/blob/f7776fc1/go/zodb/zodbtools/dump.go#L102-103

Kirill

perrinjerome commented 8 months ago

Thank you for the feedback and sorry for being slow to reply here.

That was also my feeling that restoring a transaction with a prev_txn not containing the referenced oid is not supposed to happen and could be changed into a log, warning or error. I studied this more and it's a bit more complex.

"simple" patch to change to an error ```diff diff --git a/src/ZODB/FileStorage/FileStorage.py b/src/ZODB/FileStorage/FileStorage.py index 5c76000a..869ae8ea 100644 --- a/src/ZODB/FileStorage/FileStorage.py +++ b/src/ZODB/FileStorage/FileStorage.py @@ -733,6 +733,11 @@ class FileStorage( prev_txn_pos = self._txn_find(prev_txn, 0) if prev_txn_pos: prev_pos = self._data_find(prev_txn_pos, oid, data) + if not prev_pos: + panic( + "oid %s not found in prev_txn %s", + utils.oid_repr(oid), + utils.tid_repr(prev_txn)) old = self._index_get(oid, 0) # Calculate the file position in the temporary file here = self._pos + self._tfile.tell() + self._thl diff --git a/src/ZODB/tests/testFileStorage.py b/src/ZODB/tests/testFileStorage.py index 5f545e52..fb21b6d4 100644 --- a/src/ZODB/tests/testFileStorage.py +++ b/src/ZODB/tests/testFileStorage.py @@ -260,6 +260,7 @@ class FileStorageTests( self.assertEqual(self._storage.load(oid)[0], b'data1b') def testRestorePrevTxnWithoutOid(self): + from ZODB.FileStorage.FileStorage import CorruptedTransactionError t = TransactionMetaData() self._storage.tpc_begin(t) self._storage.store(b'\1' * 8, b'\0' * 8, b'data1', b'', t) @@ -274,14 +275,13 @@ class FileStorageTests( self._storage.tpc_vote(t) self._storage.tpc_finish(t) - # restore with a transaction not containing the oid + # restore with a transaction not containing the oid is an error t = TransactionMetaData() self._storage.tpc_begin(t) - self._storage.restore(b'\2' * 8, b'\0' * 8, b'data2b', b'', tid1, t) - self._storage.tpc_vote(t) - self._storage.tpc_finish(t) - self.assertEqual(self._storage.load(b'\1' * 8)[0], b'data1') - self.assertEqual(self._storage.load(b'\2' * 8)[0], b'data2b') + with self.assertRaisesRegex( + CorruptedTransactionError, + 'oid 0x0202020202020202 not found in prev_txn.*'): + self._storage.restore(b'\2' * 8, b'\0' * 8, b'data2b', b'', tid1, t) def testCorruptionInPack(self): # This sets up a corrupt .fs file, with a redundant transaction ```

I tried to change into an error and there is a legitimate case here in https://github.com/zopefoundation/ZODB/blob/6f8878124fa8bf1cc1932a8f67da8d15b653897c/src/ZODB/tests/RecoveryStorage.py#L41-L69

If this becomes an error, that test fails with an error like this:

Error in test testRestoreAcrossPack (ZODB.tests.testFileStorage.FileStorageHexRecoveryTest)
Traceback (most recent call last):
  File "/usr/local/python/3.10.13/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/python/3.10.13/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/python/3.10.13/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/workspaces/ZODB/src/ZODB/tests/RecoveryStorage.py", line 66, in testRestoreAcrossPack
    self._dst.restore(r.oid, r.tid, r.data, '', r.data_txn,
  File "/workspaces/ZODB/src/ZODB/tests/hexstorage.py", line 82, in restore
    return self.base.restore(
  File "/workspaces/ZODB/src/ZODB/FileStorage/FileStorage.py", line 738, in restore
    panic(
  File "/workspaces/ZODB/src/ZODB/FileStorage/FileStorage.py", line 93, in panic
    raise CorruptedTransactionError(message % data)
ZODB.FileStorage.FileStorage.CorruptedTransactionError: oid 0x00 not found in prev_txn 0x03f6d00826efb611

What this test does is:

and this fails, because the prev_txn in "Dest" exists, but is a "p" (pack) transaction without records.

zodb dump Source.fs ```console > zodb dump /tmp/FileStorageTestsao2yt_a3/FileStorageTestsy80r5b7k/FileStorageHexTestsemjrpbn0/FileStorageHexTestsyucvq6sd/FileStorageRecoveryTest9ur7kq7d/Source.fs Ignoring index for /tmp/FileStorageTestsao2yt_a3/FileStorageTestsy80r5b7k/FileStorageHexTestsemjrpbn0/FileStorageHexTestsyucvq6sd/FileStorageRecoveryTest9ur7kq7d/Source.fs txn 03f6d00826efb611 " " user "" description "initial database creation" extension "" obj 0000000000000000 64 sha1:d4b0bf30c5132b1ec57da7fe61b080970eea3418 �cpersistent.mapping PersistentMapping q.�}qXdataq}qs. txn 03f6d00826f06aaa " " user "" description "" extension "" obj 0000000000000000 118 sha1:bd566fcc3ce469e0d6590f4b69d45e2d3708b4fa �cpersistent.mapping PersistentMapping q.�}qXdataq}qXobj1qqcZODB.tests.MinPO MinPO q�qQss. obj 0000000000000001 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b �cZODB.tests.MinPO MinPO q.�}qXvalueqKs. txn 03f6d00826f11488 " " user "" description "" extension "" obj 0000000000000000 148 sha1:4c39467361ff1ab627e82e14f9c4158cc36979f8 �cpersistent.mapping PersistentMapping q.�}qXdataq}q(Xobj1qqcZODB.tests.MinPO MinPO q�qQXobj2q h�q Qus. obj 0000000000000002 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b �cZODB.tests.MinPO MinPO q.�}qXvalueqKs. txn 03f6d00826f4a100 " " user "" description "undo" extension "" obj 0000000000000000 from 03f6d00826f06aaa obj 0000000000000002 delete ```
zodb dump Dest.fs ```console > zodb dump zodb dump /tmp/FileStorageTestsao2yt_a3/FileStorageTestsy80r5b7k/FileStorageHexTestsemjrpbn0/FileStorageHexTestsyucvq6sd/FileStorageRecoveryTest9ur7kq7d/Dest.fs txn 03f6d00826f06aaa "p" user "" description "" extension "" obj 0000000000000001 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b �cZODB.tests.MinPO MinPO q.�}qXvalueqKs. txn 03f6d00826f11488 "p" user "" description "" extension "" obj 0000000000000000 148 sha1:4c39467361ff1ab627e82e14f9c4158cc36979f8 �cpersistent.mapping PersistentMapping q.�}qXdataq}q(Xobj1qqcZODB.tests.MinPO MinPO q�qQXobj2q h�q Qus. obj 0000000000000002 50 sha1:c69822b1b2dc00164c3db2db7d1013dcf706304b �cZODB.tests.MinPO MinPO q.�}qXvalueqKs. ```

I honestly don't fully understand what the test is doing with "copy the final transaction manually" and not using a higher level API, but this sounds like "this is more complex than just raising". Maybe this could be made a little bit more clever, tolerating the case where oid is not found because the transaction was packed, but for now, I believe we can merge this without trying to change to an error (or log or warning) - and then https://github.com/zopefoundation/ZODB/pull/395 .

Do you agree with this ? I will use "request review" and if you agree I'll apply the commits.

Thanks again for feedback.

( edit, I fixed the traceback, I used an old traceback different from the zodb dumps ) ( edit again: I feel I have misunderstood what's happening in this test, but "it's not so easy to just raise, let's handle it separately and merge this already" remains )

d-maurer commented 8 months ago

Jérome Perrin wrote at 2024-2-26 22:36 -0800:

... I believe we can merge this without trying to change to an error (or log or warning) - and then https://github.com/zopefoundation/ZODB/pull/395 .

I agree.

navytux commented 8 months ago

Jérome, Dieter, thanks for feedback.

Personally I would still carefully reconsider turning restore inconsistencies into an error, but if you both think it is ok to go as it is please go ahead. I will be mostly away for the next 3 weeks and do not want to block progress. If you both think the patch is ok in its current form I trust you to make reasonable judgement even though personally I would try to do it differently.

Kirill

d-maurer commented 8 months ago

Kirill Smelkov wrote at 2024-3-4 00:25 -0800:

Personally I would still carefully reconsider turning restore inconsistencies into an error, but if you both think it is ok to go as it is please go ahead. I will be mostly away for the next 3 weeks and do not want to block progress. If you both think the patch is ok in its current form I trust you to make reasonable judgement even though personally I would try to do it differently.

As you know, I am less interested in potential problems with low probability (than you). In may well be that a restore inconsistency poses problems within some decades; I would then act -- with a concrete case.

Unless you come with such a concrete case, I would not change the current behavior - which apparently is check by a test case (and thus likely carefully considered).

navytux commented 8 months ago

Dieter, thanks for feedback.

The difference in between us is how we perceive importance of the bug here. For me a "data corruption" is the worst that could happen to a database. For an external observer data corruptions are usually hard to detect at the time when they happen, but they introduce a real change to the data. Some of such changes could be innocent, for example if we count a volume of some gas and it will be 17.3 litres instead of 17.2 that's quite tolerable and ok. Some of the changes are not so ok, for example if it will be 123 EUR in total after an exchange transaction instead of correct value of 321000 EUR. And some of the changes could really impact lives: for example if it will be the dose of radiation to emit, and due to corruption it will be 1000 Rentgens instead of 0.1 Rentgen, it makes a huge impact of the person in subject. There is no way to save life of that person with post-mortem analysis.

So when it comes to databases, I try to make dedicated attention to correctness first.

Still, as I said above, if both you and Jérome think it is ok to go as it is now, I trust you to make a reasonable choice and judgement.

Kirill

d-maurer commented 8 months ago

Kirill Smelkov wrote at 2024-3-4 01:21 -0800:

... So when it comes to databases, I try to make dedicated attention to correctness first.

Still, as I said above, if both you and Jérome think it is ok to go as it is now, I trust you to make a reasonable choice and judgement.

I read: you will come again in a few weeks with a concrete case to reproduce the restore inconsistency. Then, we will look again (in a separate PR).

perrinjerome commented 8 months ago

Initially, this pull request was about extracting the +8 fix from https://github.com/zopefoundation/ZODB/pull/395 . I added testRestorePrevTxnWithoutOid , which is a scenario where the +8 bug would appear (because I don't like changing code without having a test corresponding to the change) but what does not seems good now is that this test would define the behavior for a case where the behavior was not clearly defined: what should happen when restore is called with a prev_txn which does not contain the oid.

Before all this, it is an undefined behavior, the data of the next transaction is read as a data header, which very likely causes an unexpected error to be raised. If the transaction is that last in the file, there is no error.

In the current version of the patch, it is tolerated and ignored.

From the discussions, @navytux suggested that this should cause a proper error.

I tried to change into an error, this caused a test to fail. The test is using low level APIs, so maybe it's just the test doing something wrong, I did not try to understand more.

Now I don't feel it would be good to merge testRestorePrevTxnWithoutOid as it is, because this would somehow be a specification of how FileStorage should behave when restore is called with such a transaction, but at this point I feel we don't really know what should be the behavior. The next step for this seems to be studying that failing test. I don't really have time to start doing this now.

https://github.com/zopefoundation/ZODB/pull/395 can be merged without all this, everybody approved so let's merge https://github.com/zopefoundation/ZODB/pull/395 already.


There was another commit in this pull request 7bef1b6e0ddc7bdfe4c1e680d099e8203f95956f , I think there was no controversy for that one.