Closed — KwilLuke closed this issue 5 months ago.
Right now, the main incompatibility is that the Schema struct has changed between versions. This affects RLP deserialization, which means that v0.7 schemas will not be deserializable by v0.8.
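For illustration, here is a minimal hand-rolled sketch of why a strict RLP decoder rejects payloads encoded by a struct definition that grew a field: the serialized outer list simply contains more elements than the old struct expects. This is not Kwil's actual codec (the error text below suggests go-ethereum's rlp package is in use), and the field names and helpers here are hypothetical; real RLP also handles long strings and nested lists, which this sketch omits.

```go
package main

import (
	"errors"
	"fmt"
)

// encodeList RLP-encodes a flat list of short strings (each < 56 bytes,
// total payload < 56 bytes). Short-string item: prefix 0x80+len; list:
// prefix 0xC0+payloadLen. Enough for the demonstration, nothing more.
func encodeList(items []string) []byte {
	var payload []byte
	for _, s := range items {
		payload = append(payload, byte(0x80+len(s)))
		payload = append(payload, s...)
	}
	return append([]byte{byte(0xC0 + len(payload))}, payload...)
}

// decodeList decodes the list but insists on exactly `want` elements,
// mimicking how a strict struct decoder rejects fields it doesn't know.
func decodeList(b []byte, want int) ([]string, error) {
	if len(b) == 0 || b[0] < 0xC0 {
		return nil, errors.New("rlp: not a list")
	}
	payload := b[1:]
	var items []string
	for len(payload) > 0 {
		n := int(payload[0]) - 0x80
		if n < 0 || n+1 > len(payload) {
			return nil, errors.New("rlp: truncated string")
		}
		items = append(items, string(payload[1:1+n]))
		payload = payload[1+n:]
	}
	if len(items) > want {
		return nil, fmt.Errorf("rlp: input list has too many elements (got %d, want %d)", len(items), want)
	}
	if len(items) < want {
		return nil, fmt.Errorf("rlp: too few elements (got %d, want %d)", len(items), want)
	}
	return items, nil
}

func main() {
	// A "v0.8" encoder writes three fields; a "v0.7" decoder expects two.
	blob := encodeList([]string{"name", "body", "metadata"})
	if _, err := decodeList(blob, 2); err != nil {
		fmt.Println("decode error:", err)
	}
}
```

The same shape of failure is what the staging-network sync below hits, just with the real transactions.Schema type.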
I think it goes beyond serialization, unfortunately. Execution of a deploy tx has a different outcome, both in terms of state modification and how the engine handles it. Branched logic would need to exist in several places, I suspect. I'd have to look into it, but I'm not sure if or how the commit ID could be kept the same for older deployment txns if the resulting postgres schema for a dataset is any different. Maybe it would be the same if the engine were able to handle the old types.
@jchappelow got it (I think). I guess the main thing we need to decide is if it makes sense to even worry about this... and then we can assess the scope. @brennanjl any thoughts? Maybe we confirm the Fractal/Truflation setup?
The burden of supporting old blocks is high in this case. Particularly since we don't yet have in place the hardfork system needed to deal with this smoothly, it's probably best to have 0.8 be incompatible with 0.7. This is not the way forward, however. If we change our stance on this and decide to have migration tools, OK, but that's barely viable for a long-lived blockchain use case.
Just to confirm this is not just speculation: if you attempt to sync with the staging network, which is still compatible with 0.7, you get failed to execute transaction {"error": "rlp: input list has too many elements for transactions.ExtensionConfig, decoding into (transactions.Schema).Extensions[0].Initialization[0]"}
on the first deploy, around block 41613:
2024-04-23T12:15:21.356-05:00 info kwild.pg pg/repl.go:244 Commit hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, seq 41612, LSN AC/BA60B908 (741861275912) delta 632
2024-04-23T12:15:21.36-05:00 info kwild.cometbft consensus/replay.go:495 Applying block {"module": "consensus", "height": 41613}
2024-04-23T12:15:21.362-05:00 warn kwild.abci abci/abci.go:278 failed to execute transaction {"error": "rlp: input list has too many elements for transactions.ExtensionConfig, decoding into (transactions.Schema).Extensions[0].Initialization[0]"}
2024-04-23T12:15:21.365-05:00 info kwild.pg pg/repl.go:244 Commit hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, seq 41613, LSN AC/BA60C060 (741861277792) delta 848
2024-04-23T12:15:21.369-05:00 info kwild.cometbft consensus/replay.go:495 Applying block {"module": "consensus", "height": 41614}
2024-04-23T12:15:21.372-05:00 warn kwild.abci abci/abci.go:278 failed to execute transaction {"error": "dataset not found"}
2024-04-23T12:15:21.372-05:00 warn kwild.abci abci/abci.go:278 failed to execute transaction {"error": "rlp: input list has too many elements for transactions.ExtensionConfig, decoding into (transactions.Schema).Extensions[0].Initialization[0]"}
2024-04-23T12:15:21.374-05:00 info kwild.pg pg/repl.go:244 Commit hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, seq 41614, LSN AC/BA60C668 (741861279336) delta 832
2024-04-23T12:15:21.379-05:00 info kwild.cometbft consensus/replay.go:495 Applying block {"module": "consensus", "height": 41615}
2024-04-23T12:15:21.38-05:00 info kwild server/build.go:295 closing signing store
2024-04-23T12:15:21.38-05:00 info kwild.private-validator-signature-store badger/db.go:70 closing KV store
2024-04-23T12:15:21.38-05:00 info kwild.private-validator-signature-store badger/db.go:233 Lifetime L0 stalled for: 0s
2024-04-23T12:15:21.381-05:00 info kwild.private-validator-signature-store badger/db.go:233
Level 0 [ ]: NumTables: 00. Size: 0 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done
2024-04-23T12:15:21.392-05:00 info kwild server/build.go:295 closing event store
2024-04-23T12:15:21.392-05:00 info kwild server/build.go:295 closing main DB
Error: panic while building kwild: block.AppHash does not match AppHash after replay. Got 55A82A2668E79A7257FE8B1274F2209FAF91C43289E11350F39AF69C60E50986, expected 24C8C90BC018E1F8C6C51868FAA3287A09244C064BB9205543D57E2448F3BAEB.
Block: Block{
Header{
Version: {11 0}
ChainID: kwil-chain-9
Height: 41615
Time: 2024-03-14 16:11:26.451353814 +0000 UTC
LastBlockID: 8FBCFCC3E506E234527604ECEDCD5FF664C5855B8C3FF5A026C6EC70AFC71A77:1:AE0B3A4291B8
LastCommit: CDD036730934458F4D353FCF403B062A9FFEEDA759235A8FB87549DA3804D805
Data: DEEA1032B13555DE234761E84514589C2C060CB0566A59C9DB8BAE46F85AC200
Validators: 70ED3323943534C8D1DD8359DCE19880CFBA1563AA3801867A7FA81E2230BFB7
NextValidators: 70ED3323943534C8D1DD8359DCE19880CFBA1563AA3801867A7FA81E2230BFB7
App: 24C8C90BC018E1F8C6C51868FAA3287A09244C064BB9205543D57E2448F3BAEB
Consensus: 7860905924B013DCAFC4CC660E4BAE732F4923F113F5C925D56D54AF5EB2AC6F
Results: 0A01D2CD62B6525BB0298A20605C4F50A0C2C22F892BC343F6F429C459B24E2F
Evidence: E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
Proposer: 643AF2A0462CEA175047C4B8A18817DD9429BB0B
}#83C40B66D4620E329D26B273E43204EDAE55B7BAB14C18F90E0B0EDB4C7D2646
Data{
C59F968497BA618D203A40DF21CDA7953B37108E0B7F65AEBD77894264E4828D (215 bytes)
2CBEAB6DB0B5152D4B47E5DD478FC1475E192875BBE766E921E98ED5AABAB069 (9430 bytes)
52974FC64292F0B48E8F46AE848341C32F895367C2B4A1E689A437049B6A317F (9061 bytes)
6D68CF9A08820BE82151CC222F7220249E0B169759BD0AC045F90C7D6CF4953A (7784 bytes)
}#DEEA1032B13555DE234761E84514589C2C060CB0566A59C9DB8BAE46F85AC200
EvidenceData{
}#E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
Commit{
Height: 41614
Round: 0
BlockID: 8FBCFCC3E506E234527604ECEDCD5FF664C5855B8C3FF5A026C6EC70AFC71A77:1:AE0B3A4291B8
Signatures:
CommitSig{131D794D126B by 643AF2A0462C on 2 @ 2024-03-14T16:11:26.451353814Z}
}#CDD036730934458F4D353FCF403B062A9FFEEDA759235A8FB87549DA3804D805
}#83C40B66D4620E329D26B273E43204EDAE55B7BAB14C18F90E0B0EDB4C7D2646
Just execution of existing transactions. So this is where you would have a coordinated hardfork: nodes would always know when to execute with rules v0 versus rules v1.
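A minimal sketch of the kind of branched logic described above, with entirely hypothetical names: deploy execution consults an agreed-upon activation height, so every node deterministically replays old blocks with the old semantics and new blocks with the new ones.

```go
package main

import "fmt"

// rulesV1Height is a hypothetical activation height that all nodes agree
// on (e.g. shipped in config or genesis). Blocks below it replay with the
// old (v0) deploy semantics; blocks at or above it use the new (v1) ones.
const rulesV1Height int64 = 41613

// executeDeploy dispatches to the rule set in effect at the given height.
func executeDeploy(height int64, payload []byte) string {
	if height >= rulesV1Height {
		return executeDeployV1(payload)
	}
	return executeDeployV0(payload)
}

// Stand-ins for the two rule sets: old decoding + schema handling vs. new.
func executeDeployV0(payload []byte) string { return "v0" }
func executeDeployV1(payload []byte) string { return "v1" }

func main() {
	fmt.Println(executeDeploy(41612, nil)) // historical block: v0 rules
	fmt.Println(executeDeploy(41613, nil)) // at/after activation: v1 rules
}
```

Because the branch depends only on height, replay and live execution agree, which is what keeps the AppHash consistent across the upgrade.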
Still spending more time on it, but yeah, this really is a pretty tough dilemma. A few things I'm sort of spinning on:
I'm going to spend more time today on this, but this is just what I'm trying to keep in mind.
2. We think of Kwil as a blockchain much more than our users do. Our users primarily think of it as a database, and savvy users know that it just so happens to be running a blockchain. I'm not sure if this is an important consideration, but it's worth at least acknowledging. Basic product requirements commonly expected of blockchains are not necessarily held by our users (public/private data is a good example here). I'm still not sure how this plays into upgradeability and forks; it's simply a difference I've noticed between how our team discusses problems and the discussions I have with Julio, Paulo, Ryan, Raffael, etc.
For the sake of argument: while applications or operators may be indifferent to or even ignorant of the blockchain aspect, we are burdened with the qualities of a blockchain regardless. Namely, it makes upgrading difficult or impossible unless we jump through some serious hoops to enable it. We want networks to update, and presumably those networks would like to have the new feature set, so we should clearly keep the barrier low.
Say we make changes to support schema migration or other major governance-based improvements, perhaps involving new transaction types, serialization changes, or just different logic elsewhere. Ideally that would not require resetting a network to genesis, particularly since defining and distributing genesis data (or network migration tools) is neither ready nor straightforward.
IMO, the sooner we can shift our development paradigm to supporting indefinite network life, the better, but I fully agree this is a ton of baggage in the case of introducing procedures. The baggage is especially hard to justify given that 0.7 was itself a breaking release (it introduced PostgreSQL) and has not been widely deployed yet.
Anyway, it sounds like we need to revive the network migration task. Either the snapshot work can be the basis for creating the genesis data, or other tools can be developed to do it via txns. I'm not sure what txn-based tools could emulate the rebuild unless the deployed schemas were guaranteed to have a simple insert method.
Have you guys taken a look at Cosmovisor? https://docs.cosmos.network/main/build/tooling/cosmovisor
It seems like Cosmos's way of handling this is literally running a new binary and switching to it at a specific height. This seems somewhat messy, since you cannot sync state without running Cosmovisor, but it's interesting nonetheless.
IMO, the sooner we can shift our development paradigm to supporting indefinite network life, the better
This is still very much not a fully formed thought, but I'm curious whether indefinite network life is something our users would deeply care about at all. Obviously making them manually resubmit all transactions on upgrade is not an option, but if there were a compromise that led to a literal network reset that happened automatically for the user, I think it could potentially be something that is "not ok for a blockchain, but ok for a database".
Still not something I am even fully sold on, just a thought.
I think we are committed to genesis data, a.k.a. network migrations, for v0.8. We'll break consensus more freely between releases so we are free to make progress without creating hoops and baggage.
However, we will put in place the machinery to implement coordinated changes to consensus rules so that we are free to make fixes that would otherwise break a network if the rule change did not take place at a specific height. The main purpose of that is patch releases that need to break consensus, but if we happen to avoid large breaking changes between "major" releases (for us, like v0.8 -> v0.9) we can use it there too.
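The "machinery" for coordinated rule changes could be as simple as a named fork schedule that every node loads from genesis or config, then consults wherever behavior diverges. A hypothetical sketch (none of these names are from the Kwil codebase):

```go
package main

import "fmt"

// ForkSchedule maps named consensus-rule changes to their activation
// heights. It would be distributed in genesis/config so that every node
// agrees on exactly when each change takes effect.
type ForkSchedule struct {
	Activations map[string]int64
}

// IsActive reports whether the named rule change is in effect at height.
// An unknown name is never active, so old binaries and configs degrade
// safely to the original behavior.
func (f ForkSchedule) IsActive(name string, height int64) bool {
	h, ok := f.Activations[name]
	return ok && height >= h
}

func main() {
	sched := ForkSchedule{Activations: map[string]int64{
		"schema-v2-rlp": 50000, // hypothetical consensus-breaking patch fix
	}}
	fmt.Println(sched.IsActive("schema-v2-rlp", 49999))
	fmt.Println(sched.IsActive("schema-v2-rlp", 50000))
}
```

This is essentially a generalization of a single hardfork height: patch releases register a named change and gate the fixed code path on IsActive, and the same mechanism is available between "major" releases if the breakage is small enough.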
Closed in #782
From our conversation this morning, I am creating this issue here so we can keep tracking it.
We need to decide if we are going to focus on making Kwil v0.8 "directly upgradable" from Kwil v0.7.
Right now, the main incompatibility is that the Schema struct has changed between versions. This affects RLP deserialization, which means that v0.7 schemas will not be deserializable by v0.8.
What we need to decide:
Creating this issue here so we can discuss and decide if we should close.