Closed: cratelyn closed this issue 6 months ago.
update 2024-01-09: in #3592 and in some direct conversations, we discussed the desire to address #3588 using two crates: (1) a core library that provides the ability to generate block headers, and (2) an in-memory consensus engine with facilities to manipulate time.
after some further reading, i believe it makes sense to keep #3597 focused on (1), rather than attempting to completely address #3588 in one PR. this core library should be able to generate valid block headers that can be successfully parsed by a real light client library. (`tendermint-light-client-verifier` seems promising for this)
today i spent time reading through the broad strokes of how the protobuf types and `.proto` files are used to generate Rust types that can be (de)serialized and used by core application logic. i also spent some time learning more about how the `cnidarium` crate's `StateRead` and `StateWrite` traits are extended by the `penumbra-chain` and `penumbra-ibc` libraries in `crates/core/component`. the latter part led me to some of the message handling code, which interacts with cometbft/tendermint headers.
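to make that extension-trait relationship concrete, here is a tiny std-only sketch of the pattern: a base read/write state trait, plus downstream extension traits with blanket impls. every name here (`MemState`, `ChainStateReadExt`, the `"block_height"` key) is a hypothetical stand-in, not the real `cnidarium` or `penumbra-chain` API.

```rust
use std::collections::BTreeMap;

// Base traits, analogous in spirit to a StateRead/StateWrite pair.
trait StateRead {
    fn get(&self, key: &str) -> Option<&Vec<u8>>;
}

trait StateWrite: StateRead {
    fn put(&mut self, key: String, value: Vec<u8>);
}

// A trivial in-memory store implementing both.
struct MemState {
    kv: BTreeMap<String, Vec<u8>>,
}

impl StateRead for MemState {
    fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.kv.get(key)
    }
}

impl StateWrite for MemState {
    fn put(&mut self, key: String, value: Vec<u8>) {
        self.kv.insert(key, value);
    }
}

// A downstream crate extends the base traits with domain-specific accessors;
// blanket impls give every state store these methods for free.
trait ChainStateWriteExt: StateWrite {
    fn put_block_height(&mut self, height: u64) {
        self.put("block_height".to_string(), height.to_le_bytes().to_vec());
    }
}

trait ChainStateReadExt: StateRead {
    fn block_height(&self) -> Option<u64> {
        self.get("block_height")
            .map(|b| u64::from_le_bytes(b[..8].try_into().unwrap()))
    }
}

impl<S: StateWrite> ChainStateWriteExt for S {}
impl<S: StateRead> ChainStateReadExt for S {}

fn main() {
    let mut state = MemState { kv: BTreeMap::new() };
    state.put_block_height(42);
    assert_eq!(state.block_height(), Some(42));
}
```

the point of the pattern is that the component crates never need to name a concrete storage type; they only require the base trait bounds.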
at this point i'm still getting a lay of the land, but i feel like i am gaining a much clearer picture of what work is to come. tomorrow i'll meet up with @avahowell to answer questions i've accrued concerning IBC, and spend some time working on stubs to generate a valid tendermint block header.
`Header` structure: https://github.com/penumbra-zone/penumbra/blob/29b7cb2afde3a664bcbcbbe33e544b43a802ee83/crates/proto/src/gen/tendermint.types.rs#L104

update 2024-01-22:
today i continued to push pr #3597 along. i am at the point where i can almost feed output into the `tendermint-light-client-verifier`.

as of now, i have a lightweight validator system defined, defaulting to a single validator. i have my consensus state defined, modeled after (some of) comet's own internal consensus state.
there is still much more to do, but it's tremendously exciting to see this library progressing past the embryonic stages. h/t to conor for showing me some relevant plumbing in the pd daemon today as well.
that's all from me for this week, i will see you all on monday 💐
- me, #status channel in discord
with my draft pr (#3597) further along, i am going to pivot focus to work on #3627 this week. i'll resume work on this issue afterwards.
today @hdevalence linked me to https://github.com/penumbra-zone/penumbra/issues/1664. this is a great issue outlining the larger goals and direction of our testing facilities. linking here, for future reference.
update 2024-02-06: my focus this week has been scoping and specifying the work to be done, and filing tickets. see :point_up: #3740, #3741, #3753, #3754, #3755, #3756, #3757, #3758, #3759, #3760.
i'll be meeting with Henry tomorrow to talk more about this, and hopefully pair program a bit.
I think it would be good to focus this effort on the concrete goal of writing `App`-wide tests, which is the purpose of building the mock engine. We already have two of these, which were added in the original testing push in #1664 / #1675:

- `app/src/tests/spend.rs`
- `app/src/tests/swap_and_swap_claim.rs`
We should start by writing a second version of each of those tests, but instead of driving each component's code manually in the `#[test]` function, we'd just instruct the mock engine to do high-level steps. This will ensure we're not overbuilding the test infrastructure relative to what we need.
Let's look at the first, simpler test, in `spend.rs`. In what follows I'll make up some stub API along the way, but this is just to convey a general shape; I'm not opinionated about the specifics. I'll also take the liberty of slightly rearranging the blocks of code in the existing test to fit a narrative order.
```rust
#[tokio::test]
async fn spend_happy_path() -> anyhow::Result<()> {
    let mut rng = rand_chacha::ChaChaRng::seed_from_u64(1312);
    let storage = TempStorage::new().await?.apply_default_genesis().await?;
    let mut state = Arc::new(StateDelta::new(storage.latest_snapshot()));
    let height = 1;
```
This is initializing a new `TempStorage` with default genesis data. Instead, this should look something like this:
```rust
#[tokio::test]
async fn spend_happy_path_2() -> anyhow::Result<()> {
    let mut rng = rand_chacha::ChaChaRng::seed_from_u64(1312);
    let storage = TempStorage::new().await?;
    let app = App::new(storage.latest_snapshot());
    let engine = MockComet::builder()
        .single_validator() // builder-style methods for configuring validators
        .app_state(genesis::AppState::default()) // N.B. will later need an ext trait
        .init_chain(app) // "finish" method of builder, saves app and calls InitChain
        .await?;
```
As before, we create a `TempStorage` for an RAII-guarded temporary storage that only lives as long as the test. However, instead of manually writing in the genesis state, we first construct a complete `App` instance. Then, we instantiate a `MockComet` and use a builder API to configure it, before passing in the app to call `init_chain`.
The note about the extension trait is that we'll later want an extension trait for the builder API that does Penumbra-specific behavior: the Penumbra application needs to know about its validators at genesis, so we'll eventually want a method that can "reconcile" a default genesis state with whatever validators the `MockComet` was configured with. But I think this is not particularly important to start off with, until we seek to exercise the staking component.
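To make the builder shape concrete, here is a toy, synchronous sketch of the `MockComet` builder described above. Everything in it (`MockComet`, `App`, `AppState`, the fields) is a hypothetical stand-in for illustration, and the real thing would be async; this only shows the "configure, then finish with `init_chain`" flow.

```rust
// Stand-in genesis state; the real one would be genesis::AppState.
#[derive(Default, Clone)]
struct AppState;

// Stand-in application; the real one would be the Penumbra App.
struct App {
    validators: usize,
    initialized: bool,
}

#[derive(Default)]
struct MockCometBuilder {
    validators: usize,
    app_state: AppState,
}

struct MockComet {
    app: App,
}

impl MockCometBuilder {
    // Builder-style configuration methods.
    fn single_validator(mut self) -> Self {
        self.validators = 1;
        self
    }
    fn app_state(mut self, state: AppState) -> Self {
        self.app_state = state;
        self
    }
    // "Finish" method: consumes the builder and stores the app. The real
    // engine would call InitChain here; this toy just flips a flag.
    fn init_chain(self, mut app: App) -> MockComet {
        app.validators = self.validators;
        app.initialized = true;
        MockComet { app }
    }
}

impl MockComet {
    fn builder() -> MockCometBuilder {
        MockCometBuilder::default()
    }
}

fn main() {
    let app = App { validators: 0, initialized: false };
    let engine = MockComet::builder()
        .single_validator()
        .app_state(AppState::default())
        .init_chain(app);
    assert!(engine.app.initialized);
    assert_eq!(engine.app.validators, 1);
}
```

The design choice worth noting is that `init_chain` consumes the builder, so a half-configured engine can never be used by a test.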
Let's keep going.
```rust
// ORIGINAL spend_happy_path()
// Precondition: This test uses the default genesis which has existing notes for the test keys.
let mut client = MockClient::new(test_keys::FULL_VIEWING_KEY.clone());
let sk = test_keys::SPEND_KEY.clone();
client.sync_to(0, state.deref()).await?;
let note = client.notes.values().next().unwrap().clone();
let note_commitment = note.commit();
let proof = client.sct.witness(note_commitment).unwrap();
let root = client.sct.root();
let tct_position = proof.position();
```
This section uses the existing `MockClient` to mock a Penumbra client syncing the chain. Remember that Penumbra distinguishes between public data, kept on-chain, and user data, kept off-chain on the user device. So to perform actions, we not only need a way to mock the consensus engine, but also a way to mock a client.

The `MockClient` is a minimal reference implementation of the Penumbra client logic (as compared to the entire Rust view server, which has a ton of incidental complexity), accessing the state directly rather than making network calls and only keeping data in memory.

This code would remain unchanged. If the complexity of the actions we wanted to perform in tests grew, we could either hook up the complete view server implementation, or add adapters to the `MockClient`. In any case, it's not the problem we need to address right now.
```rust
// ORIGINAL spend_happy_path()

// 1. Simulate BeginBlock
let mut state_tx = state.try_begin_transaction().unwrap();
state_tx.put_block_height(height);
state_tx.put_epoch_by_height(
    height,
    Epoch {
        index: 0,
        start_height: 0,
    },
);
state_tx.apply();

// 2. Create a Spend action
let spend_plan = SpendPlan::new(&mut rng, note, tct_position);
let dummy_effect_hash = [0u8; 64];
let rsk = sk.spend_auth_key().randomize(&spend_plan.randomizer);
let auth_sig = rsk.sign(&mut rng, dummy_effect_hash.as_ref());
let spend = spend_plan.spend(&test_keys::FULL_VIEWING_KEY, auth_sig, proof, root);
let transaction_context = TransactionContext {
    anchor: root,
    effect_hash: EffectHash(dummy_effect_hash),
};

// 3. Simulate execution of the Spend action
spend.check_stateless(transaction_context).await?;
spend.check_stateful(state.clone()).await?;
let mut state_tx = state.try_begin_transaction().unwrap();
state_tx.put_mock_source(1u8);
spend.execute(&mut state_tx).await?;
state_tx.apply();

// 4. Execute EndBlock
let height = 1;
let end_block = abci::request::EndBlock {
    height: height.try_into().unwrap(),
};
ShieldedPool::end_block(&mut state, &end_block).await;
let mut state_tx = state.try_begin_transaction().unwrap();
// ... and for the App, call `finish_block` to correctly write out the SCT with the data we'll use next.
state_tx.finish_block(false).await.unwrap();
state_tx.apply();
Ok(())
```
In the rest of the test, the current code:

- performs a partial `BeginBlock` simulation (not exercising the real codepaths, just manually poking at the state);
- creates a `Spend` action isolated from any transaction (this will never occur in reality) and does ad-hoc mocking to patch up the fact that there is no containing transaction;
- executes the `Spend` action, isolated from any containing transaction;
- performs a partial `EndBlock` simulation (only calling the `ShieldedPool` component, leaving the app in an invalid state).

Instead, this should use `MockComet` to execute the entire app logic, so we are testing the code paths we care about and creating complete state transitions:
```rust
// NEW spend_happy_path_2()

// 1. Create a `TransactionPlan` with a 1-spend, 1-output transaction
let plan = TransactionPlan {
    // self-contained struct literal with a 1-spend, 1-output transaction
    // using the data obtained from the MockClient as in the snippet above
};
let auth_data = AuthorizationData {
    // we can either create a struct literal with the signatures we need,
    // or we can initialize a SoftKms instance with the test keys and pass the plan
};
let witness_data = WitnessData {
    // self-contained struct literal using the inclusion proof from the MockClient
};

// 2. Use the `TransactionPlan`, `AuthorizationData`, and `WitnessData` to build the transaction
let tx = plan.build_concurrent(...).await?;

// 3. Now use the `MockComet` to create a new block with that transaction
engine.block_builder()
    //.timestamp(...) // methods for controlling simulated execution
    .add_tx(tx.encode_to_vec()) // adds to list in block builder
    .execute() // triggers app execution
    .await?;
```
This is much simpler and less error-prone. We need to change from simulating a single `Spend` action in isolation to simulating it in a complete transaction. But this is fine, because that's the actual code path we are going to be executing.

We manually plan the transaction using the data from the `MockClient`, then build it. Later, if the complexity of transaction planning grows, we may want to consider adapting the Rust view server into the test framework, but this is not important to worry about now.
Then, we use a `block_builder()` builder API on the `MockComet` instance to create a new block and execute it against the app. I'm imagining that in general, invocations of the builder API should look like:

- optional configuration methods like `.timestamp`; if unset, there should be a default of a 5s relative offset, etc.;
- `.add_tx`, appending transactions into the block;
- `.execute()`, which builds the block as-configured and then drives execution of the application.

Now, in this example, we can go even further with `spend_happy_path_2()`:
```rust
// NEW spend_happy_path_2()

// 4. Use the MockClient to check that we detect our output note
client.sync_to(...)
// ... check that the note we expect is now seen in the MockClient
// ... or we could make a second transaction, checking we can spend notes that weren't created at genesis
// ... or we could check that a double-spend fails, or, ....
```
By contrast, what would happen if we tried to add this to the existing test? The entire test would explode, because the current test leaves the application in an invalid state: it doesn't write out the compact block, it doesn't close the SCT, etc., because it doesn't execute the full application logic.
Note also what's not required: anything other than actually driving the application and feeding it a header as part of `BeginBlock`. At this point, the header doesn't even need to be valid -- that would only come into play once we try to write an IBC test (#3758).
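The block-builder conventions above can be sketched in a few lines of toy Rust. This is a synchronous stand-in, not the real API: `Engine`, `Block`, and the integer timestamps are all hypothetical, and it exists only to show the "optional configuration, default 5s offset, `add_tx`, `execute`" flow.

```rust
// Toy engine tracking the last block timestamp (in seconds).
struct Engine {
    last_timestamp: u64,
    blocks: Vec<Block>,
}

struct Block {
    timestamp: u64,
    txs: Vec<Vec<u8>>,
}

struct BlockBuilder<'a> {
    engine: &'a mut Engine,
    timestamp: Option<u64>,
    txs: Vec<Vec<u8>>,
}

impl Engine {
    fn block_builder(&mut self) -> BlockBuilder<'_> {
        BlockBuilder { engine: self, timestamp: None, txs: Vec::new() }
    }
}

impl<'a> BlockBuilder<'a> {
    // Optional configuration method for controlling simulated execution.
    fn timestamp(mut self, t: u64) -> Self {
        self.timestamp = Some(t);
        self
    }
    // Appends a transaction to the block under construction.
    fn add_tx(mut self, tx: Vec<u8>) -> Self {
        self.txs.push(tx);
        self
    }
    // Builds the block as configured and "executes" it against the engine;
    // if no timestamp was set, default to a 5s offset from the last block.
    fn execute(self) -> u64 {
        let ts = self.timestamp.unwrap_or(self.engine.last_timestamp + 5);
        self.engine.last_timestamp = ts;
        self.engine.blocks.push(Block { timestamp: ts, txs: self.txs });
        ts
    }
}

fn main() {
    let mut engine = Engine { last_timestamp: 100, blocks: Vec::new() };
    let ts = engine.block_builder().add_tx(vec![0xde, 0xad]).execute();
    assert_eq!(ts, 105); // default +5s relative offset applied
    let ts2 = engine.block_builder().timestamp(200).execute();
    assert_eq!(ts2, 200); // explicit timestamp wins
    assert_eq!(engine.blocks.len(), 2);
}
```

Because `execute()` consumes the builder and borrows the engine mutably, each block is committed exactly once before the next one can be started.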
> I think it would be good to focus this effort on the concrete goal of writing `App`-wide tests, which is the purpose of building the mock engine.

first: i've renamed this issue, to be clearer about the fact that this is not intended for integration tests, and is intended for `App` tests.
> This is initializing a new TempStorage with a default genesis data. Instead, this should look something like this:
```rust
#[tokio::test]
async fn spend_happy_path_2() -> anyhow::Result<()> {
    // ...
    let engine = MockComet::builder()
        .single_validator() // builder-style methods for configuring validators
        .app_state(genesis::AppState::default()) // N.B. will later need an ext trait
        .init_chain(app) // "finish" method of builder, saves app and calls InitChain
        .await?;
```
> Then, we instantiate a `MockComet` and use a builder API to configure it, before passing in the app to call `init_chain`.
this part, where we "finish" the builder and call `InitChain`, was what i was intending to describe and track with #3754.
> [T]he Penumbra application needs to know about its validators at genesis, so we'll eventually want to have a method that can "reconcile" a default genesis state with whatever validators the MockComet was configured with. But I think this is not particularly important to start off with, until we seek to exercise the staking component.
i am curious to talk to you more about this. i filed #3760 to track what i think this is implying: we will want to be able to support multiple validators to exercise certain components, like the staking component you mention.
i agree that running a single node is all that is needed for an MVP, and we don't want to introduce networking complexity into tests.
i suspect i am misunderstanding something.
notes from today's call with Henry:

- `src/` ... if possible (#3781)
- `pd` ... (#3789)
- `App` so that the public methods correspond to ABCI messages

update 24-02-07: today we paired up and talked at length about what API this library should expose. this was tremendously productive! tomorrow morning i'll address some of the todo items listed above, and reorganize the list of issues to be completed. others will be marked out of scope.
with all this in hand, i think a lot of exciting progress will be happening very soon. :slightly_smiling_face:
crossreferencing https://github.com/penumbra-zone/tower-abci/pull/41#issuecomment-1936860930 for visibility
💭 background and motivation

currently, writing complete end-to-end integration tests for Penumbra is difficult. moreover, we don't have a better alternative strategy for testing the core `App` type.

currently, the `smoke-test.sh` script performs the steps needed to run integration tests of the system, including:

- running a `cometbft` process
- running a `pd` process

because these integration tests assume a test network including `pd` and `cometbft` is running locally, these tests must run serially, and must spend time waiting for new blocks to be generated.

additionally, this means we must under-handedly make use of the `#[ignore]` attribute to prevent these tests from running (and failing) alongside unit tests when commands like `cargo test --workspace` are run.

see CometMock for a mock implementation of CometBFT implemented in Go. see #2771 for previous work in this space integrating with `CometMock`.

📐 overview

this state of affairs would be improved if we had a mock implementation of the consensus engine (CometBFT) that would allow `#[test]` functions to generate ABCI messages. this mock implementation should function as a drop-in replacement for CometBFT from the perspective of the `pd` daemon.

this mocking library should additionally expose some "meta" interfaces, to allow us to write integration tests including things like:

this will allow us to run integration tests faster, write them more easily, and generally improve our ability to make assurances about the correctness of our software.
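one such "meta" interface is the time-manipulation facility mentioned at the top of this issue. a minimal std-only sketch of the idea, with all names hypothetical: the engine reads from a controllable clock instead of the wall clock, so tests jump forward instantly rather than sleeping while blocks are produced.

```rust
use std::time::Duration;

// Hypothetical controllable clock owned by the mock engine.
struct MockClock {
    now_secs: u64,
}

impl MockClock {
    fn now(&self) -> u64 {
        self.now_secs
    }
    // Tests advance simulated time explicitly instead of waiting in real time.
    fn advance(&mut self, d: Duration) {
        self.now_secs += d.as_secs();
    }
}

fn main() {
    let mut clock = MockClock { now_secs: 0 };
    clock.advance(Duration::from_secs(3600)); // simulate an hour passing
    assert_eq!(clock.now(), 3600);
}
```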
:triangular_ruler: requirements & design guidelines

(this will go into crate-level documentation at some point)

- the mock consensus engine should be application agnostic. that means it should not depend on `penumbra-*` crates. (#3810)
- the mock consensus engine is built to drive a consensus service `C: Service<ConsensusRequest, Response = ConsensusResponse, Error = BoxError>`
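a toy rendering of that application-agnostic requirement, modeling the consensus service as a plain trait rather than the real `tower::Service` bound quoted above (all types here are hypothetical stand-ins): the engine is generic over any service, and knows nothing about the application behind it.

```rust
// Stand-ins for the ABCI consensus request/response types.
enum ConsensusRequest {
    BeginBlock { height: u64 },
    EndBlock { height: u64 },
}

enum ConsensusResponse {
    BeginBlock,
    EndBlock,
}

// Simplified, synchronous stand-in for the tower::Service bound.
trait ConsensusService {
    fn call(&mut self, req: ConsensusRequest) -> Result<ConsensusResponse, String>;
}

// The mock engine only needs the trait bound; no penumbra-* dependency.
struct MockEngine<C: ConsensusService> {
    app: C,
    height: u64,
}

impl<C: ConsensusService> MockEngine<C> {
    fn run_block(&mut self) -> Result<(), String> {
        self.height += 1;
        self.app.call(ConsensusRequest::BeginBlock { height: self.height })?;
        self.app.call(ConsensusRequest::EndBlock { height: self.height })?;
        Ok(())
    }
}

// A trivial application that just counts the messages it receives.
struct CountingApp {
    calls: u32,
}

impl ConsensusService for CountingApp {
    fn call(&mut self, req: ConsensusRequest) -> Result<ConsensusResponse, String> {
        self.calls += 1;
        match req {
            ConsensusRequest::BeginBlock { .. } => Ok(ConsensusResponse::BeginBlock),
            ConsensusRequest::EndBlock { .. } => Ok(ConsensusResponse::EndBlock),
        }
    }
}

fn main() {
    let mut engine = MockEngine { app: CountingApp { calls: 0 }, height: 0 };
    engine.run_block().unwrap();
    assert_eq!(engine.height, 1);
    assert_eq!(engine.app.calls, 2); // BeginBlock + EndBlock
}
```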
:link: related work
- #3597
- #3695
- #3672
- #3789
- #3794
- #3787
- #3781
- #3799
- #3807
- #3810
- #3816
- #3902
- #3922
- #3788
- #3849
- #3857
- #3792
- #3822
- #3848
- #3995
- #3933
- #4002
- #3936
- #4004
- #4001
- #4044
- #4047
- #4048
- #4059
- #4040
- #4061
- #4070
- #4099
- #4102
- #4110
- #4133
- #3908
- #3934
- #3966
- #4005
- #4149
- #4050
- #4182
- #3937
- #4174
- #4173
- #4181
- #4184
- #4185
- #3913
- #3958
- #3973
- #3980
- #3996
:soon: future work
this is a large project, and the extension and maintenance of this library will be an ongoing project. here are things that we ought to do someday, but won't be considered requirements for this issue to be closed.
- #3759
- #3741
- #3758
- #4071
:x: out-of-scope
things we will not be doing, and other closed tickets:
- #3740
- #3753
- #3754
- #3755
- #3756
- #3757
- #3760
- #3876
- #3935
:eyeglasses: further reading
- #1664
- #1675