penumbra-zone / penumbra

Penumbra is a fully private proof-of-stake network and decentralized exchange for the Cosmos ecosystem.
https://penumbra.zone
Apache License 2.0

Release Testnet 76, via chain upgrade #4402

Closed conorsch closed 5 months ago

conorsch commented 6 months ago

Testnet upgrade

Testnet chain id: penumbra-testnet-deimos-8
Release date: 2024-05-22
Testnet release manager: @conorsch

We're preparing another chain upgrade, explicitly to exercise the mechanics of migrations and coordination, and implicitly to ship a few changes.

Testnet Release Manager Checklist

Pre-release:

On release day:

Post-release cleanup tasks

conorsch commented 6 months ago

Lower voting proposal period 24h -> 4h

Submitted today:

❯ pcli q governance proposal 0 definition
title = "lower proposal voting duration to 4h"
description = "enabling faster voting in support of upgrade testing in coming weeks"

[[parameterChange.changes]]
component = "governanceParams"
key = "proposalVotingBlocks"
value = "\"2880\""
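
For reference, 2880 blocks works out to 4h if blocks arrive roughly every 5 seconds. That block time is an assumption on my part, not something stated in the proposal, and `duration_to_blocks` is a hypothetical helper. A minimal Python sketch of the conversion:

```python
# Convert a target voting duration into a block count.
# ASSUMPTION: ~5 s average block time; the proposal itself does not state this.
BLOCK_TIME_SECONDS = 5


def duration_to_blocks(hours: float, block_time_seconds: float = BLOCK_TIME_SECONDS) -> int:
    """Return the number of blocks that span `hours` at the given block time."""
    return round(hours * 3600 / block_time_seconds)


print(duration_to_blocks(4))   # 2880, the value submitted in the proposal
print(duration_to_blocks(24))  # 17280, the equivalent for the previous 24h period
```

If the real block time drifts from 5 s, the wall-clock voting window drifts with it, since the parameter is expressed in blocks rather than time.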
hdevalence commented 5 months ago

We should be sure to pull in a current snapshot of minifront cc @grod220 @turbocrime

hdevalence commented 5 months ago

We cannot do this until the proto messages erroneously added as part of #4391 are removed.

conorsch commented 5 months ago

Prepare upgrade-plan governance proposal

❯ pcli q governance proposal 4 definition
id = "4"
title = "upgrade to 0.76.0"
description = "planned upgrade, via chain migration, to testnet 76"

[upgradePlan]
height = "222200"

❯ pcli q governance proposal 4 period
{
  "voting_start_block": 221398,
  "voting_end_block": 222123
}
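
A quick sanity check on these numbers: the upgrade height presumably needs to land after the end of voting, so the proposal can pass before the chain halts for migration. The sketch below uses the values printed above; the 5 s block time used for the wall-clock estimate is an assumption.

```python
# Values copied from the `pcli q governance proposal 4` output above.
voting_start_block = 221398
voting_end_block = 222123
upgrade_height = 222200

# ASSUMPTION: ~5 s average block time, used only for the wall-clock estimate.
BLOCK_TIME_SECONDS = 5

voting_blocks = voting_end_block - voting_start_block
margin_blocks = upgrade_height - voting_end_block

# The upgrade must not be scheduled before voting has finished.
assert margin_blocks > 0, "upgrade height falls inside the voting window"

margin_minutes = margin_blocks * BLOCK_TIME_SECONDS / 60
print(voting_blocks, margin_blocks)  # 725 77
print(round(margin_minutes, 1))      # 6.4
```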
conorsch commented 5 months ago
❯ date -u
Fri May 24 10:14:26 PM UTC 2024

❯ pcli q governance proposal 4 state
{
  "finished": {
    "outcome": {
      "passed": {}
    }
  }
}
conorsch commented 5 months ago

Perform Hermes maintenance; confirm IBC channels are working

@avahowell performed the hermes maintenance and confirmed working:

Worth noting: the long migration time was concerning, because we need to finish migrating the chain within the IBC client trusting period, which is 2h. We also overlooked updating the Hermes build deps for Penumbra v0.76.0, so we had to rebuild Hermes.
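
To put that constraint in numbers, here is a small sketch of the headroom against the 2h trusting period. It assumes the 43m21s script run reported later in this thread roughly approximates the time the chain was halted, which may not be exact.

```python
from datetime import timedelta

# Trusting period as stated above.
trusting_period = timedelta(hours=2)

# ASSUMPTION: the scripted upgrade run time (43m21s, reported later in this
# thread) stands in for the time the chain was unavailable to the relayer.
migration_runtime = timedelta(minutes=43, seconds=21)

headroom = trusting_period - migration_runtime
used_fraction = migration_runtime / trusting_period

print(headroom)                # 1:16:39
print(f"{used_fraction:.0%}")  # 36%
```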

conorsch commented 5 months ago

Notes from release process: used a fully-scripted approach to apply the chain upgrades this time, to reduce chance of operator error. The command I ran was:

cd deployments/
HELM_RELEASE=penumbra-testnet TO_VERSION=v0.76.0 ./scripts/k8s-perform-

That process worked well, but was pretty slow: the script is conservative, and spent most of its run time creating backups and tar-ing up post-migration state. Testing on devnets, with minimal chain state on the order of a few hundred blocks, the script's run time was ~5m. On the actual testnet with ~200k blocks, the script's run time was 43m21s. Notably the script doesn't parallelize any of the upgrades, but intentionally serializes them and bails out if any fails. Not bothering to optimize that logic now, but recording these hot takes while the info is fresh in my mind.

In the future, once we're sure the logic in the scripted approach is sound, parallelization alone would get us nearly a 10x speedup: we've got 2 vals, 3 nodes backing the RPC, 1 seed node, and 3 more solo fullnodes backing the various UI frontends (block-explorer, dex-explorer, and gov-dash, the latter unused).
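
A rough sketch of what the parallel fan-out could look like, in Python rather than the actual deployment script. The node names and the `upgrade_node` placeholder are hypothetical; only the node counts come from the comment above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Node counts from the comment above; the names themselves are hypothetical.
NODES = (
    [f"val-{i}" for i in range(2)]
    + [f"rpc-{i}" for i in range(3)]
    + ["seed-0"]
    + [f"fullnode-{i}" for i in range(3)]
)


def upgrade_node(name: str) -> str:
    """Hypothetical placeholder for the per-node backup/migrate/archive steps."""
    return name


def upgrade_all(nodes, upgrade=upgrade_node):
    """Upgrade all nodes concurrently and raise if any of them failed."""
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = {pool.submit(upgrade, n): n for n in nodes}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                fut.result()
                done.append(name)
            except Exception as exc:  # record the failure, keep collecting results
                failed[name] = exc
    if failed:
        raise RuntimeError(f"upgrade failed on: {sorted(failed)}")
    return sorted(done)


print(len(NODES))  # 9
```

One trade-off: unlike the serialized script, this cannot bail out before touching the remaining nodes, since all upgrades start at once. With 9 nodes the best case is about a 9x speedup, bounded by the slowest node.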

Also, it's worth circling back on the disk usage of the multiple backups and state archives. If left unaddressed, those will stick around until the next chain migration, at which point they'll be clobbered. Worth considering because it means the provisioned storage for each node is now consumed by a lot more data than just the live chain state.

conorsch commented 5 months ago

Bump grpcui version for v1 reflection compatibility

This is done: the new v1 reflection APIs are live on https://grpcui.testnet.penumbra.zone

conorsch commented 5 months ago

Leaving galileo off, since it's failing to send txs:

May 25 00:07:09.669 ERROR galileo::responder: Failed to send funds addr=penumbra1a3afgaqz86rmh4f7p6szaygwlskwv2zur4qt2kr2z834uvjdavu9fw3sc3wjkhkf7mzvnmjwxh56t8ga98der33zyyeeqqkev8ele08j65zrg0nqf0fqvvmxhvf0af4pnjlgvu e=status: Unavailable, message: "error getting app params: missing fmd_meta_params", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }

will circle back to it.

conorsch commented 5 months ago

Testnet 76 has shipped, via chain upgrade, and all the follow-up post-release tasks are complete.