prevent accidental changes to storage structs

diffs like the following can be extremely tricky/insidious in the context of a substrate-based blockchain:

+    /// Data structure for Axon information.
     #[derive(Encode, Decode, Default, TypeInfo, Clone, PartialEq, Eq, Debug)]
     pub struct AxonInfo {
-        pub block: u64,       // --- Axon serving block.
-        pub version: u32,     // --- Axon version
-        pub ip: u128,         // --- Axon u128 encoded ip address of type v6 or v4.
-        pub port: u16,        // --- Axon u16 encoded port.
-        pub ip_type: u8,      // --- Axon ip type, 4 for ipv4 and 6 for ipv6.
-        pub protocol: u8,     // --- Axon protocol. TCP, UDP, other.
-        pub placeholder1: u8, // --- Axon proto placeholder 1.
-        pub placeholder2: u8, // --- Axon proto placeholder 2.
-    }
-
-    // --- Struct for Prometheus.
+       ///  Axon serving block.
+       pub block: u64,
+       ///  Axon version
+       pub version: u32,
+       ///  Axon u128 encoded ip address of type v6 or v4.
+       pub port: u16,
+       ///  Axon u16 encoded port.
+       pub ip: u128,
+       ///  Axon ip type, 4 for ipv4 and 6 for ipv6.
+       pub ip_type: u8,
+       ///  Axon protocol. TCP, UDP, other.
+       pub protocol: u8,
+       ///  Axon proto placeholder 1.
+       pub placeholder1: u8,
+       ///  Axon proto placeholder 2.
+       pub placeholder2: u8,
    }

In this particular case, the relative order of ip and port was quietly changed, resulting in decoding errors on the bittensor side since the order of these struct fields determines how they are laid out in memory, and thus, how they must be decoded/encoded.

Similarly, a frequent gotcha in substrate-based chains is subtle changes to storage structs without a matching migration to enact this change / storage upgrade.

These types of issues make up a sizeable percentage of critical issues that make it to testnet or beyond before they are detected, and can be extremely damaging since they often render the underlying storage field(s) undecodable or corrupted, which can have a wide variety of exotic and severe consequences.

The main problems is we want developers to be aware when they have accidentally made a non-trivial change to a storage struct. Ideally they should be prompted in some way to create a storage migration or at least be made aware if they re-order, rename, add, or remove fields on a storage struct.

Solution

#[freeze_layout("13c78c707b010724")]
#[derive(Encode, Decode, Default, TypeInfo, Clone, PartialEq, Eq, Debug)]
pub struct AxonInfo {
    ...
}

I propose a simple solution that will prevent people from accidentally making changes to storage structs. The idea is to create an attribute macro called freeze_layout that can be attached to storage structs and other change-sensitive rust items. This attribute will take a single argument which is a hash code of the non-trivial (ignoring doc comments) AST nodes that make up the item the attribute is attached to.

If the hash code is not provided: A compiler error will be displayed providing the programmer with the proper hash code which they can then paste in.

If the hash code does not match: A compiler error will be displayed informing the programmer that they have made a non-trivial/structural change to a storage struct and that they will likely need to write a migration to enact this change. The correct hash code will also be displayed.

This is based largely on my "audited" idea originally posted on the polkadot forums, though in this case it would be a lot simpler.

AC

[x] attribute macro implementation, including a syn visitor pattern that calculates the hash code for the underlying AST nodes, ignoring trivial nodes like doc comments. Should be sensitive to node order/structure. My code here is a really good starting point: https://github.com/sam0x17/audited
[x] apply freeze_layout to all storage structs in subtensor
[ ] some way of requiring freeze_layout on storage structs? clippy? quick and dirty custom visitor running in a bin target that becomes a ci job?

opentensor / subtensor

prevent accidental changes to storage structs #513

Solution

AC