opentensor / subtensor

Bittensor Blockchain Layer
The Unlicense
123 stars 122 forks source link

prevent accidental changes to storage structs #513

Open sam0x17 opened 1 month ago

sam0x17 commented 1 month ago

diffs like the following can be extremely tricky/insidious in the context of a substrate-based blockchain:

+    /// Data structure for Axon information.
     #[derive(Encode, Decode, Default, TypeInfo, Clone, PartialEq, Eq, Debug)]
     pub struct AxonInfo {
-        pub block: u64,       // --- Axon serving block.
-        pub version: u32,     // --- Axon version
-        pub ip: u128,         // --- Axon u128 encoded ip address of type v6 or v4.
-        pub port: u16,        // --- Axon u16 encoded port.
-        pub ip_type: u8,      // --- Axon ip type, 4 for ipv4 and 6 for ipv6.
-        pub protocol: u8,     // --- Axon protocol. TCP, UDP, other.
-        pub placeholder1: u8, // --- Axon proto placeholder 1.
-        pub placeholder2: u8, // --- Axon proto placeholder 2.
-    }
-
-    // --- Struct for Prometheus.
+       ///  Axon serving block.
+       pub block: u64,
+       ///  Axon version
+       pub version: u32,
+       ///  Axon u128 encoded ip address of type v6 or v4.
+       pub port: u16,
+       ///  Axon u16 encoded port.
+       pub ip: u128,
+       ///  Axon ip type, 4 for ipv4 and 6 for ipv6.
+       pub ip_type: u8,
+       ///  Axon protocol. TCP, UDP, other.
+       pub protocol: u8,
+       ///  Axon proto placeholder 1.
+       pub placeholder1: u8,
+       ///  Axon proto placeholder 2.
+       pub placeholder2: u8,
    }

In this particular case, the relative order of ip and port was quietly changed, resulting in decoding errors on the bittensor side since the order of these struct fields determines how they are laid out in memory, and thus, how they must be decoded/encoded.

Similarly, a frequent gotcha in substrate-based chains is subtle changes to storage structs without a matching migration to enact this change / storage upgrade.

These types of issues make up a sizeable percentage of critical issues that make it to testnet or beyond before they are detected, and can be extremely damaging since they often render the underlying storage field(s) undecodable or corrupted, which can have a wide variety of exotic and severe consequences.

The main problems is we want developers to be aware when they have accidentally made a non-trivial change to a storage struct. Ideally they should be prompted in some way to create a storage migration or at least be made aware if they re-order, rename, add, or remove fields on a storage struct.

Solution

#[freeze_layout("13c78c707b010724")]
#[derive(Encode, Decode, Default, TypeInfo, Clone, PartialEq, Eq, Debug)]
pub struct AxonInfo {
    ...
}

I propose a simple solution that will prevent people from accidentally making changes to storage structs. The idea is to create an attribute macro called freeze_layout that can be attached to storage structs and other change-sensitive rust items. This attribute will take a single argument which is a hash code of the non-trivial (ignoring doc comments) AST nodes that make up the item the attribute is attached to.

If the hash code is not provided: A compiler error will be displayed providing the programmer with the proper hash code which they can then paste in.

If the hash code does not match: A compiler error will be displayed informing the programmer that they have made a non-trivial/structural change to a storage struct and that they will likely need to write a migration to enact this change. The correct hash code will also be displayed.

This is based largely on my "audited" idea originally posted on the polkadot forums, though in this case it would be a lot simpler.

AC