use-ink / ink

Polkadot's ink! to write smart contracts.
https://use.ink
Apache License 2.0

Macro based storage rework #1134

Closed athei closed 2 years ago

athei commented 2 years ago

Most of the existing storage types were removed because they are simply too inefficient in size and executed instructions. Right now we only have Mapping. However, this type doesn't work very well with the existing storage infrastructure and needs a band-aid named SpreadAllocate together with initialize_contract.

We want to pivot more in the direction of how storage works in FRAME. We should keep the overarching contract data structure but use the storage macro to generate the supporting code instead of an elaborate system of traits.

This would also allow us more flexibility in how to generate the storage keys for these data structures and more control over what code is generated. The aim is to reduce the contract sizes by removing monomorphization and copying of the root keys.

Please note that the following is only a sketch of the generated code; it is probably not how the generated code should look. @Robbepop noted that we should not use private sub-modules. Please feel free to post your suggestions and I will update the top post.

We add a new trait that will be implemented by the storage macro to supply our storage traits with a key. Our storage traits have default implementations for every function; the macro will only set the associated types.

trait HasKey {
    const KEY: [u8; 32];
}

trait StorageValue: HasKey {
    type Value: Encode + Decode;

    // we take self here to build in the mutability levels
    fn write(&mut self, value: &Self::Value) {
        ink_env::set_contract_storage(Self::KEY, value.encode());
    }

    fn read(&self) -> Self::Value {
        ink_env::get_contract_storage(Self::KEY)
    }
}

trait StorageMap: HasKey {
    type Key: Hashable;
    type Value: Encode + Decode;

    fn insert(&mut self, key: Self::Key, value: &Self::Value) {
        ink_env::set_contract_storage(twoxx(Self::KEY ++ key), value.encode());
    }

    fn get(&self, key: Self::Key) -> Self::Value {
        ink_env::get_contract_storage(twoxx(Self::KEY ++ key))
    }
}

Then the storage macro creates a private module where it dumps all the generated code:

#[ink(storage)]
struct Contract {
    a: u8,
    b: Mapping<AccountId, Balance>,
}

// The macro would transform the struct into this.
// The underscores are just to prevent clashes with user-defined modules.
// Having public fields in the struct would be an error.
mod __private__ {
    pub struct AFieldStorageValue {
        /// prevent instantiation outside of this generated code
        _private: ()
    }

    pub struct BFieldMapping {
        /// prevent instantiation outside of this generated code
        _private: ()
    }

    impl HasKey for AFieldStorageValue {
        // hash of "a: AFieldStorageValue"
        const KEY: [u8; 32] = b"3A9ADFE4234B0475EB1767ACD1BCFA75D7B9E0BA1C427FBAF0F476D181D3A820";
    }

    impl HasKey for BFieldMapping {
        // hash of "b: BFieldMapping"
        const KEY: [u8; 32] = b"33D27C398EEEA3851B750A1152FF9B63B1D6D10BE18E2D58A745776D9F2FF241";
    }

    // the functions all have default operations. We just need to set the types.
    impl StorageValue for AFieldStorageValue {
        type Value = u32;
    }

    // the functions all have default operations. We just need to set the types.
    impl StorageMap for BFieldMapping {
        type Key = AccountId;
        type Value = Balance;
    }

    // fields are private so the struct cannot be constructed by the user
    pub struct Contract {
        a: AFieldStorageValue,
        b: BFieldMapping,
    }

    impl Contract {
        // we generate this but leave out any ink! defined collection types.
        // Initialization is, and always will be, the crux of the matter. I would
        // suggest forcing the user to initialize everything here except
        // collections. So every user-defined type would be wrapped into
        // a `StorageValue` and added here. The rest are ink! collections
        // which might or might not be skipped here.
        pub fn initialize(a: u32) -> Self {
            let mut ret = Self {
                a: AFieldStorageValue { _private: () },
                b: BFieldMapping { _private: () },
            };
            ret.a.write(&a);
            ret
        }

        // we create accessors for each field with the correct mutability
        // this way we keep the natural way of marking messages as
        // mutable and getters
        pub fn a(&self) -> &AFieldStorageValue { &self.a }
        pub fn b(&self) -> &BFieldMapping { &self.b }
        pub fn a_mut(&mut self) -> &mut AFieldStorageValue { &mut self.a }
        pub fn b_mut(&mut self) -> &mut BFieldMapping { &mut self.b }
    }
}

use __private__::Contract;

impl Contract {
    // Should we have the `constructor` macro write the `Contract` on `Drop`?
    #[ink(constructor)]
    pub fn new() -> Self {
        // the user cannot get this wrong because it is the only constructor available
        Self::initialize(42)
    }

    #[ink(message)]
    pub fn do_writes(&mut self) {
        self.a_mut().write(&1);
        self.b_mut().insert(ALICE, &187334343);
    }

    #[ink(message)]
    pub fn do_reads(&self) {
        let a = self.a().read();
        let b = self.b().get(ALICE);
    }

    #[ink(message)]
    pub fn do_reads_and_writes(&mut self) {
        // just works
        self.a_mut().write(&1);
        self.b_mut().insert(ALICE, &187334343);
        let a = self.a().read();
        let b = self.b().get(ALICE);
    }
}
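The sketch above can be exercised with a runnable, stdlib-only mock. Everything here is illustrative and not part of the proposal: an in-memory map stands in for the real ink_env storage host functions, the value type is fixed to u32 to avoid a codec dependency, and `AField` is a hypothetical generated type.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// In-memory stand-in for the contract's key-value storage.
thread_local! {
    static STORE: RefCell<HashMap<[u8; 32], Vec<u8>>> = RefCell::new(HashMap::new());
}

trait HasKey {
    const KEY: [u8; 32];
}

// As in the proposal, the storage trait only needs `HasKey`; both methods are
// default implementations (values fixed to u32 in this mock instead of a
// generic `Encode + Decode` bound).
trait StorageValue: HasKey {
    fn write(&mut self, value: u32) {
        STORE.with(|s| s.borrow_mut().insert(Self::KEY, value.to_le_bytes().to_vec()));
    }
    fn read(&self) -> u32 {
        STORE.with(|s| {
            let bytes = s.borrow()[&Self::KEY].clone();
            u32::from_le_bytes(bytes.try_into().unwrap())
        })
    }
}

// What the macro would generate for a field `a`: a marker type plus a key.
struct AField;
impl HasKey for AField {
    const KEY: [u8; 32] = [1; 32];
}
impl StorageValue for AField {} // only the key is needed; the methods are defaults

fn main() {
    let mut a = AField;
    a.write(42);
    println!("{}", a.read()); // prints 42
}
```

This shows the key property of the design: the generated code per field is only a marker type and a key constant, while all behavior lives in trait defaults.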

The biggest downside of this approach is the same as Substrate's: new storage collections always need changes in codegen in order to be supported. However, I assume this will save us a lot of code passing around all those keys and stateful objects (lazy data structures) at runtime. It is also much simpler and easier to reason about (because it has fewer features).

Open Questions

How does this affect/play with things like:

xgreenx commented 2 years ago

I played with that idea, tested different implementations and cases, and checked the sizes.


First of all, to reduce the size it is better to avoid frequent usage of ink_env::set_contract_storage and ink_env::get_contract_storage.

Instead of:

#[ink(storage)]
pub struct Flipper {
    value: Storage<bool>,
    value2: Storage<bool>,
    value3: Storage<u32>,
    value4: Storage<u32>,
}

Better to:

#[ink(storage)]
pub struct Flipper {
    value: Storage<Test>,
}

struct Test {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}

Or

#[ink(storage)]
pub struct Flipper {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}

The second point is the introduction of accessors like:

// we create accessors for each field with the correct mutability
// this way we keep the natural way of marking messages as
// mutable and getters
pub fn a(&self) -> &AFieldStorageValue { &self.a }
pub fn b(&self) -> &BFieldMapping { &self.b }
pub fn a_mut(&mut self) -> &mut AFieldStorageValue { &mut self.a }
pub fn b_mut(&mut self) -> &mut BFieldMapping { &mut self.b }

will be very unclear to the user, plus we will have problems with highlighting in the IDE.


Based on the results I want to propose the following:

ink! will provide types and structures that allow moving some fields into their own cell. These types will have two modes:

That solution provides control over the storage keys. The developer can decide which key to use and where. The solution introduces only one limitation: all storage-related structures should be defined via the #[ink(storage)] macro. The main storage of the contract can be marked with #[ink(contract)] or #[ink(storage(main))] or something like that.

That solution doesn't affect multi-file projects because everyone will use the same rules to work with storage: structures are marked with #[ink(storage)] and implement scale::Decode + scale::Encode. It also doesn't affect traits and ideas used in OpenBrush (we only need to rewrite the storage struct according to the new rules). I see only one problem: we are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.

// `value` and `value2` are both `Test2`. All inner storage keys are the same,
// so `self.value.get().value4.set(&144)` will also affect `self.value2.get().value4`.
#[ink(storage)]
pub struct Flipper {
    value: Storage<Test2, ManualKey<1>>,
    value2: Storage<Test2, ManualKey<4>>,
}

#[ink(storage)]
struct Test2 {
    value: Storage<bool>,
    value2: Storage<bool>,
    value3: Storage<u32>,
    value4: Storage<u32>,
}

That problem has a workaround with a manual offset:

#[ink(storage)]
pub struct Flipper {
    value: Storage<Test2<100>, ManualKey<100>>,
    value2: Storage<Test2<200>, ManualKey<200>>,
}

#[ink(storage)]
struct Test2<const Offset: u128> {
    value: Storage<bool, ManualKey<{ Offset + 1 }>>,
    value2: Storage<bool, ManualKey<{ Offset + 2 }>>,
    value3: Storage<u32, ManualKey<{ Offset + 3 }>>,
    value4: Storage<u32, ManualKey<{ Offset + 4 }>>,
}

But then we need to return an error if the storage layout contains two identical keys, to protect the user from bugs=)

Below is an example of the code that I used for testing (it is only an MVP). The same idea can be implemented in Mapping.

#[derive(Default)]
struct AutoKey;
#[derive(Default)]
struct ManualKey<const KEY: u128>;

#[derive(Default)]
struct Storage<T: Decode + Encode, Mode = AutoKey>(PhantomData<(T, Mode)>);

#[cfg(feature = "std")]
impl<T: Decode + Encode + scale_info::TypeInfo + 'static, Mode> ::ink_storage::traits::StorageLayout for Storage<T, Mode>
where Self: KeyTrait {
    fn layout(_key_ptr: &mut KeyPtr) -> Layout {
        Layout::Cell(CellLayout::new::<T>(LayoutKey::from(
            &<Self as KeyTrait>::key(),
        )))
    }
}

trait KeyTrait {
    fn key() -> Key;
}

trait StorageTrait: KeyTrait {
    type Value: Decode + Encode;

    fn get(&self) -> Self::Value {
        ink_env::get_contract_storage(&Self::key()).unwrap().unwrap()
    }

    fn set(&self, value: &Self::Value) {
        ink_env::set_contract_storage(&Self::key(), &value);
    }
}

impl<T: Decode + Encode, const KEY: u128> KeyTrait for Storage<T, ManualKey<KEY>> {
    #[inline]
    fn key() -> Key {
        Key::from(unsafe { core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0)) })
    }
}

impl<T: Decode + Encode, const KEY: u128> StorageTrait for Storage<T, ManualKey<KEY>> {
    type Value = T;
}

impl<T: Decode + Encode> KeyTrait for Storage<T, AutoKey> {
    #[inline]
    fn key() -> Key {
        // Here we will put `__ink_error...`
        panic!()
    }
}

impl<T: Decode + Encode> StorageTrait for Storage<T, AutoKey> {
    type Value = T;
}

trait IsAutoKey<const KEY: u128> {
    type StorageType;
}

impl<T: Decode + Encode, const KEY: u128> IsAutoKey<KEY> for Storage<T, AutoKey> {
    type StorageType = Storage<T, ManualKey<KEY>>;
}
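As a side note on the `key()` implementation above: the `(u128, 0)` transmute can also be written without `unsafe`, and choosing an explicit byte order makes it platform-independent. This is only an illustrative sketch, not part of the MVP:

```rust
// Build the 32-byte storage key from a numeric key: the first 16 bytes carry
// the u128 in little-endian order, the remaining bytes stay zero.
fn key_bytes(key: u128) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&key.to_le_bytes());
    out
}

fn main() {
    let k = key_bytes(5);
    assert_eq!(k[0], 5);            // low byte of the key
    assert_eq!(&k[16..], &[0u8; 16]); // upper half stays zero
    println!("ok");
}
```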
athei commented 2 years ago

The whole contract is stored under the storage key ContractRootKey::ROOT_KEY in one cell -> Fields don't have their own storage key by default. The contract is one monolith structure.

Is that somewhere reflected in the code you posted? I think it contradicts the rest of what you are saying: we are selecting different keys with manual or automatic keys.

In your examples I don't see the overarching main contract structure that ties all of that together.

I see only one problem: We are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.

This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.

Also, would be nice to have the example with Mapping instead of Storage. The latter is too simple to understand what is happening there.

xgreenx commented 2 years ago

Is that somewhere reflected in the code you posted? I think it contradicts the rest you are saying: We are selecting different keys with manual or automatic keys.

I didn't implement the ContractRootKey::ROOT_KEY logic in the example because it is part of the #[ink::contract] macro. All fields implement Decode + Encode. To load a contract we will use ink_env::get_contract_storage(&ContractRootKey::ROOT_KEY); to flush, ink_env::set_contract_storage(&ContractRootKey::ROOT_KEY, &contract.encode()).

The idea is that for some types the developer will create an empty implementation of Decode and Encode, for example Storage or Mapping. Those types work with storage directly.

So all atomic types are loaded with the contract. Other types don't require loading.

In your examples I don't see the overarching main contract structure that ties all of that together.

I will prepare a more detailed example=)

This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.

It seems I found a solution for that problem. The const_generics_defaults feature is stable now, and we can create structures like:

struct Test2<const KEY: u128 = 0> {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}

So our #[ink(storage)] macro can add a generic argument to specify the storage key of the struct during declaration=) I tested it and it works, but the generated code is ugly.

Also, would be nice to have the example with Mapping instead of Storage. The latter is too simple to understand what is happening there.

I will try to add an example for Mapping too. The problem is that the example is already big=)

xgreenx commented 2 years ago

It seems I found a solution for that problem. The const_generics_defaults feature (https://github.com/rust-lang/rust/issues/44580#issuecomment-991782799) is stable now, and we can create structures like:

struct Test2<const KEY: u128 = 0> {
   value: bool,
   value2: bool,
   value3: u32,
   value4: u32,
}

So our #[ink(storage)] macro can add a generic argument to specify the storage key of the struct during declaration=) I tested it and it works, but the generated code is ugly.
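The stabilized defaults can be checked with a minimal, self-contained sketch (the names are illustrative, not the generated code):

```rust
// A const generic parameter with a default, as described above.
struct Test2<const KEY: u128 = 0>;

impl<const KEY: u128> Test2<KEY> {
    const fn key() -> u128 {
        KEY
    }
}

fn main() {
    // explicit key
    assert_eq!(Test2::<7>::key(), 7);
    // when the parameter is elided in a type position, the default 0 applies
    let _t: Test2 = Test2;
    assert_eq!(Test2::<0>::key(), 0);
    println!("ok");
}
```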

No matter what, we can't use that KEY to do calculations like { KEY + 2 } during the declaration. It is possible with the unstable feature #![feature(generic_const_exprs)], but we want to stay stable=)

I will prepare the simple version, with only a manual key and with a generated one.

This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.

I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.

athei commented 2 years ago

I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.

If we can catch this during compilation then it is fine. Would be better to catch it during the compilation of the wasm, though.

The idea is that for some types the developer will create an empty implementation of Decode and Encode, for example Storage or Mapping. Those types work with storage directly.

So all atomic types are loaded with the contract. Other types don't require loading.

This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key and the storage types like HashMap simply implement an empty Encode + Decode. Just to reiterate: What we want to achieve with this rewrite is:

I have some further questions:

xgreenx commented 2 years ago

I managed to implement auto-incrementing of the storage key during compilation (it is the hard variant, where we don't need to worry about keys)=) Here is an example. It contains only an example with Storage at the moment, but Mapping would be the same in the implementation. The code is ugly, but it will be generated by the macro; in the comments I describe how it should look in the contract. I added Debug everywhere to output the keys, so please ignore that stuff. That solution has a problem with the derive macro: I'm adding a generic Parent: NextStorageKey to the definition of the struct, and the derive macro generates bounds for that generic even though it is only used in PhantomData=( Maybe you know some workaround for that?

This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key and the storage types like HashMap simply implement an empty Encode + Decode. Just to reiterate: What we want to achieve with this rewrite is

Nope!=) Currently, we are storing each field under its own storage key. If we put all fields under ink_storage::Pack<T> then we store everything under one key, but by default we don't. I compiled contracts with ink_storage::Pack<T> and they take much less space, which is why I'm proposing to store all fields under one key by default: it is the most performant strategy in terms of the contract's size.

If we generate storage keys during compilation, then the empty implementation of Encode + Decode will not affect the size of the contract.

Would it be legal to nest those storage types? For example, to allow a Storage with the value of a Mapping? I don't think this makes sense, right?

Usage like Storage<Mapping<Key, Value>> doesn't make sense. But Storage<Struct>, where Struct contains many fields and some of those fields are Mapping or Storage, is a valid case.

In all your examples you use just a numeric key (u128). Would the same type also be used for the automatic key? I think this would be beneficial, as it is much smaller than concatenating strings. Could we just use the position within the struct, like having a counter inside the proc macro? This is how it is currently working (the counter is passed down as an argument). We could then even use a smaller type there (like u32).

Yeah, I want to propose using some integer as a storage key, because within a contract we don't need to worry about collisions between fields. Yes, in the hard version that I posted it is auto-incremented.

How would you combine the root key of a mapping (a numeric value) with the Key of the Mapping itself? Do we require Display for the Key and just concat? (Remember, we don't want to hash, so that the key stays transparent.)

The storage key implements Encode, and the key of the Mapping implements it too. As we discussed in the chat, with a new seal_get_storage it can be done like:

let key = (&Mapping::StorageKey, &Key).encode();
let value = value.encode();
seal_get_storage(&key, key.len(), &value, value.len())

seal_get_storage will hash it by itself.
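A stdlib-only sketch of that key construction, with plain byte concatenation standing in for the SCALE tuple encoding (the function name is illustrative):

```rust
// Concatenate the mapping's numeric storage key with the encoded map key.
// With a u32 counter and a 32-byte AccountId this yields a 36-byte key,
// which is why the host function has to accept keys longer than 32 bytes.
fn combined_key(storage_key: u32, map_key: &[u8]) -> Vec<u8> {
    let mut out = storage_key.to_le_bytes().to_vec();
    out.extend_from_slice(map_key);
    out
}

fn main() {
    let account_id = [0u8; 32]; // stand-in for an AccountId
    let key = combined_key(1, &account_id);
    assert_eq!(key.len(), 36);
    println!("{}", key.len()); // prints 36
}
```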

xgreenx commented 2 years ago

That solution has a problem with the derive macro: I'm adding a generic Parent: NextStorageKey to the definition of the struct, and the derive macro generates bounds for that generic even though it is only used in PhantomData=( Maybe you know some workaround for that?

I mentioned that issue: https://github.com/rust-lang/rust/issues/26925

But it seems we can handle it. The derive macro can be expanded before the #[ink(storage)] macro, so we can add the generic for numeration later and it will not conflict with #[derive(...)]. It works because of this change: https://github.com/rust-lang/rust/pull/79078. Our macro can check whether at least one derive macro comes after it; if yes, our macro will put itself after the derive macro.

athei commented 2 years ago

That solution has a problem with the derive macro: I'm adding a generic Parent: NextStorageKey to the definition of the struct, and the derive macro generates bounds for that generic even though it is only used in PhantomData=( Maybe you know some workaround for that?

You mean the ones on your Test and Flipper structs? Bounds are merely a heuristic and often wrong. The derive can detect PhantomData, but in your case the PhantomData is buried deep in other types so the macro cannot see it. This is why we have attributes in parity-scale-codec for its macros to override bounds. In Substrate we also have a DefaultNoBound macro for convenience. That said, it seems you have found a better solution by changing the order in which the macros are applied.

I looked through your prototype and I think this would be a very good design. Having a numeric counter at compile time is the best of both worlds:

This looks really promising to me. Once this progresses we also need to add a new version of the storage functions to pallet_contracts that allows variably sized keys (with a limit, though). The current size of 32 bytes is too small if contracts do not hash the key, because you need at least sizeof(counter) + sizeof(AccountId), which is 36 bytes assuming we use u32 for the counter. Maybe 64 bytes would be a sensible limit, to allow for Mapping keys that are larger than an AccountId.

xgreenx commented 2 years ago

Why u128 for the key? Since it is only a counter now we can get away with u32. This would map to a primitive operation in wasm. If you want to be very careful we could also use u64 which still has a native type in wasm. But I don't think it is necessary.

I tested it with the current ink! codegen, so it was simpler to transmute (u128, u128) into [u8; 32]. But yes, we can use u32 or u64 in the final solution. I think u32 should be enough to avoid collisions.

Why do you implement KeyTrait for ManualTypedKey and Storage<T, ManualTypedKey> with the same implementation? Couldn't the latter just use the former?

It is caused by the Debug trait for logging=) It was a late night and I tried to finish the example faster. It is only for the println in the example.

I think the docs which describe the user definition of Test do not match the generated code: In the docs there is no usage of Storage but in the generated code there is.

Yep, you are right, I forgot to update the comment.

// The user defined the following struct:
//
// #[derive(Default, Debug)]
// #[ink(storage)]
// struct Test {
//     value: bool,
//     value2: Storage<bool>,
//     value3: u32,
//     value4: Storage<u32>,
// }

This can't detect duplicate keys, right? I played around a bit and was able to have duplicate keys.

Yep. I haven't thought about the question yet: is it possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove the ManualKey mode and only have Auto and ManualOffset. In that case, the code will not generate duplicate keys.

Once this progresses we also need to add a new version of the storage functions to pallet_contracts that allows variably sized keys (with a limit, though). The current size of 32 bytes is too small if contracts do not hash the key, because you need at least sizeof(counter) + sizeof(AccountId), which is 36 bytes assuming we use u32 for the counter. Maybe 64 bytes would be a sensible limit, to allow for Mapping keys that are larger than an AccountId.

I think resolving that issue will close that question as we discussed in the element channel.


I also want to highlight another problem with that solution. Test is Test<()>, but Test<ManualKey<2>> is another structure. So if the user wants to work with the code that is generated for Test, he has to transform Test<ManualKey<2>> into Test<()>.

I added AtomicResolveType: if the structure is atomic (doesn't use Storage or Mapping) then we can use Test, else Test<SomeKey>. It helps to not generate useless code and to work with Test where possible. But the problem still exists for complex structures (those that contain Storage or Mapping).

We can generate AsRef and From implementations, but it still can be unclear to the developer why he has to convert Test into Test<SomeKey>.

impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsRef<Test<NewParent>> for Test<OldParent> {
    #[inline]
    fn as_ref(&self) -> &Test<NewParent> {
        unsafe { core::mem::transmute::<&Test<OldParent>, &Test<NewParent>>(self) }
    }
}
impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsMut<Test<NewParent>> for Test<OldParent> {
    #[inline]
    fn as_mut(&mut self) -> &mut Test<NewParent> {
        unsafe { core::mem::transmute::<&mut Test<OldParent>, &mut Test<NewParent>>(self) }
    }
}
athei commented 2 years ago

I think resolving https://github.com/paritytech/substrate/issues/7724 will close that question as we discussed in the element channel.

Yes. This will kill two birds with one stone.

Yep. I haven't thought about the question yet: is it possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove the ManualKey mode and only have Auto and ManualOffset. In that case, the code will not generate duplicate keys.

If it makes things easier we should drop the manual key for now. Right now the keys are also derived from order. We can add this later, maybe even as a post-processing step on the metadata, to not further complicate the code (as you initially suggested).

I also want to highlight another problem with that solution. Test is Test<()>, but Test<ManualKey<2>> is another structure. So if the user wants to work with the code that is generated for Test, he has to transform Test<ManualKey<2>> into Test<()>.

Yeah, this is kind of a bummer. I don't understand the code well enough to help with that right now. But I am sure that using transmute for this isn't the right way.

cmichi commented 2 years ago

@xgreenx We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so as not to be blocked on that. What are your thoughts?

xgreenx commented 2 years ago

We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so as not to be blocked on that. What are your thoughts?

We need to decide on several questions, and then I think we can move forward.


  1. Do we want to hide the generic type of the NextStorageKey on the storage structure or not?

1.1. We can add a generic type during the codegen as described above. By default it will be (). But if it is not an atomic struct, it will use the generic to derive the correct storage key, and the user should understand that he needs to work with the generic struct.

Pros:

Cons:

1.2. The user should explicitly specify the generic type for NextStorageKey (if the struct contains several generics, NextStorageKey should be last). He doesn't need to use it in the struct; the codegen will use it.

Pros:

Cons:

The definition looks like:

#[derive(Default, Debug)]
#[ink(storage)]
struct Test<Last: NextStorageKey> {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}

  2. #[ink(storage)] is a procedural macro that will generate impls for helper traits. How do we want to identify the main contract's storage?

For example, we can use #[ink::storage] to generate impls. But if it is a main contract's storage, then the user should mark it like #[ink::storage(main)] or #[ink::storage(contract)].

Or you can suggest your ideas=)


  3. Maybe you have some suggestions about naming (for all the stuff)=D For example, NextStorageKey is too long; maybe Storage is enough.
athei commented 2 years ago

1.1 vs 1.2: Given that these are the only two options, I am leaning towards the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation, and is there really no better way (I can't tell)?

For example, we can use #[ink::storage] to generate impls. But if it is a main contract's storage, then the user should mark it like #[ink::storage(main)] or #[ink::storage(contract)].

I would suggest keeping #[ink::storage(<ROOT_KEY>)] for the main storage and using #[ink::storage_ext] for additional storage items. <ROOT_KEY> is optional and defaults to 0. We don't need a way to set keys of individual items, but the root key needs to be customizable, otherwise we will have conflicts with delegate call.

Maybe you have some suggestions about naming(for all stuff)=D For example, NextStorageKey is too long, maybe Storage is enough.

I agree. If this is visible to the user, a simple name such as Storage would go a long way towards not confusing them. In general it is very worthwhile to spend a lot of time on naming; don't gloss over it. It is time well spent. That said, we can always change the names later in a review, as these are simple find-and-replace changes.

xgreenx commented 2 years ago

1.1 vs 1.2: Given that these are the only two options, I am leaning towards the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation, and is there really no better way (I can't tell)?

I also prefer the explicit solution, because the developer can google the Rust feature and find out what is going on.

is there really no better way (I can't tell)

We want to know the key of each Storage or Mapping field during compilation -> each Storage or Mapping field should have its own storage key.

Here we have two options:

  1. The key is calculated relative to the previous keys, to create a unique key each time and avoid duplicates.

Previously used cells should affect the storage key of the current struct -> the struct should know about its parent -> we need to use a generic.

Pros:

Cons:

  2. The key is generated based on NAME_OF_STRUCT::NAME_OF_FIELD.

Pros:

Cons:

But we can detect during metadata generation that someone used the same struct more than once and throw an error with a description. The user should then use another struct.
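That metadata-time check could be as simple as this sketch (the helper name is hypothetical):

```rust
use std::collections::HashSet;

// Walk every storage key declared in the layout and report the first repeat.
fn first_duplicate(keys: &[u32]) -> Option<u32> {
    let mut seen = HashSet::new();
    keys.iter().copied().find(|&k| !seen.insert(k))
}

fn main() {
    assert_eq!(first_duplicate(&[1, 2, 3]), None);
    assert_eq!(first_duplicate(&[1, 2, 1]), Some(1)); // duplicate key -> error
    println!("ok");
}
```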


I would suggest keeping #[ink::storage(<ROOT_KEY>)] for the main storage and using #[ink::storage_ext] for additional storage items. <ROOT_KEY> is optional and defaults to 0. We don't need a way to set keys of individual items, but the root key needs to be customizable, otherwise we will have conflicts with delegate call.

I'm okay with that solution=)

athei commented 2 years ago

Can you please remind me what you mean when you say "atomic struct"?

I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.

I think before you start implementing, it would be best if you posted a complete example here of how this stuff would be used. No need to write out all the types involved as in the first mock you posted; only from a user perspective, with Mapping and with Storage. Then we can check whether this is actually something that is usable or too complicated for the users.

xgreenx commented 2 years ago

"atomic struct" - all fields are atomic. All types are atomic except Storage and Mapping.

I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.

We can still use u32 or u64; the chance of generating the same key is not that high=)

I think before you start implementing, it would be best if you posted a complete example here of how this stuff would be used. No need to write out all the types involved as in the first mock you posted; only from a user perspective, with Mapping and with Storage. Then we can check whether this is actually something that is usable or too complicated for the users.

const ROOT_KEY: u32 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value1: AtomicStruct,
    value2: NonAtomicStruct,
    value3: Storage<AtomicStruct>,
    // Possible
    value4: Storage<NonAtomicStruct>,
    // Possible
    value5: Mapping<u128, AtomicStruct>,
    // Compilation Error, it is not possible
    // value6: Mapping<u128, NonAtomicStruct>,
}

impl Flipper {
    fn get_value1(&self) -> &AtomicStruct {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &NonAtomicStruct<T> {
        &self.value2
    }

    fn get_value3<T: Storage>(&self) -> &Storage<AtomicStruct, T> {
        &self.value3
    }

    fn get_value4<T1: Storage, T2: Storage>(&self) -> &Storage<NonAtomicStruct<T2>, T1> {
        &self.value4
    }

    fn get_value5<T: Storage>(&self) -> &Mapping<u128, AtomicStruct, T> {
        &self.value5
    }
}

#[ink::storage_ext]
pub struct AtomicStruct<Last: Storage = ()> {
    value1: bool,
    value2: u32,
    value3: Vec<u32>,
    value4: String,
}

#[ink::storage_ext]
pub struct NonAtomicStruct<Last: Storage = ()> {
    value1: Mapping<u32, bool>,
    value2: Storage<Vec<u32>>,
}

impl<Last: Storage> NonAtomicStruct<Last> {
    fn get_value1<T: Storage>(&self) -> &Mapping<u32, bool, T> {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &Storage<Vec<u32>, T> {
        &self.value2
    }
}
athei commented 2 years ago

We still can use u32 or u64; the chance of generating the same key is not that high=)

I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate to 8 bytes? We would still have the problem of collisions if a struct was used multiple times.

I think with a numeric solution we could even get away with u8. We would make the Key generic and just use u8 by default. If we overflow, we throw a compile-time error and ask the user to specify a bigger type.

Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.

Just one question: Why is the type parameter on the storage structs called Last? It seems like a strange name.

xgreenx commented 2 years ago

I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate to 8 bytes? We would still have the problem of collisions if a struct was used multiple times.

Yes, we can do the same as for selectors: truncate the hash to a u32 or u64.
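The "same as for selectors" idea can be sketched in plain Rust. This is a minimal illustration, assuming a compile-time-evaluable hash; the `hash_u64` name matches the macro used later in the thread, but the FNV-1a algorithm here is just a stand-in (ink! would use its own hasher, e.g. BLAKE2, and truncate its output):

```rust
/// Hypothetical stand-in for a `hash_u64!` macro: hash a name at
/// compile time and keep only 64 bits (FNV-1a, purely illustrative).
const fn hash_u64(name: &str) -> u64 {
    let bytes = name.as_bytes();
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a prime
        i += 1;
    }
    hash
}

// Keys become plain integer constants baked into the data section.
const VALUE1_KEY: u64 = hash_u64("Flipper::value1");
const VALUE2_KEY: u64 = hash_u64("Flipper::value2");

fn main() {
    // Distinct field names yield distinct keys (with high probability).
    assert_ne!(VALUE1_KEY, VALUE2_KEY);
    // The same name always hashes to the same key.
    assert_eq!(VALUE1_KEY, hash_u64("Flipper::value1"));
}
```

Because the whole computation is a `const fn`, the contract binary only carries the resulting `u64` constants, not the name strings or the hashing code.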

I think with a numeric solution we could even get away with u8. We would make the Key generic and just use u8 by default. If we overflow we throw a compile time error and ask the user to specify a bigger type.

In Wasm it is still a u32; I think it is better to keep u32.

Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.

Okay=) It looks normal to me. I only worry that someone from Solidity can be afraid of it=)

Just one question: Why is the type parameter on the storage structs called Last? It seems like a strange name.

Because we will actually pass the last data type (the type that occupied the previous cell), which will be used to derive the next storage key. We could use Parent, but it is not actually the parent=) We could simply call it Key: Storage or S: Storage. I'm not a master of naming=(

athei commented 2 years ago

In Wasm it is still a u32; I think it is better to keep u32.

I think those constants will go into the wasm data section and can be packed densely there. Of course they will be loaded into an i32 when they are used but we still save the space in the data section. We would need to verify that, though.

We could use Parent, but it is not actually the parent=) We could simply call it Key: Storage or S: Storage. I'm not a master of naming=(

Some renaming suggestions:

Do you have a preference regarding numeric keys vs truncated 64bit hashes? My thoughts on truncated hashes:

Con:

Pro:

Let's do some napkin math regarding the size benefits of numeric keys. Let's say numeric keys are 1 byte in contract size (stored in the data section, which is linear memory) and truncated keys are 8 bytes. So we are looking at 7 bytes of overhead for each field.

How many fields does the average contract have? I would say less than 20 (maybe even way less). That would be 140 bytes of size overhead for the truncated hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had StoredValue already. I would assume the number of fields wouldn't go up indefinitely because you would group them into atomic structs for efficiency reasons.

Given these numbers I might be inclined to say that truncated hash version might be better if it is substantially simpler and gets rid of generics for the user.
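The napkin math above can be spelled out as code (assuming, as stated, a 1-byte numeric key versus an 8-byte truncated-hash key):

```rust
// Working out the size overhead of truncated-hash keys versus numeric
// keys, under the assumptions from the discussion: numeric keys cost
// 1 byte in the data section, truncated u64 hashes cost 8 bytes.
fn hash_key_overhead_bytes(fields: u32) -> u32 {
    const TRUNCATED_HASH_SIZE: u32 = 8; // u64 key
    const NUMERIC_KEY_SIZE: u32 = 1; // optimistic 1-byte numeric key
    fields * (TRUNCATED_HASH_SIZE - NUMERIC_KEY_SIZE)
}

fn main() {
    // The "less than 20 fields" estimate: 20 * 7 = 140 bytes.
    assert_eq!(hash_key_overhead_bytes(20), 140);
    // ERC20 with 4 fields: 4 * 7 = 28 bytes.
    assert_eq!(hash_key_overhead_bytes(4), 28);
}
```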

xgreenx commented 2 years ago

Given these numbers I might be inclined to say that truncated hash version might be better if it is substantially simpler and gets rid of generics for the user.

I also prefer the version with the truncated hash, because it will be simpler for me as a smart contract developer. I don't like putting generics everywhere in cases where I plan to use only atomic structures (or a non-atomic struct only once).

How many fields does the average contract have? I would say less than 20 (maybe even way less). That would be 140 byte size overhead for the truncated hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had StoredValue already. I would assume the amount of fields wouldn't go up indefinitely because you would group them into atomic structs for efficiency reasons.

Yeah, the overhead doesn't seem too high + the 1-byte assumption still requires investigation=)

Given that we use hash(CONTRACT_ROOTKEY ++ STRUCT_NAME ++ FIELD_NAME), we cannot use the same non-atomic struct twice. We would need an attribute to override the STRUCT_NAME in that case.

If someone wants to use a struct twice or more, they can specify the generic themselves.

/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, ManualKey<hash_u64!("Test::value"), Key>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Test::value2"), Key>>,
    value3: StoredValue<u32, ManualKey<hash_u64!("Test::value3"), Key>>,
    value4: StoredValue<u32, ManualKey<hash_u64!("Test::value4"), Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
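The key composition in `ManualKey<Key, XOR>` can be demonstrated standalone. This is a sketch, not the actual ink! codegen; the `KeyHolder` trait name is hypothetical, invented here only to make the const arithmetic observable:

```rust
/// Hypothetical trait exposing the resolved key of a key type.
trait KeyHolder {
    const RESOLVED: u64;
}

/// The final storage key = KEY ^ XOR (as in the comment above).
struct ManualKey<const KEY: u64, const XOR: u64 = 0>;

impl<const KEY: u64, const XOR: u64> KeyHolder for ManualKey<KEY, XOR> {
    const RESOLVED: u64 = KEY ^ XOR;
}

fn main() {
    // The same field key under two different parent offsets resolves
    // to two different storage keys, so a struct can be used twice.
    assert_ne!(
        <ManualKey<0xAAAA, 1> as KeyHolder>::RESOLVED,
        <ManualKey<0xAAAA, 2> as KeyHolder>::RESOLVED,
    );
    // With the default XOR of 0, the key is unchanged.
    assert_eq!(<ManualKey<0xAAAA> as KeyHolder>::RESOLVED, 0xAAAA);
}
```

XOR composition has the nice property that it is its own inverse and never loses entropy from either operand, which is why it works as a cheap way to mix a parent key into every field key.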
athei commented 2 years ago

Having to manually key every field in TestTwice seems a bit extreme just so we can use it twice. Couldn't the supplied const Key be picked up by the automatic key generation? The macro should be able to detect that there is a const Key specified and use it.

xgreenx commented 2 years ago

We can add an XOR part to AutoKey:

/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;
/// The auto-generated storage key (based on the name of the struct and field) will be XORed with the `XOR` value
struct AutoKey<const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32, AutoKey<42>>,
    value4: StoredValue<u32>,
}
xgreenx commented 2 years ago

The macro should be able to detect that there is a const Key specified and use it.

We can try to automate that process. For that, we need some marker to recognize that a generic is meant for the storage key. Maybe we can define an alias in the ink_storage crate, like pub type StorageKey = u64, and check that the const generic's type is StorageKey (it will not work if the user aliases it again).

Then we can simplify:

#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}

Into:

#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool>,
    value2: StoredValue<bool>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
athei commented 2 years ago

Yes, this is what I had in mind. Sure, we can't detect the re-alias, but I don't think this is a case we need to consider.

This all looks pretty solid to me. The latest example looks like something I can very much live with. We should have the rest of the ink! team look over it so we are all on the same page.

HCastano commented 2 years ago

This discussion is pretty dense, but I'll try and take a look later this week

cmichi commented 2 years ago

@xgreenx Regarding #[ink(storage)] and #[ink::storage_ext]: In an ideal world we would need only #[ink(storage)] and detect if the struct is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one storage attribute?

xgreenx commented 2 years ago

@xgreenx Regarding #[ink(storage)] and #[ink::storage_ext]: In an ideal world we would need only #[ink(storage)] and detect if the struct is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one storage attribute?

I don't think that is possible. I think it is better to add some kind of marker for the root contract, like #[ink(storage, root)]. We could also ask the developer to always specify the root_key for the root storage: #[ink(storage, root_key = 0)]=)

athei commented 2 years ago

and detect if the struct is meant to be the top-level storage item or a nested one

I only see two ways:

/// What xgreenx suggested
#[ink(storage)] + #[ink(storage, root, root_key = <OPTIONAL_ROOT_KEY>)]
/// Define the name of the root struct by name
/// I favor this one because it also prevents declaring two structs as root struct.
#[ink(storage)] + #[ink(contract, storage = struct_name, root_key = <OPTIONAL_ROOT_KEY>)]

How else would you detect the root struct? You could always just define the "first struct" as the root struct but I don't think this would be a better solution.

xgreenx commented 2 years ago

We included that issue in our roadmap, so we are waiting for approval from the ink! team to start work on it=)

HCastano commented 2 years ago

Okay, so I finally got around to taking a read today. I can't say I follow everything, but I have some initial thoughts and questions:

Direction looks really good in general though!

xgreenx commented 2 years ago

I want to highlight that at the moment we have two solutions here.

The first one uses an auto-incrementing storage key during compilation. The main problem with that solution is the use of generics everywhere. It can be overcomplicated for developers (and of course for newcomers from Solidity).

The second is to generate the storage key based on hash(STRUCT_NAME ++ FIELD_NAME). In that case, the usage of generics is optional (only needed if you plan to use the struct several times). But that solution requires additional checks after compile time to be sure that storage keys are unique. That solution is more user-friendly.

What is this AtomicResolverHelper?

AtomicResolverHelper is for the first solution. It helps to use types without generics in places where we can do that. If the type is atomic, we don't need to use a generic for it and the resolver returns the type with an empty generic.

What is this supposed to indicate: const IS_VALUE_ATOMIC: bool;

IS_VALUE_ATOMIC = true means that the struct and its substructs don't use StorageValue or Mapping (or another type that requires its own storage key).

Maybe generic in Test should be Key or RootKey instead?

It can be for the first solution. Actually it is TheNextStorageKeyBasedOnAllFieldsAbove, but we can use Key or RootKey=)

Need to re-read discussion about why https://github.com/paritytech/ink/issues/1134#issuecomment-1046332109

The problem with incorrect bounds for generics during derive. https://github.com/rust-lang/rust/issues/26925

For example, if you want to derive Default, the generated implementation will require Default for the generic: RootKey: Default. But we don't need it=)

don't mind forcing users to use generics, they're a pretty basic Rust construct

In the case of the first solution, you need to specify it everywhere, and it makes the code much harder to read and uglier, like here.

Don't understand discussion https://github.com/paritytech/ink/issues/1134#issuecomment-1051303774 and https://github.com/paritytech/ink/issues/1134#issuecomment-1051367213

In https://github.com/paritytech/ink/issues/1134#issuecomment-1051303774 we discussed the type of the storage key. It can be 32 bytes (current ink! uses that) or less (which should reduce the code size but increase the chance of collision).

In https://github.com/paritytech/ink/issues/1134#issuecomment-1051367213 we discussed that if the developer adds <const Key: StorageKey> to his struct, we will automatically add that offset to all inner structs. The StorageKey type of the const generic is the trigger for us to do that.

athei commented 2 years ago

Most of the questions by @HCastano focus on the first solution (incrementing keys) and I think we pretty much settled on the second one (hash and truncate). Let me write a summary of that solution here and we can discuss this instead:

Open question

How do we handle inter-contract collisions? With seal_delegate_call those are relevant. Could they still be handled with metadata detection in case ink-as-dependency is used?

We will live with it for now and come up with a solution later

xgreenx commented 2 years ago

Everything is correct except:

Structs that should be put in storage but are not the root struct should be annotated with #[ink::storage_item(KEY)] where the key is optional and the default is derived from the struct name. This is necessary in order to prevent collisions in case the same struct is used multiple times.

Structs that should be put in storage but are not the root struct should be annotated with #[ink::storage_item]. To prevent collisions in case the same struct is used multiple times, the developer should add a const generic when declaring the struct (because the parent's key is only known at compile time, not during codegen).

#[ink::storage_item]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool>,
    value2: StoredValue<bool>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}

How do we handle inter-contract collisions? With seal_delegate_call those are relevant. Could they still be handled with metadata detection in case ink-as-dependency is used?

I think we can't handle it correctly in the case of seal_delegate_call. The developer can create a test contract himself that includes all fields to check that the upgraded contract will not have collisions.

Also, we can try to think about the customizable size of the storage key.

athei commented 2 years ago

To prevent collisions in case the same struct is used multiple times, the developer should add a const generic when declaring the struct (because the parent's key is only known at compile time, not during codegen).

Why can't you know the value of the attribute during codegen? How is this different from knowing the struct name?

I think we can't handle it correctly in the case of seal_delegate_call. The developer can create a test contract himself that includes all fields to check that the upgraded contract will not have collisions.

Also, we can try to think about the customizable size of the storage key.

I think we can agree that we want a Wasm primitive type for the key. Among those, that only leaves u64, which makes a collision much less likely. However, we are still in the region where this will happen with billions of contracts deployed (I know, I know). So it would still be better to have something to detect it.

I think we could add some functionality to cargo-contract that allows you to pass in the metadata of another contract to have it cross-check for duplicates. Maybe output a warning if you use seal_delegate_call and don't supply this metadata. It still feels like too much hassle for an unlikely event, though. I think we just leave this question open for later; we don't need to solve it right now.
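The "billions of contracts" intuition can be checked with a quick birthday-bound estimate. The following sketch (illustrative only, not part of any ink! tooling) uses the standard approximation that the collision probability among `n` uniformly random 64-bit keys is roughly `n*(n-1) / (2 * 2^64)`, which is accurate for small probabilities and still a usable upper-range indicator for large ones:

```rust
// Birthday-bound approximation for collisions among `n` random u64
// keys: p ≈ n * (n - 1) / (2 * 2^64). Good for small p; overestimates
// somewhat as p approaches 1, but fine for order-of-magnitude checks.
fn approx_collision_probability(n: f64) -> f64 {
    let keyspace = 2f64.powi(64);
    n * (n - 1.0) / (2.0 * keyspace)
}

fn main() {
    // Within a single contract (say 20 keys), a collision is negligible...
    assert!(approx_collision_probability(20.0) < 1e-15);
    // ...but across billions of keys chain-wide, it becomes likely
    // enough that detection tooling would be worth having.
    assert!(approx_collision_probability(5e9) > 1e-1);
}
```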

xgreenx commented 2 years ago

Why can't you know the value of the attribute during codegen? How is this different from knowing the struct name?

Exactly, it is the same as knowing the name of the struct!=) We can know the value of the attribute, but it would be the same value for every instance of that struct=D That is why we want to use a generic: to get a "new" type of struct, specified by the generic value.

athei commented 2 years ago

Ha ha yes I am just dumb. That's all. I will update the summary.

HCastano commented 2 years ago

Thanks for that summary Alex, super helpful!

Some more thoughts/questions

The root storage struct will be loaded and stored to storage as a whole on contract call and contract conclusion. However, Storage and Mapping will have empty Encode + Decode implementations so nothing is done there. If the root struct only consists of Storage or Mapping nothing is loaded on contract call or written on contract conclusion.

Structs that should be put in storage but are not the root struct should be annotated with #[ink::storage_item]. In order to prevent collisions in case the same struct is used multiple times it can be made generic by the user so that a different key can be used for each type instantiation:

  • Okay this is a little quirky, but I get it. I don't think it should be that bad from a learning/teaching point of view

Green, would it be too much to ask for an updated Playground example and an example of how a contract developer would write contracts now?

xgreenx commented 2 years ago

E.g, can I have an empty root storage struct and work solely with storage_items?

A storage_item can be a standalone struct or part of the root. The attribute is used to generate keys and the implementations required for it to work correctly.

I guess we'll still need the root struct to calculate the ROOT_KEY, right?

Only if the user uses the struct twice or more does the parent key need to be specified for a storage_item struct, to prevent collisions.

How do things like Storage<Mapping<K, V>> work?

The user will be able to do that, but it is useless in that form. It can be useful as Storage<SomeStructWithMappingInside>, though.

How about Mapping<Storage, Storage>?

The value of a Mapping can only be an atomic type, and the key should implement Encode correctly=) So it is not possible.

Okay, so is this the equivalent to the old Lazy construct? E.g, the StorageValue or Mappings are only loaded if they're actually used within a message

Yep. The question here is whether we want explicit get and set, or implicit access like Lazy had before.

Green, would it be too much to ask for an updated Playground example and example of how a contract developer would write contracts now?

I can create a new example, but I think it is better to spend that time on the real change=D Because most behaviors and cases have already been discussed.

athei commented 2 years ago

Do we need to enforce that storage_items be part of the main root struct, or can they be standalone?

Enforcing it to be part of the root struct would be difficult, I think. The only thing that comes to mind is that storage_items might have a private constructor and can only be initialized through RootStruct::new. The next best thing would be a dylint.

E.g, can I have an empty root storage struct and work solely with storage_items?

AFAIK yes. This could work. Not sure how nice the code would be, though. Since the root struct will be reachable through self it would be much nicer to have it all tied together in the root struct. Otherwise you would create an ad-hoc instance of some Mapping or Storage just to access it.

I guess we'll still need the root struct to calculate the ROOT_KEY, right?

It does not matter where the ROOT_KEY is defined. We could define it on the overarching #[contract] macro. This would theoretically remove the need to always have a root struct. However, in my opinion the best way would be to always require one and have all storage items used through it.

Okay, so is this the equivalent to the old Lazy construct? E.g, the StorageValue or Mappings are only loaded if they're actually used within a message?

Yes, Storage is like Lazy. We should have manual get and set functions. More automatic types should build on that but shouldn't be part of this first version. First get the manual stuff out.
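The explicit, Lazy-like semantics can be sketched with a mock backend. Everything here is hypothetical: the `StorageValue` wrapper and the thread-local `STORE` stand in for the real ink_env host functions, purely to show that nothing touches storage until `get`/`set` is called:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Mock key-value backend standing in for contract storage (real code
// would call ink_env host functions instead).
thread_local! {
    static STORE: RefCell<HashMap<u64, Vec<u8>>> =
        RefCell::new(HashMap::new());
}

/// Explicit, Lazy-like wrapper: a zero-sized handle carrying only its
/// storage key. No storage I/O happens until `get` or `set`.
struct StorageValue<const KEY: u64>;

impl<const KEY: u64> StorageValue<KEY> {
    fn set(&mut self, value: &[u8]) {
        STORE.with(|s| s.borrow_mut().insert(KEY, value.to_vec()));
    }
    fn get(&self) -> Option<Vec<u8>> {
        STORE.with(|s| s.borrow().get(&KEY).cloned())
    }
}

fn main() {
    let mut v: StorageValue<42> = StorageValue;
    assert_eq!(v.get(), None); // untouched until explicitly written
    v.set(b"flipped");
    assert_eq!(v.get().as_deref(), Some(&b"flipped"[..]));
}
```

Because the wrapper is zero-sized, decoding the root struct costs nothing for such fields, matching the plan that Storage and Mapping have empty Encode/Decode implementations.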

The value of a Mapping can only be an atomic type, and the key should implement Encode correctly=) So it is not possible.

Do we need some kind of marker trait for keys? Because non-atomic types will have an (empty) Encode implementation?

I can create a new example, but I think it is better to spend that time on the real change=D Because most behaviors and cases have already been discussed.

Talked to @cmichi. We are basically OK with continuing the rest of this discussion in a PR. What do you think @HCastano? It all depends on your willingness, @xgreenx, to make bigger changes to your PR (or restart it completely) if we notice some bigger stumbling blocks late in the process.

HCastano commented 2 years ago

Sure, that sounds alright with me

cmichi commented 2 years ago

Will also be interesting to see how the contract metadata will then change. Chances are that with this redesign we could close https://github.com/paritytech/ink/issues/767, which then would enable implementing https://github.com/paritytech/cargo-contract/issues/101.

@xgreenx Yeah so from our side you're good to go whenever you're up for it :-).

HCastano commented 2 years ago

Hey @xgreenx! Any updates here?

xgreenx commented 2 years ago

Hi, we've started work on it, but we are not fully involved at the moment. The previous week was spent stabilizing OpenBrush (using stable ink!, speeding up CI, adapting our projects) and fixing issues found by users. This week we are focused on upgradeable contracts and the Proxy and Diamond standards.

We plan to be fully involved in that issue after one week. It is part of the 5th milestone of the Grant, so we plan to implement it=)

HCastano commented 2 years ago

Closed via #1331

tymat commented 7 months ago

https://use.ink/ink-vs-solidity

Note: as of now (ink! v3.3.1), when using cross-contract calls, emitting events will not work and compile errors will occur. See [issue #1000](https://github.com/paritytech/ink/issues/1000). Furthermore, the compiler will throw an error saying that (for example) Erc20Ref does not implement SpreadAllocate. [Issue #1149](https://github.com/paritytech/ink/issues/1149) explains more and has a workaround. These issues will be fixed in [issue #1134](https://github.com/paritytech/ink/issues/1134).

Can this document be updated?