Closed: athei closed this issue 2 years ago.
I played with that idea, tested different implementations and cases, checked the sizes.
First of all, to reduce the size it is better to avoid frequent usage of `ink_env::set_contract_storage` and `ink_env::get_contract_storage`.
Instead of:
```rust
#[ink(storage)]
pub struct Flipper {
    value: Storage<bool>,
    value2: Storage<bool>,
    value3: Storage<u32>,
    value4: Storage<u32>,
}
```
Better to:
```rust
#[ink(storage)]
pub struct Flipper {
    value: Storage<Test>,
}

struct Test {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}
```
Or
```rust
#[ink(storage)]
pub struct Flipper {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}
```
The second point is the introduction of accessors like:

```rust
// we create accessors for each field with the correct mutability
// this way we keep the natural way of marking messages as
// mutable and getters
pub fn a(&self) -> &AFieldStorageValue { &self.a }
pub fn b(&self) -> &BFieldMapping { &self.b }
pub fn a_mut(&mut self) -> &mut AFieldStorageValue { &mut self.a }
pub fn b_mut(&mut self) -> &mut BFieldMapping { &mut self.b }
```

This will be very unclear for the user, plus we will have problems with highlighting in the IDE.
Based on the results I want to propose the following:

- All types stored by the contract implement `scale::Decode` and `scale::Encode` (we are removing the traits `SpreadLayout`, `PackedLayout`, `SpreadAllocate`, `PackedAllocate`).
- The whole contract is stored under the storage key `ContractRootKey::ROOT_KEY` in one cell -> fields don't have their own storage key by default. The contract is one monolithic structure.
- All storage-related structures are marked with `#[ink(storage)]` (I will describe later why).
- ink! will provide `Mapping<K, V, Mode = AutoStorageKey>`, `Storage<T, Mode = AutoStorageKey>` and other types to allow the user to manage storage keys. The keys will be defined during compilation time (the main idea of that issue). ink! will provide types and structures that allow moving some fields into their own cell. These types will have two modes:
  - `ManualStorageKey<const KEY: u128>` - allows specifying the storage key of the type. It is `u128` because it is the largest allowed const generic type. In the code it is converted into `[u8; 32]` via `core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0))` (maybe you have a better idea? I tried the idea with created private types, but the size of the contract was bigger; with const generics it showed better results).
  - `AutoStorageKey` - the default mode. It means that the `#[ink(storage)]` macro will generate the key based on `NAME_OF_THE_STRUCTURE::NAME_OF_THE_FIELD`. We will replace the type `Storage<T, AutoStorageKey>` with a new one, `Storage<T, ManualStorageKey<hash!("NAME_OF_THE_STRUCTURE::NAME_OF_THE_FIELD")>>`. To recognize whether the type implements the `IsAutoStorage<const KEY: u128>` trait we will use the autoref trick. If it implements `IsAutoStorage<const KEY: u128>` we use `IsAutoStorage<const KEY: u128>::StorageType`, else we use the previous type.

That solution provides control over the storage keys. The developer can decide by himself which key to use and where. The solution introduces only one limitation: all storage-related structures should be defined via the `#[ink(storage)]` macro. The main storage of the contract can be marked with `#[ink(contract)]` or `#[ink(storage(main))]` or something like that.
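The `(u128, u128)` to `[u8; 32]` conversion mentioned above can be checked in isolation. This is a sketch (not ink! code) of a safe equivalent of the `transmute`, assuming a little-endian target like Wasm:

```rust
// Safe equivalent of `core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0))`:
// the key's little-endian bytes followed by 16 zero bytes.
fn key_to_bytes(key: u128) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&key.to_le_bytes());
    out
}

fn main() {
    let bytes = key_to_bytes(42);
    // Low byte of the key first, the rest of the cell key stays zero.
    assert_eq!(bytes[0], 42);
    assert_eq!(bytes[1..], [0u8; 31]);
}
```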
That solution doesn't affect multi-file projects because everyone will use the same rules to work with storage -> structures are marked with `#[ink(storage)]` and implement `scale::Decode + scale::Encode`. It also doesn't affect traits and ideas used in OpenBrush (we only need to rewrite the storage struct according to the new rules). I see only one problem: We are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.
```rust
// `value` and `value2` are `Test2`. All inner storage keys are the same ->
// `self.value.get().value4.set(&144)` will also affect `self.value2.get().value4`
#[ink(storage)]
pub struct Flipper {
    value: Storage<Test2, ManualKey<1>>,
    value2: Storage<Test2, ManualKey<4>>,
}

#[ink(storage)]
struct Test2 {
    value: Storage<bool>,
    value2: Storage<bool>,
    value3: Storage<u32>,
    value4: Storage<u32>,
}
```
That problem has a workaround with manual offset:
```rust
#[ink(storage)]
pub struct Flipper {
    value: Storage<Test2<100>, ManualKey<100>>,
    value2: Storage<Test2<200>, ManualKey<200>>,
}

#[ink(storage)]
struct Test2<const Offset: u128> {
    value: Storage<bool, ManualKey<{ Offset + 1 }>>,
    value2: Storage<bool, ManualKey<{ Offset + 2 }>>,
    value3: Storage<u32, ManualKey<{ Offset + 3 }>>,
    value4: Storage<u32, ManualKey<{ Offset + 4 }>>,
}
```
But then we need to return an error if the storage layout contains two same keys to prevent the user from bugs=)
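Such a duplicate-key check could, for instance, run over the flattened layout during metadata generation. A minimal sketch (names and layout representation are illustrative, not ink! code):

```rust
use std::collections::HashSet;

// Sketch: collect every field's storage key and reject the layout on the
// first duplicate, pointing at the offending field.
fn check_layout(keys: &[(&str, u128)]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &(field, key) in keys {
        if !seen.insert(key) {
            return Err(format!("duplicate storage key {key} on field `{field}`"));
        }
    }
    Ok(())
}

fn main() {
    // `Test2` used twice without distinct offsets -> keys collide.
    let colliding = [("value.value4", 4u128), ("value2.value4", 4u128)];
    assert!(check_layout(&colliding).is_err());

    // With manual offsets (100 + n, 200 + n) the keys are distinct.
    let distinct = [("value.value4", 104u128), ("value2.value4", 204u128)];
    assert!(check_layout(&distinct).is_ok());
}
```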
Below is an example of the code that I used for testing (it is only an MVP). The same idea can be implemented in `Mapping`.
```rust
// Imports assuming the ink! 3.x crates this prototype was written against.
use core::marker::PhantomData;
use ink_metadata::layout::{CellLayout, Layout, LayoutKey};
use ink_primitives::Key;
use ink_storage::traits::KeyPtr;
use scale::{Decode, Encode};

#[derive(Default)]
struct AutoKey;

#[derive(Default)]
struct ManualKey<const KEY: u128>;

#[derive(Default)]
struct Storage<T: Decode + Encode, Mode = AutoKey>(PhantomData<(T, Mode)>);

#[cfg(feature = "std")]
impl<T: Decode + Encode + scale_info::TypeInfo + 'static, Mode> ::ink_storage::traits::StorageLayout
    for Storage<T, Mode>
where
    Self: KeyTrait,
{
    fn layout(_key_ptr: &mut KeyPtr) -> Layout {
        Layout::Cell(CellLayout::new::<T>(LayoutKey::from(
            &<Self as KeyTrait>::key(),
        )))
    }
}

trait KeyTrait {
    fn key() -> Key;
}

trait StorageTrait: KeyTrait {
    type Value: Decode + Encode;

    fn get(&self) -> Self::Value {
        ink_env::get_contract_storage(&Self::key()).unwrap().unwrap()
    }

    fn set(&self, value: &Self::Value) {
        ink_env::set_contract_storage(&Self::key(), &value);
    }
}

impl<T: Decode + Encode, const KEY: u128> KeyTrait for Storage<T, ManualKey<KEY>> {
    #[inline]
    fn key() -> Key {
        Key::from(unsafe { core::mem::transmute::<(u128, u128), [u8; 32]>((KEY, 0)) })
    }
}

impl<T: Decode + Encode, const KEY: u128> StorageTrait for Storage<T, ManualKey<KEY>> {
    type Value = T;
}

impl<T: Decode + Encode> KeyTrait for Storage<T, AutoKey> {
    #[inline]
    fn key() -> Key {
        // Here we will put `__ink_error...`
        panic!()
    }
}

impl<T: Decode + Encode> StorageTrait for Storage<T, AutoKey> {
    type Value = T;
}

trait IsAutoKey<const KEY: u128> {
    type StorageType;
}

impl<T: Decode + Encode, const KEY: u128> IsAutoKey<KEY> for Storage<T, AutoKey> {
    type StorageType = Storage<T, ManualKey<KEY>>;
}
```
> The whole contract is stored under the storage key ContractRootKey::ROOT_KEY in one cell -> Fields don't have their own storage key by default. The contract is one monolith structure.
Is that somewhere reflected in the code you posted? I think it contradicts the rest you are saying: We are selecting different keys with manual or automatic keys.
In your examples I don't see the overarching main contract structure that ties all of that together.
> I see only one problem: We are using the name of the struct and the name of the field to auto-generate storage keys for fields. That means that if someone uses the same type two times in his contract, he will have a collision.
This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
Also, it would be nice to have the example with `Mapping` instead of `Storage`. The latter is too simple to understand what is happening there.
> Is that somewhere reflected in the code you posted? I think it contradicts the rest you are saying: We are selecting different keys with manual or automatic keys.

I didn't implement the `ContractRootKey::ROOT_KEY` logic in the example because it is part of the `#[ink::contract]` macro. All fields implement `Decode + Encode`. To load a contract we will use `ink_env::get_contract_storage(&ContractRootKey::ROOT_KEY)`, and to flush it - `ink_env::set_contract_storage(&ContractRootKey::ROOT_KEY, &contract.encode())`.
The idea is that for some types the developer will create an empty implementation of `Decode` and `Encode`, for example `Storage` or `Mapping`. Those types work with storage directly. So all atomic types are loaded with the contract; other types don't require loading.
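A sketch of that point with local stand-in traits (the real ones are `scale::Encode`/`scale::Decode`): the empty impl means a lazy field contributes nothing to the root-cell payload.

```rust
// Stand-in encoding trait, just to make the point measurable.
trait Encode {
    fn encode_to(&self, out: &mut Vec<u8>);
}

struct Mapping; // works with storage directly, nothing to serialize
impl Encode for Mapping {
    fn encode_to(&self, _out: &mut Vec<u8>) {} // empty implementation
}

struct Contract {
    total_supply: u32, // atomic: loaded with the contract
    balances: Mapping, // non-atomic: accessed lazily, per key
}

impl Encode for Contract {
    fn encode_to(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.total_supply.to_le_bytes());
        self.balances.encode_to(out); // adds zero bytes
    }
}

fn main() {
    let c = Contract { total_supply: 7, balances: Mapping };
    let mut bytes = Vec::new();
    c.encode_to(&mut bytes);
    // Only the 4 bytes of `total_supply` end up in the root cell.
    assert_eq!(bytes.len(), 4);
}
```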
> In your examples I don't see the overarching main contract structure that ties all of that together.

I will prepare a more detailed example=)
> This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.

Seems I found a solution for that problem. The `const_generics_defaults` feature is stable now, and we can create structures like:
```rust
struct Test2<const KEY: u128 = 0> {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}
```
So our `#[ink(storage)]` macro can add a generic argument to specify the storage key of the struct during declaration=) I tested it and it works, but the generated code is ugly.
> Also, would be nice to have the example with Mapping instead of Storage. The latter is too simple to understand what is happening there.

I will try to add an example for `Mapping` too. The problem is that the example is already big=)
> Seems I find the solution for that problem. https://github.com/rust-lang/rust/issues/44580#issuecomment-991782799 feature is stable now, and we can create strcutures like:
> `struct Test2<const KEY: u128 = 0> { value: bool, value2: bool, value3: u32, value4: u32, }`
> So our #[ink(storage)] macro can add a generic argument to specify the storage key of the struct during declaration=) I tested and it works but the generated code is ugly.
However, we can't use that `KEY` to do calculations like `{ KEY + 2 }` during the declaration. It is possible with the unstable feature `#![feature(generic_const_exprs)]`, but we want to stay stable=)
I will prepare the simple version only with a manual key and with generated.
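For reference, a minimal stand-alone sketch of the `const_generics_defaults` idea (illustrative names, not generated ink! code):

```rust
// The macro could append `const KEY: u64 = 0`, so plain `Test2` keeps working
// while codegen can substitute a concrete key.
struct Test2<const KEY: u64 = 0> {
    value: u32,
}

// Illustrative helper to read back which key a value was declared with.
fn key_of<const KEY: u64>(_: &Test2<KEY>) -> u64 {
    KEY
}

fn main() {
    // The elided parameter falls back to the default `0`...
    let plain: Test2 = Test2 { value: 1 };
    // ...while a concrete key can be written explicitly.
    let keyed = Test2::<42> { value: 2 };
    assert_eq!(key_of(&plain), 0);
    assert_eq!(key_of(&keyed), 42);
}
```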
> This is why we only want that one layer deep. You can only have different storage cells by putting different fields into the main storage structure.
I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.
> I think we don't need to limit the developer. We only need to throw an error if he is using the same storage keys in several places. We can do that check during the generation of metadata.
If we can catch this during compilation then it is fine. Would be better to catch it during the compilation of the wasm, though.
> The idea is that the developer for some type will create an empty implementation of Decode and Encode. For example Storage or Mapping. That types work with storage directly.
> So all atomic types are loaded with contract. Other types doesn't require loading.
This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key and the storage types like `HashMap` simply implement an empty `Encode + Decode`. Just to reiterate: What we want to achieve with this rewrite is:

- Getting rid of the custom flushing logic in the generated code (`Drop`)
- Getting rid of `initialize_contract`: This is done by generating keys through macros and not by passing them down via parameters

I have some further questions:

- Would it be legal to nest those storage types? For example, to allow a `Storage` with the value of a `Mapping`? I don't think this makes sense, right?
- In all your examples you just use a numeric key (`u128`). Would the same type also be used for the automatic key? I think this would be beneficial as it is much smaller than concatenating strings. Could we just use the position within the struct? Like having a counter inside the proc macro? This is how it is currently working (this counter is passed down as an argument). We could even use a smaller type there then (like `u32`).
- How would you combine the root key of a mapping (a numeric value) with the `Key` of the `Mapping` itself? Do we require `Display` for the `Key` and just concat (remember we don't want to hash, so that the key stays transparent)?

I managed to implement auto-incrementing of the storage key during compilation (it is the hard variant where we don't need to worry about keys)=) Here is an example. It contains only an example with `Storage` at the moment, but the `Mapping` is the same in the implementation. There is ugly code, but it will be generated by the macro. In the comments I describe how it should look in the contract. I added `Debug` everywhere to output the keys, so please ignore that stuff. That solution has a problem with the `derive` macro, because I'm adding the generic `Parent: NextStorageKey` into the definition of the struct, and the `derive` macro generates bounds for that generic, but it is only used in `PhantomData`=( Maybe you know some workaround for that?
> This is not completely unlike how it is currently working. We also store the whole contract structure under a single root key and the storage types like HashMap simply implement an empty Encode + Decode. Just to reiterate: What we want to achieve with this rewrite is

Nope!=) Currently, we are storing each field under its own storage key. If we put all fields under `ink_storage::Pack<T>` then we store everything under one key, but currently it is not so. I compiled contracts with `ink_storage::Pack<T>` and they take much less memory, which is why I'm proposing to store all fields under one key by default: it is the most performant strategy in terms of the contract's size.

If we generate storage keys during compilation, then the empty implementation of `Encode + Decode` will not affect the size of the contract.
> Would it be legal to nest those storage types? For example, to allow a Storage with the value of a Mapping? I don't think this makes sense, right?

Usage like `Storage<Mapping<Key, Value>>` doesn't make sense. But `Storage<Struct>` where `Struct` contains many fields and some of those fields are `Mapping` or `Storage` is a valid case.
> In all your examples you just a numeric key (u128). Would the same type also be used for the automatic key? I think this would be beneficial as it is much smaller as concatenating strings. Could we just use the position within the struct? Like having a counter inside the proc macro? This is how it is currently working (this counter is passed down as argument). We could even use smaller type there then (like u32).
Yeah, I want to propose using some integer as a storage key, because in terms of the contract we don't need to worry about the collision of fields. Yes, in the hard version that I posted it is auto-incremented.
> How would you combine the root key of a mapping (a numeric value) with the Key of the Mapping itself? Do we require Display for the Key and just concat (remember we don't want to hash so that the key is transparent).

The storage key implements `Encode`, and the key of the `Mapping` implements it too. As we discussed in the chat, with a new `seal_get_storage` it can be done like:

```rust
let key = (&Mapping::StorageKey, &Key).encode();
let value = value.encode();
seal_get_storage(key.as_ref(), key.len(), value.as_ref(), value.len())
```

`seal_get_storage` will hash it by itself.
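Since SCALE encodes fixed-width integers little-endian and tuples by simple concatenation, the composed key can be sketched without the host function or the codec crate (function name and key widths are illustrative):

```rust
// Composed mapping key: `root_key bytes ++ mapping key bytes`.
// Here the mapping key is a 32-byte AccountId-like value.
fn mapping_entry_key(root_key: u32, account_id: &[u8; 32]) -> Vec<u8> {
    let mut key = root_key.to_le_bytes().to_vec();
    key.extend_from_slice(account_id);
    key
}

fn main() {
    let key = mapping_entry_key(1, &[0xAA; 32]);
    // 4-byte counter + 32-byte AccountId = 36 bytes, which is why the
    // 32-byte key limit of the current host functions becomes too small.
    assert_eq!(key.len(), 36);
    assert_eq!(&key[..4], &[1, 0, 0, 0]);
}
```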
> That solution has a problem with the derive macro because I'm adding generic Parent: NextStorageKey into the definition of the struct, and derive macro generates bounds for that generic, but it is only in PhantomData=( Maybe you know some workaround for that?

I mentioned that issue here: https://github.com/rust-lang/rust/issues/26925

But it seems we can handle it. The derive macro can be expanded before the `#[ink(storage)]` macro, so we can add the generic for numeration later and it will not conflict with `#[derive(...)]`. It works because of that change: https://github.com/rust-lang/rust/pull/79078

Our macro can check: does at least one derive macro come after it? If yes, our macro will put itself after the derive macro.
> That solution has a problem with the derive macro because I'm adding generic Parent: NextStorageKey into the definition of the struct, and derive macro generates bounds for that generic, but it is only in PhantomData=( Maybe you know some workaround for that?

You mean the one on your `Test` and `Flipper` structs? Bounds are merely a heuristic and often wrong. The macro can detect `PhantomData`, but in your case the `PhantomData` is buried deep in other types, so the macro cannot see it. This is why we have attributes in `parity-scale-codec` for its macros to override bounds. In Substrate we also have a `DefaultNoBound` macro for convenience. That said, it seems you have found a better solution by changing the order in which the macros are applied.
I looked through your prototype and I think this would be a very good design. Having a numeric counter at compile time is the best of both worlds:

- Why `u128` for the key? Since it is only a counter now we can get away with `u32`. This would map to a primitive operation in Wasm. If you want to be very careful we could also use `u64`, which still has a native type in Wasm. But I don't think it is necessary.
- Why do you implement `KeyTrait` for `ManualTypedKey` and `Storage<T, ManualTypedKey>` with the same implementation? Couldn't the latter just use the former?
- I think the docs which describe the user definition of `Test` do not match the generated code: In the docs there is no usage of `Storage`, but in the generated code there is.

This looks really promising to me. Once this progresses we also need to add a new version of the storage functions to pallet_contracts that allow variably sized keys (with a limit, though). The current size of 32 bytes is too small if contracts do not hash the key, because you need at least `sizeof(counter) + sizeof(AccountId)`, which is 36 bytes assuming that we use `u32` for the counter. Maybe 64 bytes would be a sensible limit to allow for `Mapping` keys that are larger than an `AccountId`.
> Why u128 for the key? Since it is only a counter now we can get away with u32. This would map to a primitive operation in wasm. If you want to be very careful we could also use u64 which still has a native type in wasm. But I don't think it is necessary.

I tested it with the current ink! codegen, so it was simpler to transmute `(u128, u128)` into `[u8; 32]`. But yes, we can use `u32` or `u64` in the final solution. I think `u32` should be enough to not have collisions.
> Why do you implement KeyTrait for ManualTypedKey and Storage<T, ManualTypedKey> with the same implementation? Couldn't the latter just use the former?

It is caused by the `Debug` trait for logging=) It was late at night and I tried to finish the example faster. It is only for the `println` in the example.
> I think the docs which describe the user definition of Test do not match the generated code: In the docs there is no usage of Storage but in the generated code there is.
Yep, you are right, I forgot to update the comment.
```rust
// The user defined the next struct:
//
// #[derive(Default, Debug)]
// #[ink(storage)]
// struct Test {
//     value: bool,
//     value2: Storage<bool>,
//     value3: u32,
//     value4: Storage<u32>,
// }
```
This can't detect duplicate keys, right? I played around a bit and was able to have duplicate keys.
Yep. I didn't think yet about the question: Is it possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove the `ManualKey` mode and only have `Auto` and `ManualOffset`. In that case, the code will not generate duplicate keys.
> Once this progresses we also need to add a new version of storage functions to pallet_contracts that allow variably sized keys (with a limit though). The current size of 32bytes is to small if contracts do not hash the key because you need at least sizeof(counter) + sizeof(AccountID) which is 36byte assuming that we use u32 for the counter. Maybe 64byte would be a sensible limit to allow for Mapping keys that are larger than an AccountId.
I think resolving that issue will close that question as we discussed in the element channel.
I also want to highlight another problem in that solution. `Test` is `Test<()>`, but `Test<ManualKey<2>>` is another structure. So if the user wants to work with the code that is generated for `Test`, he should transform it from `Test<ManualKey<2>>` into `Test<()>`.
I added `AtomicResolveType`: If the structure is atomic (doesn't use `Storage` or `Mapping`) then we can use `Test`, else `Test<SomeKey>`. It helps to not generate useless code and to work with `Test` where it is possible. But the problem still exists for complex structures (those that contain `Storage` or `Mapping`).
We can generate the `AsRef` and `From` implementations. But it still can be unclear for the developer why he should convert `Test` into `Test<SomeKey>`.
```rust
impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsRef<Test<NewParent>> for Test<OldParent> {
    #[inline]
    fn as_ref(&self) -> &Test<NewParent> {
        unsafe { core::mem::transmute::<&Test<OldParent>, &Test<NewParent>>(self) }
    }
}

impl<OldParent: NextStorageKey, NewParent: NextStorageKey> AsMut<Test<NewParent>> for Test<OldParent> {
    #[inline]
    fn as_mut(&mut self) -> &mut Test<NewParent> {
        unsafe { core::mem::transmute::<&mut Test<OldParent>, &mut Test<NewParent>>(self) }
    }
}
```
> I think resolving https://github.com/paritytech/substrate/issues/7724 will close that question as we discussed in the element channel.
Yes. This will kill two birds with one stone.
> Yep. I didn't think yet about the question: Is that possible to find duplicate keys during compilation? I plan to think about it. BUT, to avoid duplicate keys we can remove ManualKey mode, and only have Auto, ManualOffset. In that case, the code will not generate duplicate keys.
If it makes things easier we should drop the manual key for now. Right now the keys are also derived from order. We can add this later. Maybe even as a post processing step on metadata to not further complicate the code (as you initially suggested).
> I also want to highlight another problem in that solution. Test is Test<()>, but Test<ManualKey<2>> is another structure. So if the user wants to work with the code that is generated for Test he should transform it from Test<ManualKey<2>> into Test<()>.
Yeah this is kind of a bummer. Didn't understand the code well enough to help with that right now. But I am sure that using transmute for this isn't the right way.
@xgreenx We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so to not be blocked on that. What are your thoughts?
> We think the best way forward would be to iterate on your example and get rid of the unsafe transmute. For the host functions you could mock them for now, so to not be blocked on that. What are your thoughts?
We need to decide on several questions and I think we can move forward.

1. Do we want to show the generic `NextStorageKey` on the storage structure or not?

1.1. We can add a generic type during the codegen as described above. By default, it will be `()`. But if it is not an atomic struct, it will use the generic to have a correct storage key, and the user should understand that he needs to work with the generic struct.

Pros:

Cons:

- The user needs to convert `Struct<Key<1>>` into `Struct<()>` or `Struct<Key<2>>`.

1.2. The user should explicitly specify the generic type for `NextStorageKey` (in case the struct contains several generics, the `NextStorageKey` should be last). He doesn't need to use it in the struct; the codegen will use it.
Pros:
Cons:
The definition looks like:
```rust
#[derive(Default, Debug)]
#[ink(storage)]
struct Test<Last: NextStorageKey> {
    value: bool,
    value2: bool,
    value3: u32,
    value4: u32,
}
```
2. `#[ink(storage)]` is a procedural macro that will generate impls for helper traits. How do we want to identify the main contract's storage?

For example, we can use `#[ink::storage]` to generate impls. But if it is the main contract's storage, then the user should mark it like `#[ink::storage(main)]` or `#[ink::storage(contract)]`. Or you can suggest your ideas=)

3. Maybe you have some suggestions about naming (for all the stuff)=D For example, `NextStorageKey` is too long, maybe `Storage` is enough.
is enough.1.1 vs 1.2: Given that these are the only two options I am leaning to the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation and is there really no better way (I can't tell)?
> For example, we can use #[ink::storage] to generate impls. But if it is a main contract's storage, then the user should mark it like #[ink::storage(main)] or #[ink::storage(contract)].

I would suggest keeping `#[ink::storage(<ROOT_KEY>)]` for the main storage and using `#[ink::storage_ext]` for additional storage items. `<ROOT_KEY>` is optional and defaults to `0`. We don't need a way to set keys of individual items, but the root key needs to be customizable, or otherwise we will have conflicts with delegate call.
> Maybe you have some suggestions about naming(for all stuff)=D For example, NextStorageKey is too long, maybe Storage is enough.

I agree. If this is visible to the user, a simple name such as `Storage` would go a long way in not confusing them. In general it is very worthwhile to spend a lot of time on naming. Don't gloss over it. It is time well spent. That said, we can always change the names late in a review as these are simple find-and-replace changes.
> 1.1 vs 1.2: Given that these are the only two options I am leaning to the explicit solution. That said, I didn't put nearly as much thought into it as you did. So what is your recommendation and is there really no better way (I can't tell)?
I also prefer the explicit solution. Because the developer can google Rust information and find what is going on.
> is there really no better way (I can't tell)

We want to know the key during compilation for each `Storage` or `Mapping` field -> each `Storage` or `Mapping` field should have its own storage key.
Here we have two options:
1. Previously used cells should affect the storage key of the current struct -> the struct should know about the parent -> we need to use a generic.

Pros:

Cons:

2. Generate the key based on `NAME_OF_STRUCT::NAME_OF_FIELD`.

Pros:

Cons:

But we can identify during metadata generation that someone used the struct more than one time and throw an error with a description. And the user should use another struct.
> I would suggest keeping #[ink::storage(<ROOT_KEY>)] for the main storage and using #[ink::storage_ext] for additional storage items. <ROOT_KEY> is optional and defaults to 0. We don't need a way to set keys of individual items but the root key needs to be customizable or otherwise we will have conflicts with delegate call.

I'm okay with that solution=)
Can you please remind me what you mean when you say "atomic struct"?
I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.
I think before you start implementing, it would be best if you'd post a complete example here of how this stuff would be used. No need to write out all the types involved as in the first mock you posted. Only from a user perspective. With `Mapping` and with `Storage`. Then we can check whether this is actually something that is usable or too complicated for the users.
"Atomic struct" - all fields are atomic. All types are atomic except `Storage` and `Mapping`.
> I think we should go with the first solution (numeric key and explicit generics). The second one has the collision problem and will also lead to much bigger keys compiled into the data section of the contract as opposed to simple numbers.

We can still use `u32` or `u64`; the chance to generate the same key is not so high=)
> I think before you start implementing, it would be best if you'd post a complete example here on how this stuff would be used. No need to write out all the types involved as in the first mock you posted. Only from a user perspective. With Mapping and with Storage. Then we can check whether this is actually something that is usable or to complicated for the users.
```rust
const ROOT_KEY: u32 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value1: AtomicStruct,
    value2: NonAtomicStruct,
    value3: Storage<AtomicStruct>,
    // Possible
    value4: Storage<NonAtomicStruct>,
    // Possible
    value5: Mapping<u128, AtomicStruct>,
    // Compilation Error, it is not possible
    // value6: Mapping<u128, NonAtomicStruct>,
}
```
```rust
impl Flipper {
    fn get_value1(&self) -> &AtomicStruct {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &NonAtomicStruct<T> {
        &self.value2
    }

    fn get_value3<T: Storage>(&self) -> &Storage<AtomicStruct, T> {
        &self.value3
    }

    fn get_value4<T1: Storage, T2: Storage>(&self) -> &Storage<NonAtomicStruct<T2>, T1> {
        &self.value4
    }

    fn get_value5<T: Storage>(&self) -> &Mapping<u128, AtomicStruct, T> {
        &self.value5
    }
}
```
```rust
#[ink::storage_ext]
pub struct AtomicStruct<Last: Storage = ()> {
    value1: bool,
    value2: u32,
    value3: Vec<u32>,
    value4: String,
}

#[ink::storage_ext]
pub struct NonAtomicStruct<Last: Storage = ()> {
    value1: Mapping<u32, bool>,
    value2: Storage<Vec<u32>>,
}
```
```rust
impl<Last: Storage> NonAtomicStruct<Last> {
    fn get_value1<T: Storage>(&self) -> &Mapping<u32, bool, T> {
        &self.value1
    }

    fn get_value2<T: Storage>(&self) -> &Storage<Vec<u32>, T> {
        &self.value2
    }
}
```
> We still can use u32 or u64, the chance to generate the same key is not so high=)
I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate to 8 byte? We would still have the problem of collisions if a struct was used multiple times.
I think with a numeric solution we could even get away with `u8`. We would make the `Key` generic and just use `u8` by default. If we overflow, we throw a compile-time error and ask the user to specify a bigger type.
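Such an overflow check could be an ordinary const assertion emitted by the codegen; a sketch (the constant name is illustrative):

```rust
// Would be produced by the macro while numbering the fields.
const FIELD_COUNT: usize = 3;

// Evaluated at compile time; exceeding `u8::MAX` aborts compilation with
// this message instead of silently wrapping.
const _: () = assert!(
    FIELD_COUNT <= u8::MAX as usize,
    "too many storage fields for a u8 key; specify a bigger key type"
);

fn main() {
    // With FIELD_COUNT = 3 the assertion holds and keys fit in one byte.
    assert!(u8::try_from(FIELD_COUNT).is_ok());
}
```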
Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.
Just one question: Why is the type parameter on the storage structs called `Last`? It seems like a strange name.
> I don't get it. How would we do that if we are generating the key from a name? Hash the name at compile time and truncate to 8 byte? We would still have the problem of collisions if a struct was used multiple times.

Yes, we can do the same as for selectors; we will truncate it to `u32` or `u64`.
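ink! selectors are a truncated BLAKE2b hash. As a dependency-free stand-in, any stable hash truncated to `u32` illustrates the idea of deriving a key from `STRUCT_NAME::FIELD_NAME` at compile time (FNV-1a here, not what ink! actually uses):

```rust
// Compile-time evaluable stand-in hash: FNV-1a truncated to u32.
const fn key_u32(name: &str) -> u32 {
    let bytes = name.as_bytes();
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u32;
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
        i += 1;
    }
    hash
}

fn main() {
    // Deterministic: the same path always yields the same key...
    assert_eq!(key_u32("Flipper::value"), key_u32("Flipper::value"));
    // ...and different fields get different keys (collisions are unlikely
    // but, as discussed, must still be checked for).
    assert_ne!(key_u32("Flipper::value"), key_u32("Flipper::value2"));
}
```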
> I think with a numeric solution we could even get away with u8. We would make the Key generic and just use u8 by default. If we overflow we throw a compile time error and ask the user to specify a bigger type.

In Wasm it is still a `u32`; I think it is better to keep `u32`.
> Regarding your example. I think this looks fine. Currently, we ask users to derive a lot of strange traits on storage structs which is not better. So I think we can manage with a few generics as long as you can also see them properly in the rust docs and it is all explicit.

Okay=) It looks normal to me. I only worry that someone coming from Solidity might be afraid of it=)
> Just one question: Why is the type parameter on the storage structs called Last? It seems like a strange name.
Because actually we will pass the last data type (the type that took the previous cell), which will be used to get the next storage key. We can use `Parent`, but it is not actually a parent=) We can simply call it `Key: Storage` or `S: Storage`. I'm not a master of naming=(
> In WASM it still is u32, I think it is better to keep u32.

I think those constants will go into the Wasm data section and can be packed densely there. Of course they will be loaded into an `i32` when they are used, but we still save the space in the data section. We would need to verify that, though.
> We can use Parent but it is not actually parent=) We can simply call it Key: Storage or S: Storage. I'm not master of the word=(
Some renaming suggestions:

- Rename the `Storage` type to `StoredValue`.
- The `Storage` trait deals mainly with storage keys if I understand it correctly. I suggest renaming it to `StorageKey`.
- The generic would then read `Key: StorageKey`.

Do you have a preference regarding numeric keys vs truncated 64-bit hashes? My thoughts on truncated hashes:

Con:

- Given that we use `hash(CONTRACT_ROOTKEY ++ STRUCT_NAME ++ FIELD_NAME)`: We cannot use the same non-atomic struct twice. We would need an attribute to override the `STRUCT_NAME` in that case.

Pro:

Let's do some napkin math regarding the size benefits of numeric keys. Let's say numeric keys are 1 byte in contract size (stored in the data section, which is linear memory) and truncated keys are 8 bytes. So we look at 7 bytes of overhead for each field. How many fields does the average contract have? I would say less than 20 (maybe even way less). That would be 140 bytes of size overhead for the truncated hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had `StoredValue` already. I would assume the amount of fields wouldn't go up indefinitely because you would group them into atomic structs for efficiency reasons.

Given these numbers I might be inclined to say that the truncated hash version might be better if it is substantially simpler and gets rid of generics for the user.
> Given these numbers I might be inclined to say that the truncated-hash version is better if it is substantially simpler and gets rid of generics for the user.
I also prefer the version with the truncated hash, because it will be simpler for me as a smart contract developer. I don't like putting generics everywhere in cases where I plan to use only atomic structures (or a non-atomic struct just once).
> How many fields does the average contract have? I would say fewer than 20 (maybe way fewer). That would be 140 bytes of overhead for the truncated-hash version. ERC20 has 4 fields and multisig would probably have 7 fields if we had `StoredValue` already. I would assume the number of fields wouldn't grow indefinitely because you would group them into atomic structs for efficiency reasons.
Yeah, the overhead doesn't seem too high + the 1 byte figure still requires investigation=)
> Given that we use `hash(CONTRACT_ROOTKEY ++ STRUCT_NAME ++ FIELD_NAME)`: We cannot use the same non-atomic struct twice. We would need an attribute to override the `STRUCT_NAME` in that case.
If someone wants to use some struct twice or more, they can specify the generic themselves.
```rust
/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, ManualKey<hash_u64!("Test::value"), Key>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Test::value2"), Key>>,
    value3: StoredValue<u32, ManualKey<hash_u64!("Test::value3"), Key>>,
    value4: StoredValue<u32, ManualKey<hash_u64!("Test::value4"), Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
```
Having to manually key every field in `TestTwice` seems a bit extreme just so we can use it twice. Couldn't the supplied `const Key` be picked up by the automatic key generation? The macro should be able to detect that there is a `const Key` specified and use it.
We can add an `XOR` part to `AutoKey`:
```rust
/// The final storage key = Key ^ XOR.
struct ManualKey<const Key: u64, const XOR: u64 = 0>;

/// The auto-generated storage key (based on the name of the struct and field)
/// will be XORed with the `XOR` value.
struct AutoKey<const XOR: u64 = 0>;

const ROOT_KEY: u64 = 123;

#[ink(storage(ROOT_KEY))]
pub struct Flipper {
    value: TestTwice<hash_u64!("Flipper::value")>,
    value2: StoredValue<TestTwice<hash_u64!("Flipper::value2")>>,
    value3: TestOnce,
}

#[ink::storage_ext]
struct TestTwice<const Key: u64> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}

#[ink::storage_ext]
struct TestOnce {
    value: StoredValue<bool, ManualKey<hash_u64!("Hello_World"), 42>>,
    value2: StoredValue<bool, ManualKey<hash_u64!("Hello_World")>>,
    value3: StoredValue<u32, AutoKey<42>>,
    value4: StoredValue<u32>,
}
```
> The macro should be able to detect that there is a `const Key` specified and use it.
We can try to automate that process. For that, we need some marker to understand that the generic is meant as a storage key. Maybe we can define an alias in the `ink_storage` crate like `pub type StorageKey = u64` and check that the type of the const generic is `StorageKey` (this will not work if the user aliases it again).
Then we can simplify:

```rust
#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool, AutoKey<Key>>,
    value2: StoredValue<bool, AutoKey<Key>>,
    value3: StoredValue<u32, AutoKey<Key>>,
    value4: StoredValue<u32, AutoKey<Key>>,
}
```

into:

```rust
#[ink::storage_ext]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool>,
    value2: StoredValue<bool>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
```
Yes, this is what I had in mind. Sure, we can't detect the re-alias, but I don't think this is a case we need to consider.

This all looks pretty solid to me. The latest example looks like something I can very much live with. We should have the rest of the ink! team look over it so we are all on the same page.
This discussion is pretty dense, but I'll try and take a look later this week
@xgreenx Regarding `#[ink(storage)]` and `#[ink::storage_ext]`: In an ideal world we would need only `#[ink(storage)]` and detect whether the `struct` is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one `storage` attribute?
> @xgreenx Regarding `#[ink(storage)]` and `#[ink::storage_ext]`: In an ideal world we would need only `#[ink(storage)]` and detect whether the `struct` is meant to be the top-level storage item or a nested one. Do you think we could simplify to have only one `storage` attribute?
I don't think that is possible. I think it is better to add some kind of marker for the root contract like `#[ink(storage, root)]`. We could always ask the developer to specify the `root_key` for the root storage: `#[ink(storage, root_key = 0)]`=)
> and detect whether the struct is meant to be the top-level storage item or a nested one
I only see two ways:

```rust
/// What xgreenx suggested
#[ink(storage)] + #[ink(storage, root, root_key = <OPTIONAL_ROOT_KEY>)]

/// Define the root struct by name.
/// I favor this one because it also prevents declaring two structs as root struct.
#[ink(storage)] + #[ink(contract, storage = struct_name, root_key = <OPTIONAL_ROOT_KEY>)]
```

How else would you detect the root struct? You could always just define the "first struct" as the root struct, but I don't think that would be a better solution.
We included that issue in our roadmap, so we are waiting for approval from the ink! team to start work on it=)
Okay, so I finally got around to taking a read today. I can't say I follow everything, but I have some initial thoughts and questions:
- What is this `AtomicResolverHelper`?
- What is this supposed to indicate: `const IS_VALUE_ATOMIC: bool;`?
- `StorageValue<$primitive_type>`?
- Maybe the generic in `Test<Parent>` should be `Key` or `RootKey` instead?
- `CELL_USED` is calculated at compile time
- `#[ink::storage(<ROOT_KEY>)]` and `storage_ext` seem fine to me
- Maybe rename `storage_ext` to `storage_item`?

Direction looks really good in general though!
I want to highlight that at the moment we have two solutions here.

The first one uses an auto-incrementing storage key assigned during compilation. The main problem with that solution is the use of generics everywhere. It can be overcomplicated for developers (and of course for newcomers from Solidity).

The second is to generate the storage key based on `hash(STRUCT_NAME ++ FIELD_NAME)`. In that case, the use of generics is optional (only needed if you plan to use the struct several times). But that solution requires additional checks after compilation to be sure that storage keys are unique. That solution is more user-friendly.
> What is this `AtomicResolverHelper`?
`AtomicResolverHelper` is for the first solution. It helps to use types without generics in places where we can do that. If the type is atomic, we don't need a generic for it and the resolver returns the type with an empty generic.
> What is this supposed to indicate: `const IS_VALUE_ATOMIC: bool;`
`IS_VALUE_ATOMIC = true` means that the struct and its substructs don't use `StorageValue` or `Mapping` (or another type that requires its own storage key).
> Maybe the generic in `Test` should be `Key` or `RootKey` instead?
It can be, for the first solution. Actually it is `TheNextStorageKeyBasedOnAllFieldsAbove`, but we can use `Key` or `RootKey`=)
> Need to re-read the discussion about why: https://github.com/paritytech/ink/issues/1134#issuecomment-1046332109
The problem is incorrect bounds for generics during `derive`: https://github.com/rust-lang/rust/issues/26925. For example, if you want to derive `Default`, the generated implementation will require `Default` for the generic: `RootKey: Default`. But we don't need it=)
> don't mind forcing users to use generics, they're a pretty basic Rust construct
In the case of the first solution, you need to specify it everywhere, and it makes the code much harder to read and uglier, like here.
> Don't understand the discussion in https://github.com/paritytech/ink/issues/1134#issuecomment-1051303774 and https://github.com/paritytech/ink/issues/1134#issuecomment-1051367213
In https://github.com/paritytech/ink/issues/1134#issuecomment-1051303774 we discussed the type of the storage key. It can be 32 bytes (current ink! uses that) or less (which should reduce the code size but increase the chance of collision).

In https://github.com/paritytech/ink/issues/1134#issuecomment-1051367213 we discussed that if the developer adds `<const Key: StorageKey>` to his struct, we will automatically add that offset to all inner structs. The `StorageKey` type of the const generic is the trigger for us to do that.
Most of the questions by @HCastano focus on the first solution (incrementing keys) and I think we pretty much settled on the second one (hash and truncate). Let me write a summary of that solution here and we can discuss this instead:
- Every field gets a `u32` base key assigned at compilation time.
- Using `u32` keys will only add 4 bytes to the data section of the contract per field item.
- The `base_key` is calculated as `hash(ROOT_KEY ++ STRUCT_NAME ++ FIELD_NAME) as u32`.
- Collisions only become likely at around `2^16` fields. We will have way fewer fields. For example, the approximate probability of having a collision with 100 fields is `1.2 * 10^-6`. Those collisions will be detected when generating metadata, and users will be guided to use a manual key in this highly unlikely case. This always assumes a perfect hash function (random distribution), but we are close enough.
- The collision check happens in `cargo contract`.
- `Storage` uses the `base_key` directly to store its one element.
- `Mapping` will use `base_key ++ map_element_key` as its key without further hashing. pallet-contracts will be changed to allow for variable length keys. This not-hashing of element keys allows us to iterate over storage keys and not only values.
- The root storage struct will be loaded and stored to storage as a whole on contract call and contract conclusion. However, `Storage` and `Mapping` will have empty `Encode + Decode` implementations so nothing is done there. If the root struct only consists of `Storage` or `Mapping`, nothing is loaded on contract call or written on contract conclusion.
- The root struct should be annotated with `#[ink::storage(ROOT_KEY)]` where the `ROOT_KEY` is optional and the default is derived from the struct name. The override is necessary in case `seal_delegate_call` is used and we need to avoid a collision with the callee's storage.
- Structs that should be put in storage but are not the root struct should be annotated with `#[ink::storage_item]`. In order to prevent collisions in case the same struct is used multiple times, it can be made generic by the user so that a different key can be used for each type instantiation:
```rust
#[ink::storage_item]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool>,
    value2: StoredValue<bool>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
```
How do we handle inter-contract collisions? With `seal_delegate_call` those are relevant. Could they still be handled with metadata detection in case `ink-as-dependency` is used?
We will live with it for now and come up with a solution later.
Everything is correct except:
> Structs that should be put in storage but are not the root struct should be annotated with #[ink::storage_item(KEY)] where the key is optional and the default is derived from the struct name. This is necessary in order to prevent collisions in case the same struct is used multiple times.
Structs that should be put in storage but are not the root struct should be annotated with `#[ink::storage_item]`. To prevent collisions in case the same struct is used multiple times, the developer should add a const generic when declaring the struct (because the parent's key is only known at compilation time, not during codegen).
```rust
#[ink::storage_item]
struct TestTwice<const Key: StorageKey> {
    value: StoredValue<bool>,
    value2: StoredValue<bool>,
    value3: StoredValue<u32>,
    value4: StoredValue<u32>,
}
```
> How do we handle inter-contract collisions? With `seal_delegate_call` those are relevant. Could they still be handled with metadata detection in case `ink-as-dependency` is used?
I think we can't handle it correctly in the case of `seal_delegate_call`. The developer can create a test contract himself that includes all fields, to check that the upgraded contract will not have collisions.

Also, we can think about a customizable size for the storage key.
> To prevent collisions in case the same struct is used multiple times, the developer should add a const generic when declaring the struct (because the parent's key is only known at compilation time, not during codegen).
Why can't you know the value of the attribute during codegen? How is this different from knowing the struct name?
> I think we can't handle it correctly in the case of `seal_delegate_call`. The developer can create a test contract himself that includes all fields, to check that the upgraded contract will not have collisions. Also, we can think about a customizable size for the storage key.
I think we can agree that we want a wasm primitive type for the key. This only leaves `u64`, which makes a collision much less likely. However, we are still in a region where this will happen with billions of contracts deployed (I know, I know). So it would still be better to have something to detect it.

I think we could add some functionality to `cargo contract` that allows you to pass in the metadata of another contract to have it cross-check for duplicates, and maybe output a warning if you use `seal_delegate_call` without supplying this metadata. It still feels like too much hassle for an unlikely event, though. I think we just leave this question open for later; we don't need to solve it right now.
> Why can't you know the value of the attribute during codegen? How is this different from knowing the struct name?
Exactly, it is the same as knowing the name of the struct!=) We can know the value of the attribute, but it would be the same value for every use of that struct=D That is why we want to use a generic: to get a "new" type of the struct, specified by the generic value.
Ha ha yes I am just dumb. That's all. I will update the summary.
Thanks for that summary Alex, super helpful!
Some more thoughts/questions:

- Do we need to enforce that `storage_item`s be part of the main root struct, or can they be standalone? E.g, can I have an empty root storage struct and work solely with `storage_item`s? Or should we require `storage_item`s to be part of the root?
- I guess we'll still need the root struct to calculate the `ROOT_KEY`, right?
- How do things like `Storage<Mapping<K, V>>` work? How about `Mapping<Storage<K>, Storage<V>>`?
- `u32` keys seem fine for now; we could later switch to `u64` and pad the existing keys.

> The root storage struct will be loaded and stored to storage as a whole on contract call and contract conclusion. However, Storage and Mapping will have empty Encode + Decode implementations so nothing is done there. If the root struct only consists of Storage or Mapping nothing is loaded on contract call or written on contract conclusion.

- Okay, so is this the equivalent of the old `Lazy` construct? E.g, the `StorageValue`s or `Mapping`s are only loaded if they're actually used within a message?

> Structs that should be put in storage but are not the root struct should be annotated with #[ink::storage_item]. In order to prevent collisions in case the same struct is used multiple times it can be made generic by the user so that a different key can be used for each type instantiation:

- Okay, this is a little quirky, but I get it. I don't think it should be that bad from a learning/teaching point of view.

Green, would it be too much to ask for an updated Playground example and an example of how a contract developer would write contracts now?
`storage_item` can be a standalone struct or part of the root. It is used to generate keys and the implementations required for everything to work correctly.
> I guess we'll still need the root struct to calculate the ROOT_KEY, right?
Only if the user uses the struct twice or more does the parent key need to be specified for the `storage_item` struct, to prevent collisions.
> How do things like `Storage<Mapping<K, V>>` work?
The user will be able to do that, but it is useless that way. It can be useful as `Storage<SomeStructWithMappingInside>`, though.
> How about `Mapping<Storage<K>, Storage<V>>`?
The value of the `Mapping` must be an atomic type, and the key must implement `Encode` correctly=) So it is not possible.
> Okay, so is this the equivalent of the old `Lazy` construct? E.g, the `StorageValue`s or `Mapping`s are only loaded if they're actually used within a message?
Yep. The question here is whether we want explicit `get` and `set`, or implicit access like `Lazy` had before.
> Green, would it be too much to ask for an updated Playground example and an example of how a contract developer would write contracts now?
I can create a new example, but I think it is better to spend that time on the real change=D Most behaviors and cases have already been discussed.
> Do we need to enforce that `storage_item`s be part of the main root struct, or can they be standalone?
Enforcing it to be part of the root struct would be difficult, I think. The only thing that comes to mind is that `storage_item`s might have a private constructor and only be initializable through `RootStruct::new`. The next best thing would be a dylint lint.
> E.g, can I have an empty root storage struct and work solely with `storage_item`s?
AFAIK yes, this could work. Not sure how nice the code would be, though. Since the root struct is reachable through `self`, it would be much nicer to have it all tied together in the root struct. Otherwise you would create an ad-hoc instance of some `Mapping` or `Storage` just to access it.
> I guess we'll still need the root struct to calculate the ROOT_KEY, right?
It does not matter where the `ROOT_KEY` is defined. We could define it on the overarching `#[contract]` macro. This would theoretically remove the need to always have a root struct. However, in my opinion the best way is to always require one and to access all storage items through it.
> Okay, so is this the equivalent of the old Lazy construct? E.g, the StorageValue or Mappings are only loaded if they're actually used within a message?
Yes, `Storage` is like `Lazy`. We should have manual `get` and `set` functions. More automatic types should build on that but shouldn't be part of this first version. First get the manual stuff out.
> The value of the Mapping must be an atomic type, and the key must implement Encode correctly=) So it is not possible.
Do we need some kind of marker trait for keys, given that non-atomic types will have an (empty) `Encode` implementation?
> I can create a new example, but I think it is better to spend that time on the real change=D Most behaviors and cases have already been discussed.
Talked to @cmichi. We are basically OK with continuing the rest of this discussion in a PR. What do you think @HCastano? It all depends on your willingness, @xgreenx, to make bigger changes to your PR (or restart it completely) if we notice some bigger stumbling blocks late in the process.
Sure, that sounds alright with me
Will also be interesting to see how the contract metadata will then change. Chances are that with this redesign we could close https://github.com/paritytech/ink/issues/767, which then would enable implementing https://github.com/paritytech/cargo-contract/issues/101.
@xgreenx Yeah so from our side you're good to go whenever you're up for it :-).
Hey @xgreenx! Any updates here?
Hi, we've started work on it, but we are not fully focused on it at the moment. The previous week went into stabilizing OpenBrush (using stable ink!, speeding up CI, adapting our projects) and fixing issues found by users. This week we are focused on upgradeable contracts and the Proxy and Diamond standards.

We plan to be fully involved in this issue after one week. It is part of the 5th milestone of the Grant, so we plan to implement it=)
Closed via #1331
https://use.ink/ink-vs-solidity
> Note: as of now (ink! v3.3.1), when using cross-contract calls, emitting events will not work and compile errors will occur. See [issue #1000](https://github.com/paritytech/ink/issues/1000). Furthermore, the compiler will throw an error saying that (for example) Erc20Ref does not implement SpreadAllocate. This [issue #1149](https://github.com/paritytech/ink/issues/1149) explains more and has a workaround. These issues will be fixed in [issue #1134](https://github.com/paritytech/ink/issues/1134).
Can this document be updated?
Most of the existing storage types were removed because they are simply too inefficient in size and executed instructions. Right now we only have `Mapping`. However, this type doesn't work very well with the existing storage infrastructure and needs a band-aid named `SpreadAllocate` and `initialize_contract`.

We want to pivot more in the direction of how storage works in FRAME. We should keep the overarching contract data structure but use the storage macro to generate the supporting code instead of an elaborate system of traits.
This would also allow us more flexibility in how to generate the storage keys for these data structures and more control over what code is generated. The aim is to reduce the contract sizes by removing monomorphization and copying of the root keys.
Please note that the following is only a sketch of the generated code; it is probably not exactly what the generated code should look like. @Robbepop noted that we should not use private sub-modules. Please feel free to post your suggestions and I will update the top post.
We add a new trait that will be implemented by the storage macro to supply our storage traits with a key. Our storage traits have default implementations for every function. The macro will only set the associated types.
Then the storage macro creates a private module where it dumps all the generated code:
The biggest downside of this approach is the same as Substrate's: new storage collections always need changes in codegen in order to support them. However, I assume this will save us a lot of code that passes all those keys and stateful objects (lazy data structures) around at runtime. It is also much simpler and easier to reason about (because it has fewer features).
Open Questions
How does this affect/play with things like: