`EOFCREATE`: Don't hash the init-container

chfast commented 3 months ago

The create address derivation for EOFCREATE is based on CREATE2.

keccack256(sender_address + salt + keccak256(init-container))

where the sender_address is the logical address of the contract invoking EOFCREATE.

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

It also seems unnecessary expensive: EOFCREATE can only pick up one of the deploy-time sub-containers.

Solution 1: Use sub-container index

The create address is already bound to the "sender address", code is immutable (no SELFDESTRUCT) so replacing the hash of the sub-container with just its index may be enough.

Solution 2: Use code's address + sub-container index

The CREATE2 scheme uses the "sender address" with may not be the address of the code (see DELEGATECALL). I'm not sure if this is desired property for CREATE2. But for EOFCREATE this looks to be a problem. A contract may deploy different contract using DELEGATECALL proxy: for EOFCREATE inside a DELEGATECALL the same sub-container index will point to different sub-container. To fix this we can replace/combine the physical code address:

keccak256(code_address + salt + sub-container-index)
keccak256(sender_address + code_address + salt + sub-container-index)

axic commented 2 months ago

A relevant code is CREATE3: https://github.com/Vectorized/solady/blob/main/src/utils/CREATE3.sol

Should ask for feedback from library authors.

pdobacz commented 2 months ago

If we decide to take away initcontainer hashing (or change it somehow), we need to revisit and ask for an update to the EOF considerations for Verkle EIPs. A link a tentative PR which handles this (and a thread about initcontainer hashing) here.

[ ] check Verkle EIP(s) if update is necessary

frangio commented 2 months ago

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

A different perspective on this: the hash of the init container locks in the semantics, not the exact code contents of the contract. This doesn't prevent future rewrites of the code because they will be semantics-preserving or inherently breaking regardless of code observability.

Code observability should be removed to the extent that CODECOPY/EXTCODECOPY can cause semantics-preserving rewrites to become breaking changes indirectly. In my opinion, it should be totally okay to rewrite code even when the address is a witness of the original code on the account. In fact, it's a good thing that there's a way to recover CODECOPY via this witness in a way that doesn't risk being broken by rewrites!

I also think this ability to provide on-chain proof of the code/semantics of an account is an important primitive that we shouldn't get rid of.

pdobacz commented 2 months ago

Thank you for this perspective. We're gathering inputs on this one, so this feedback is very useful.

I also think this ability to provide on-chain proof of the code/semantics of an account is an important primitive that we shouldn't get rid of.

Do the Solution 1 & 2 above still qualify as getting rid of? Note that EOFCREATE in current form doesn't allow deploying arbitrary off-chain code, only that listed as one of its subcontainers. So it can be proven on-chain that a given account has code deployed from a known address' subcontainer

frangio commented 2 months ago

Because of DELEGATECALL I don't think Solution 1 gives any guarantees about the code/semantics of an account.

Solution 2 would work if there was a way to trace back to a "root" deployer whose address was computed with codehash. If EOFCREATE doesn't do that (and CREATE2 is "removed" via EOF), I don't think it would be possible to get a root deployer like that because creation transactions don't use the codehash.

I think the current state where the code hash is directly included is better though, because it takes a single hash to compute the address rather than a tracing process involving multiple hashes. Additionally, you may only care about proving the code of an account ignoring the code of its deployer, and if you have to trace it back to the deployer you are not able to do that. Overall I think including code hash into the address directly is significantly better.

frangio commented 2 months ago

A note about CREATE3.sol and similar patterns: this is often used to deploy contracts via CREATE2 at a deterministic address that doesn't depend on the creation parameters (or only some of them). For example Uniswap v3 does this here:

This kind of use case is actually natively addressed by EOFCREATE! The workaround will no longer be needed because the input is not included in the address formula.

The other side to this is that users of CREATE2 that do care about the creation parameters will need a new way to validate them. Either a trusted factory mixes them into the salt, or the contract exposes getters.

There is another potential use case for CREATE3.sol, which is to deploy at a deterministic address that is fully independent of the creation code. This other use case is not only not addressed by EOFCREATE, it becomes impossible to strictly implement under EOF, although it is easy to work around by deploying a proxy instead. I don't know how common this use case is honestly.

More context here: https://github.com/moodysalem/EIPs/blob/46350bb/EIPS/eip-3171.md#motivation

chfast commented 1 month ago

Analysis of input data for create address

It is preferable the inputs be less than 136 bytes (the keccak256 block size).

`CREATE`

Input length: 23–31 Prefix byte: 0xd6–0xde

The input is RLP encoded sender's address and sender's nonce. The encoding is variable-length with fixed encoding for the address and variable-length encoding of the nonce.

`CREATE2`

Input length: 85 Prefix byte: 0xff

The input is fixed length concatenation of prefix byte 0xff, sender's address (20 bytes), user-provided salt (32 bytes) and initcode hash (32 bytes). The prefix byte has been added to avoid collisions with the CREATE scheme but this is unnecessary because both schemes don't have inputs of the same lengths.

`EOFCREATE` solution 2

Input length: 63 or 97 Prefix byte: none

Concatenation of the addresses the sender and the code (2x 20 bytes), user-provided salt (32 bytes) and subcontainer index (1 byte). Alternatively, we can allocate 32 bytes per address for compatibility with Address Space Expansion. None of these total lengths match any existing schemes so the prefix byte is not necessary.

chfast commented 1 month ago

This kind of use case is actually natively addressed by EOFCREATE! The workaround will no longer be needed because the input is not included in the address formula.

Interesting. I was thinking about extending solution 2 to hash also inputs provided to EOFCREATE. But looks there are some use cases where this is undesirable.

The other side to this is that users of CREATE2 that do care about the creation parameters will need a new way to validate them. Either a trusted factory mixes them into the salt, or the contract exposes getters.

I think the pattern may be for user to hash inputs and combine them with the salt.

shemnon commented 1 month ago

What about chained EOF creates going deep? In this case would the "code address" of the second level create be the code address of the parent? If the code address is the address if the topmost container... not so much. Because then the index could be re-used at different depths to cause different contracts to be deployed at the same address based on call data (although returncontract can do that more cleanly).

Or is it the "sender address" that gets updated in nested EOFCREATES? Either way we need tests for this scenario.

chfast commented 1 month ago

O: CALL A A: EOFCREATE[1](X) C: eofcreate_addr(sender=A, code=A, salt=X, idx=1)

O: CALL A A: DELEGATECALL B B: EOFCREATE[2](X) C: eofcreate_addr(sender=A, code=B, salt=X, idx=2)

O: CALL A A: EOFCREATE[1](X) C: eofcreate_addr(sender=A, code=A, salt=X, idx=1) C: EOFCREATE[2](Y) (initcode execution) D: eofcreate_addr(sender=C, code=C?, salt=Y, idx=2) C: deployed

O: CALL C (deployed above) C: EOFCREATE[2](Y) E: eofcreate_addr(sender=C, code=C?, salt=Y, idx=2) this will generate the same address and collide with D.

pdobacz commented 1 month ago

this will generate the same address and collide with D.

Is this a problem though? seems OK to me. We have the a deterministic address to deploy D / E at, but the code itself (contents) is not in the witness

frangio commented 1 month ago

Wouldn't this scheme work?

keccack256(sender_address || code_address || salt || during_init || init_subcontainer_idx)

during_init would be a boolean that is true iff EOFCREATE executes during sender init.

The combination (code_address, during_init, init_subcontainer_idx) should uniquely identify a container. If during_init == true, take the init container that created code and look at subcontainer number init_subcontainer_idx. If during_init == false, take the runtime container of code and look at subcontainer number init_subcontainer_idx.

Amending @chfast's examples above:

O: CALL A A: EOFCREATE[1](X) C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)

O: CALL A A: DELEGATECALL B B: EOFCREATE[2](X) C: eofcreate_addr(sender=A, code=B, salt=X, during_init=0, idx=2)

O: CALL A A: EOFCREATE[1](X) C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1) C: EOFCREATE[2](Y) (initcode execution) D: eofcreate_addr(sender=C, code=C, salt=Y, during_init=1, idx=2)

O: CALL C (deployed above) C: EOFCREATE[2](Y) E: eofcreate_addr(sender=C, code=C, salt=Y, during_init=0, idx=2)

E no longer equal to D

Any further nested EOFCREATE would have different sender.

pdobacz commented 1 month ago

EDIT: this post is likely some kind of misunderstanding on my part. We mentioned originally that code address is the address of the outer-most EOFCREATE, but actually, a scheme where code_address changes during each EOFCREATE seems to avoid address conflicts better...

Having revisited @chfast 's example after a while, I think by code_address we mean the address where the outer-most EOFCREATE in a nested chain of EOFCREATEs resides (there is no other address with code in that chain yet!), so:

O: CALL A A: EOFCREATE[1](X) C: eofcreate_addr(sender=A, code=A, salt=X, idx=1) C: EOFCREATE[2](Y) (initcode execution) D: eofcreate_addr(sender=C, code=~C~ A, salt=Y, idx=2) <----- code not C but A here C: deployed

This already makes D != E or putting it differently:

When CALL is used to change context in a chain of calls, both sender and code will change
DELEGATECALL - code will change, sender won't
EOFCREATE - sender will change, code won't

However, this only changes the way one runs into the D==E conflict:

O: CALL C (deployed above) C: DELEGATECALL A A: EOFCREATE[2](Y) E: eofcreate_addr(sender=C, code=A, salt=Y, idx=2) == D

but initcodes used for D and E are different (at different nesting depth in A)

pdobacz commented 1 month ago

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

gumb0 commented 1 month ago

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

pdobacz commented 1 month ago

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

The problem stems from the fact that we're swapping the code executing at the context of C - during init it is the initcode, after it is the initcode's subcontainer (RETURNCONTRACTed). This makes the two instances of EOFCREATE[2](Y) mean different things and is solved by Frangio's proposal. From that PoV it kinda makes sense - it's 2 different codes, but both live at C, so they have a common code_address, so to speak.

code_address name is just a name related to DELEGATECALL (which we are addressing here). Having this in mind, with executing_address it isn't clear to me if it's the code_address or msg.recipient address (the context which "executes" some code)...

gumb0 commented 1 month ago

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

The problem stems from the fact that we're swapping the code executing at the context of C - during init it is the initcode, after it is the initcode's subcontainer (RETURNCONTRACTed). This makes the two instances of EOFCREATE[2](Y) mean different things and is solved by Frangio's proposal. From that PoV it kinda makes sense - it's 2 different codes, but both live at C, so they have a common code_address, so to speak.

From my perspective in case of EOFCREATE nested in outer EOFCREATE initcode, the inner EOFCREATE's initcode doesn't "live at C" at all, it has almost nothing to do with C. C is an address that will be deployed (or not) when outer EOFCREATE finishes. C happens to be msg.recipient when inner EOFCREATE is being executed (this is what I call "executing address")

But this is a bit of bikeshedding. I agree during_init flag seems to solve it.

pdobacz commented 1 month ago

inner EOFCREATE's initcode doesn't "live at C" at all, it has almost nothing to do with C

Yeah, I see your point here. But actually the bikeshedding is useful. I revisited the option with "code_address doesn't change on EOFCREATE + during_init" with this new perspective and now it seems to me it works too, I must've made a mistake somewhere yesterday, PTAL:

O: CALL A A: EOFCREATE[1](X) C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1) C: EOFCREATE[2](Y) (initcode execution) D: eofcreate_addr(sender=C, code=A, salt=Y, during_init=1, idx=2) C: deployed

O: CALL C (deployed above) C: EOFCREATE[2](Y) E: eofcreate_addr(sender=C, code=C, salt=Y, during_init=0, idx=2) != D

O: CALL C (deployed above) C: DELEGATECALL A A: EOFCREATE[2](Y) F: eofcreate_addr(sender=C, code=A, salt=Y, during_init=0, idx=2) != D and != E

In this version code_address matches the expectations - it is where the code lives

gumb0 commented 1 month ago

In this version code_address matches the expectations - it is where the code lives

Yes, I like this version more. Seems to work and not conflict on deeper nesting levels, too.

O: CALL A A: EOFCREATE[1] C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1) C: EOFCREATE[2] (initcode execution) D: eofcreate_addr(sender=C, code=A, salt=Y, during_init=1, idx=2) D: EOFCREATE[2] (initcode execution) G: eofcreate_addr(sender=D, code=A, salt=Y, during_init=1, idx=2)

frangio commented 1 month ago

This seems to work. The approach seems equivalent to a list of indices pointing at a deeply nested subcontainer of code.

I find it hard to reason about though.

I think if you see G = eofcreate_addr(sender=D, code=A, salt=Y, during_init=1, idx=2), during_init=1 means that the runtime container at G was deployed by an init container located somewhere in A, in particular subcontainer index 2 of the init container that deployed D. Recursively you arrive at C, where during_init=0 means that the runtime container at C was deployed by the subcontainer A[1].

So the init containers for each of these contracts are:

C: A[1]
D: A[1][2]
G: A[1][2][2]

There seems to be some redundancy here:

the runtime container at G was deployed by an init container located somewhere in A, in particular subcontainer index 2 of the init container that deployed D

A is not really used in the procedure.

So an alternative could be to remove during_init and replace during_init=1 with code_address=0, since the actual code_address is implicit in sender_address in this case.

keccack256(sender_address || code_address || salt || init_subcontainer_idx)

Note that EOFCREATE in a DELEGATECALL context always results in code_address set to the target of DELEGATECALL, because that's where the init container is located, regardless of whether the sender is being deployed.

frangio commented 1 month ago

Since it looks like we may have solved this issue I'll resurface my previous comment. With this change we would be losing the ability to make on-chain proofs about the behavior of an account without a trusted factory (although it would be recoverable with a zk-coprocessor). I do think we need to consider whether it's okay to remove that, or if it's a primitive that applications are relying on.

I'm currently weakly leaning towards probably okay to remove.

pdobacz commented 1 month ago

losing the ability to make on-chain proofs about the behavior of an account without a trusted factory

Can you clarify what kind of a behavior proof? Just that codehash(address) == particular hash? Could this be substituted by address_and_subcontainer_idx_of_a_particular_factory(address) == particular address and idx? That is instead of proving code hash of an address is X, we prove where exactly the code is coming from.

frangio commented 1 month ago

Yeah that works if you have a trusted/known factory. This is probably enough.

gumb0 commented 1 month ago

So an alternative could be to remove during_init and replace during_init=1 with code_address=0, since the actual code_address is implicit in sender_address in this case.

I like this variant, too. I would reframe it as we'd have 2 different schemes depending on whether EOFCREATE is called inside initcode:

Non-nested EOFCREATE: keccak256(sender_address + code_address + salt + sub-container-index)
Nested EOFCREATE inside initcode: keccak256(sender_address + salt + sub-container-index)

charles-cooper commented 2 weeks ago

how about just removing the witness entirely? i.e. keccak256(sender_address + salt). i think the user is mainly interested that other users cannot trivially produce a collision with some salt they want to deploy to, but the EVM can let them be responsible for making sure they don't produce a collision with themselves.

charles-cooper commented 2 weeks ago

one issue with using the initcontainer index in the hash is that it makes counterfactual address calculation potentially impossible on chain. since the EOFCREATE-ing contract cannot introspect the code of the factory address (that it delegates to), it cannot counterfactually produce the target address.

charles-cooper commented 2 weeks ago

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

also wanted to point out that using initcontainer index rules out certain types of code rewrites as well. for instance, reordering of initcontainers or fusing them.

pdobacz commented 1 week ago

@frangio @cameel @shemnon Can you please take a quick look at the proposed keccak256(sender_address + salt) scheme above? We can also discuss more on todays EOF call (I've proposed it for the agenda).

My soft take on this at the moment is that I'm on the fence. From what I understand the + and - are:

Pros:

simpler and cleaner
everything is on-chain and deterministic in EOFCREATE, so including any initcode witness feels redundant
can compute the counterfactual address on-chain in a generic factory contracts delegatecalling some EOFCREATE-ing contracts (not sure if I understand that one tbh)
we can charge less gas
easier cross-chain deployments (because more flexible)
even less code introspection as noted above

Cons:

compilers would need to provide good default salts (?), but have limited options because there's no CODE_ADDRESS on-chain
if there are no good defaults, or these are not used, people will inadvertently run into create conflicts, possibly bricking contracts and locking up funds

frangio commented 1 week ago

I was skeptical initially, but it's definitely interesting and after some thought I really like the proposal.

I find this a good design principle:

the user is mainly interested that other users cannot trivially produce a collision with some salt they want to deploy to, but the EVM can let them be responsible for making sure they don't produce a collision with themselves

A witness for the code can be recovered by mixing it into the salt, if that's important for the application. In fact the contract can choose to use subcontainer_index or even subcontainer_hash if it wants to. I think this proposal is the natural conclusion of the prior discussion. If we were going to remove subcontainer_hash from the formula and replace it by a subcontainer_index, we were already accepting that an equivalent witness would require knowledge of the deployer code, in particular of its subcontainers. But if you have knowledge of the deployer code you also have knowledge of how it constructs salts, so you can use that for the witness.

limited options because there's no CODE_ADDRESS on-chain

There is a common pattern to get CODE_ADDRESS by using an immutable Solidity variable:

address private immutable self = address(this);

In EOF this would be aux_data set in the init_container.

compilers would need to provide good default salts (?), but have limited options because there's no CODE_ADDRESS on-chain

If Solidity wants to preserve Legacy-like guarantees about collisions it could construct a salt that preserves that, but I wouldn't necessarily expect that from all languages/compilers.

charles-cooper commented 1 week ago

I think this proposal is the natural conclusion of the prior discussion.

Yes- and not to belabor the point, but it's the natural conclusion of not allowing code introspection in EOF.

If msg.sender is an EOA, it can inspect the code and construct its own witness to mix into the salt.
If msg.sender is a contract, it can't introspect the factory (delegatecall'ed) contract, so it can't counterfactually produce an address (which is the whole point of having salt in the original CREATE2 design).
Even when you have prior knowledge (e.g., in the compiler) of the initcontainer index, you need to add extra code paths to be able to counterfactually compute the address, increasing complexity.
And indeed, inside the compiler is the only place you can do the counterfactual computation, it seems impossible in user code without compiler intrinsics

In my opinion, it also solves some outstanding problems with CREATE2, which are:

you can't counterfactually deploy a contract without having constructor arguments. I know EOF obsoletes this because the auxdata would be separate from the initcode, but there could still be cases where you have different initcode that you want to deploy to the same address (e.g. a code upgrade on another chain).
the on chain computation of the counterfactual address is a little bit annoying since you need to calculate the witness rather than having it provided. It's a little thing, but I think it is more elegant for everything to be encoded in the salt.

I can't attend today's call, but hopefully I have provided enough motivation for this proposal in this jssue. I will probably be available via chat (e.g. discord) in case there are questions for me during the meeting.

frangio commented 1 week ago

The main concern that we dicussed in today's call was how this new proposal puts a lot of responsibility on developers, and how they may introduce security issues if they incorrectly carry their mental model of CREATE2 guarantees over to EOFCREATE.

It would certainly be unfortunate if Solidity's new Foo{salt: salt}(input) provided different security guarantees depending on whether the compilation target is Legacy or EOF. But note that this is not a new issue ⚠️ , it's already the case in current EOFCREATE because of the omission of input in the address formula.

One way to address this is by having language or library abstractions around EOFCREATE construct a salt which recovers those guarantees.

For example, given the above statement, Solidity could construct final_salt = keccak256(salt || Foo_hash || input) and use that in the call to EOFCREATE. If a developer wanted to skip that they might instead use new Foo{raw_salt: salt}(input).

In the latest proposal by @charles-cooper, this doesn't fully recover CREATE2 guarantees because the target of a DELEGATECALL would be able to create a contract at the same address but with different code. This is not something the compiler or a library can fix.

So perhaps a better minimalist formula that still allows safe abstractions to be built on top could be: keccak256(sender_address + code_address + salt).

cameel commented 1 week ago

For Solidity the biggest concern with such changes is the added friction for users in transitioning from non-EOF to EOF backend. EOF as a whole is a big win for us and we want the users to have no obstacles in switching to it. Ideally we'd be able to just keep the current high-level creation syntax and also not make it any more expensive. That's already impossible due to existing differences between CREATE/CREATE2 and EOFCREATE, so it's really a matter of degree of breakingness.

We think that reducing the address calculation to keccak256(sender_address + salt) goes a bit too far in that regard. We'd rather avoid having the compiler mix additional values into the salt under the hood, because it obfuscates the calculation. It also silently increases the gas cost and users really don't like that. Having the user-specified salt passed as is to EOFCREATE is preferable, so we'd likely end up shifting the responsibility for doing any extra calculations to the user as a result. That on the other hand has the problem mentioned by @frangio - the statements look identical, but collision guarantees are completely different.

Also, there are really no good defaults for the salt. As already pointed out, not all the pieces of information needed to recreate the current calculation are even available to the user. It can be worked around to some degree, e.g. via the proposed pattern of storing code_address in an immutable, but it seems like we'd be replacing a safer mechanism with something that is easy to get wrong.

Overall, the more of this calculation is handled by the opcode, the better for us, so we'd prefer the variant from https://github.com/ipsilon/eof/issues/162#issuecomment-2463795391, because its collision guarantees are much closer to what happens in non-EOF code. The simpler keccak256(sender_address + code_address + salt) may be good enough too, though less preferable.

I find this a good design principle:

the user is mainly interested that other users cannot trivially produce a collision with some salt they want to deploy to, but the EVM can let them be responsible for making sure they don't produce a collision with themselves

This principle sounds reasonable to me, though one obvious question that comes to mind is - why doesn't CREATE2 already follow it? The issue is not really the calculation itself but the fact that the proposed change is straying too far from the original behavior.

But to be clear, we would not mind having a more flexible mechanism available. Just not as the only option. Doing that makes the transition to EOF more difficult.

pdobacz commented 1 week ago

Thanks all for the great inputs. I want to clarify some pieces of the above comments.

First, I really liked this proposition:

For example, given the above statement, Solidity could construct final_salt = keccak256(salt || Foo_hash || input) and use that in the call to EOFCREATE. If a developer wanted to skip that they might instead use new Foo{raw_salt: salt}(input).

However, I'm not yet sure the latter issue is true:

In the latest proposal by @charles-cooper, this doesn't fully recover CREATE2 guarantees because the target of a DELEGATECALL would be able to create a contract at the same address but with different code.

We've included Foo_hash in the final_salt, so that should be impossible.

And moving on to the other comment:

We'd rather avoid having the compiler mix additional values into the salt under the hood, because it (...) also silently increases the gas cost and users really don't like that

Assuming in that scheme EOFCREATE does not charge for hashing, the cost of the final_salt default behavior should be still cheaper than CREATE2, because you only hash the input with the hash and user's salt, not the full initcode anymore. Using raw_salt with custom made scheme gives an opportunity to lower the gas cost further in cases where it's possible (aka "Advanced feature").

the user is mainly interested that other users cannot trivially produce a collision

This principle sounds reasonable to me, though one obvious question that comes to mind is - why doesn't CREATE2 already follow it?

Not sure about this, but I think CREATE2 caters for fully counterfactual deployments (code not on-chain), so it needs to include the initcode hash. EOFCREATE doesn't, so it has the opportunity to improve, fully counterfactual deployments will be up to the future TXCREATE.

frangio commented 1 week ago

We've included Foo_hash in the final_salt, so that should be impossible.

It would be possible if the target of DELEGATECALL uses the proposed raw_salt in Solidity or just handwritten EVM code.

If we say that developers should only delegate to code they trust not to do this, we're back in the argument about the degree of responsibility put on them. CREATE2 collisions triggered by DELEGATECALL targets is not something they're responsible for today in legacy code.

kuzdogan commented 1 week ago

During the conversation in Nov 27 call a new separate metadata section was proposed and generally agreed upon: data that can't be read by EVM and any change to this does not affect the code (spec/EIP TBD).

I'd like to point out that, if we are to keep the initcode hash in the EOFCREATE parameters, there's a benefit to leaving out the metadata section in the hash calculation. This is a common current headache in CREATE2 contracts and reason many teams choose to opt-out of the metadata hash in the bytecode.

pdobacz commented 1 week ago

if we are to keep the initcode hash in the EOFCREATE parameters, there's a benefit to leaving out the metadata section in the hash calculation.

I see it more like the metadata section's being there becomes a strong argument for leaving out the initcode hash entirely - in any of the variants proposed here. Calculating initcode hash from a subset of sections sounds impractical to me.

pdobacz commented 1 week ago

It would be possible if the target of DELEGATECALL uses the proposed raw_salt in Solidity or just handwritten EVM code.

Oh yes, impossible when final_salt is used. But I assume using raw_salt would be intentional usage, when the CREATE2/EOFCREATE usage pattern must be somehow customized, and then you depart from the "CREATE2 guarantees parity".

If we say that developers should only delegate to code they trust not to do this

Isn't it already the case only trusted code should be delegated to? That is, in broader context than just creation logic. If we have a target function delegatecall_me_i'll_create_a_contract, I'd expect it would document precisely how it will create, if it uses the non-default raw_salt inside, and the caller would decide, if it can delegate to it.

Is there a usage pattern where that wouldn't work well enough @frangio?

charles-cooper commented 1 week ago

Not sure about this, but I think CREATE2 caters for fully counterfactual deployments (code not on-chain), so it needs to include the initcode hash. EOFCREATE doesn't, so it has the opportunity to improve, fully counterfactual deployments will be up to the future TXCREATE.

speaking of counterfactuality, i think the issue with keccak256(msg.sender + code_address + salt) and the variants including the subcontainer index are that they can't be computed on-chain. i'm not sure how big of an issue this is, but it is already stepping outside of the design goals of the original CREATE2, which allow smart contracts to compute counterfactual deploy addresses.

frangio commented 1 week ago

Isn't it already the case only trusted code should be delegated to?

Yes but the way creation salts are constructed is not a property one has to audit of DELEGATECALL targets at the moment. It's not impossible to audit, it's just a new checklist item with respect to legacy.

The way I'm thinking about this is global vs local properties, where global collision avoidance should be taken care of by the EVM and local collision avoidance should be ensured by the contract code, where "contract code" is the developer and their compiler, and the property should not be breakable by an end user (eventually an attacker) under any circumstance (barring Keccak256 breaking) including if the user is able to choose a DELEGATECALL target. I recognize this last part is pretty strong so I'm not attached to it.

frangio commented 1 week ago

i think the issue with keccak256(msg.sender + code_address + salt) and the variants including the subcontainer index are that they can't be computed on-chain

Not sure what you mean by this. It can be computed on chain if you know the parameters, which the contract should document and would already be documented anyway, among other things because the salt is often constructed and not explicit in the input.

Can you describe end to end a scenario where you see an issue?

charles-cooper commented 1 week ago

i think the issue with keccak256(msg.sender + code_address + salt) and the variants including the subcontainer index are that they can't be computed on-chain

Not sure what you mean by this. It can be computed on chain if you know the parameters, which the contract should document and would already be documented anyway, among other things because the salt is often constructed and not explicit in the input.

Can you describe end to end a scenario where you see an issue?

i mean like you need more arguments than are required for just the eofcreate, e.g.

def create_something() -> address:
    salt: bytes32 = self._compute_salt()
    return factory.create(args, salt)  # calls EOFCREATE or delegates to to another factory

def counterfactual() -> address:  # compute the address counterfactually of factory.create()
    salt: bytes32 = self._compute_salt()
    return ... # can't, need the final code_address and potentially the initcontainer index depending on the scheme

frangio commented 1 week ago

Ok, I was going to suggest the factory should expose a getter for counterfactual addresses because it knows those parameters. But this is not possible if the factory is upgradeable because the code_address parameter becomes time dependent, future values are not known and all past values need to be stored.

charles-cooper commented 1 week ago

right -- so i think the point is it would break an existing property of CREATE2, which is that you can counterfactually predict the address of invoking CREATE2 from just its inputs.

ipsilon / eof