paritytech / revive

Solidity compiler for PolkaVM
Apache License 2.0
35 stars 6 forks source link

Not supported evm.bytecode and evm.assembly options in compiler configuration #39

Open smiasojed opened 2 months ago

smiasojed commented 2 months ago

Although the tool claims that it does not support evm.bytecode and evm.assembly, it still returns it as part of the compilation output.

Used compiler config:

{
    "language": "Solidity",
    "settings": {
        "optimizer": {
            "enabled": true,
            "runs": 200
        },
        "outputSelection": {
            "*": {
            "": ["ast"],
            "*": ["abi", "metadata", "devdoc", "userdoc", "storageLayout", "evm.bytecode", "evm.legacyAssembly", "evm.deployedBytecode", "evm.methodIdentifiers", "evm.assembly"]
            }
        }
    }
}

It is expected that tool accepts evm.bytecode and evm.assembly

smiasojed commented 1 month ago

evm.gasEstimates is also not supported

xermicus commented 4 weeks ago

Stemming from an internal discussion:

We substitute any byte-code incompatible fields with empty objects. This won't crash any tools relying on it, they can still request it but will get a zero or empty value. Which they can do nothing with so we should be save. Also this will make remix happy.

athei commented 2 weeks ago

Both of them should be advertised as being available in order to maximize compatibility.

evm.bytecode should be returned but empty (we only have init code). evm.assembly should contain the PolkaVM dissassembly.

xermicus commented 1 week ago

To me it'd also make sense to just provide the blob for both, the deploy code and runtime code? Because it is the same, it seems like we can just provide both for maximum compatibility.

athei commented 1 week ago

Yeah that would work, too.

smiasojed commented 1 week ago

Remix uses evm.bytecode internally, so I think it is better to keep bytecode in sync with deployedBytecode.

athei commented 1 week ago

Because of course they do. They depend on every little detail :(

smiasojed commented 1 week ago

evm.assembly should contain the PolkaVM dissassembly.

I think that we need to update polkavm-disassembler to be compatible with evm.legacyAssembly

"legacyAssembly": 
                {
                            ".code": [
                                {
                                    "begin": 199,
                                    "end": 555,
                                    "name": "PUSH",
                                    "source": 0,
                                    "value": "80"
                                }
                                                           ]
                }
xermicus commented 1 week ago

Legacy assembly is not to confuse with EVM bytecode. Legacy assembly is technically speaking on of our IRs. It can be passed through as-is.

smiasojed commented 1 week ago

We have in revive

  #[serde(rename = "legacyAssembly")]
    pub assembly: Option<Assembly>,
    /// The contract PolkaVM assembly code.
    #[serde(rename = "assembly")]
    pub assembly_text: Option<String>,

assembly_text (evm.assembly) is updated with polkavm assembly but assembly (evm.legacyAssembly) is still evm assembly

xermicus commented 1 week ago

Ah yeah, assembly_text is custom, it isn't a thing for solc.

xermicus commented 1 week ago

I reckon either assembly and legacyAssembly could as well be removed (i.e. replaced with the empty value). We will have maximum compat by keeping it. However I don't see what tools are supposed to do with this.

smiasojed commented 1 week ago

evm.assembly contains assembly_text, so I think it is good to keep it. However, Remix displays evm.legacyAssembly, which is incorrect. The options are to either include PolkaVM assembly there or leave it empty.

xermicus commented 1 week ago

Ah okay, I see where you are coming from. The problem is we are adding one more IR format. Remix is selling evm.legacAssembly as the "assembly" code. Which one would assume to be PVM assembly in our case. If we substitute it, we introduce some amount of confusion. But we have to do this anyways for the bytecode, in order to make toolings "just work". For me it's fine to substitute.

smiasojed commented 1 week ago

Remix parses this before displaying and expect JSON format. We can not just substitute it, because parsing will fail. In my opinion we have 2 options:

  1. empty object
  2. polkavm assembly in right format (Requires a change in polkavm-disassembler to write raw instructions)
xermicus commented 1 week ago

Why would 2. require a change in polkavm-disassembler instead bringing it to the right format directly in revive? Tying the PVM disassembler to this seems a bit stretched 😅

xermicus commented 1 week ago

In case the formats are inherently incompatible, we should change our remix instance to display what it finds in assembly_text.

xermicus commented 1 week ago

Stemming from an internal discussion:

* Provide what was requested; however

* Provide default values for incompatible data (if requested)

We substitute any byte-code incompatible fields with empty objects. This won't crash any tools relying on it, they can still request it but will get a zero or empty value. Which they can do nothing with so we should be save. Also this will make remix happy.

So if they are incompatible, seems like we hit the second case here. Replacing it with empty values and instead use assembly_text in our remix fork seems the way to go. Then we don't break any tools but our remix instance can still display it.

smiasojed commented 1 week ago

Why would 2. require a change in polkavm-disassembler instead bringing it to the right format directly in revive? Tying the PVM disassembler to this seems a bit stretched 😅

Now looking into it maybe we could indeed parse the polkavm assembly and output something similar to legacyAssembly, but I am not sure if it make sense.

Legacy assembly is built from objects containing Name: The opcode (e.g., ADD). in polkavm we get s0 = a0 + a1 so we need to parse the instructions and replace it with ADD Value: The operand or value pushed to the stack. In our case we could have registers [s0, a0, a1] Source: The source code location (related to Solidity source code). I think, we do not have it Begin and End: The bytecode offsets where the instruction starts and ends. We could get it from disassembler

xermicus commented 1 week ago

I vote for just using assembly_text directly in remix.

xermicus commented 1 week ago

@smiasojed is this issue still relevant?

smiasojed commented 1 week ago

@smiasojed is this issue still relevant?

yes, I need to do still something with evm.gasEstimates. I will handle it when I have some time.