vyperlang / vyper

Pythonic Smart Contract Language for the EVM
https://vyperlang.org
Other
4.87k stars 796 forks source link

Compiled opcodes will return wrong values for PUSH instructions due to incorrect padding #4134

Open cyberthirst opened 6 months ago

cyberthirst commented 6 months ago

Submitted by obront.

Relevant GitHub Links

https://github.com/vyperlang/vyper/blob/52dc413c684532d5c4d6cdd91e3b058957cfcba0/vyper/compiler/output.py#L294-L312

Summary

When the compiler is run in -f opcodes or -f opcodes_runtime mode, it translates the final bytecode into opcodes. However, due to incorrect padding placed on PUSH values, the return values will be incorrect for any bytes with leading zeros.

Vulnerability Details

When the compiler is run with a target output of opcodes, we run the final bytecode through the following function:

def _build_opcodes(bytecode: bytes) -> str:
    bytecode_sequence = deque(bytecode)

    opcode_map = dict((v[0], k) for k, v in opcodes.get_opcodes().items())
    opcode_output = []

    while bytecode_sequence:
        op = bytecode_sequence.popleft()
        opcode_output.append(opcode_map.get(op, f"VERBATIM_{hex(op)}"))
        if "PUSH" in opcode_output[-1] and opcode_output[-1] != "PUSH0":
            push_len = int(opcode_map[op][4:])
            # we can have push_len > len(bytecode_sequence) when there is data
            # (instead of code) at end of contract
            # CMC 2023-07-13 maybe just strip known data segments?
            push_len = min(push_len, len(bytecode_sequence))
            push_values = [hex(bytecode_sequence.popleft())[2:] for i in range(push_len)]
            opcode_output.append(f"0x{''.join(push_values).upper()}")

    print(opcode_output)
    return " ".join(opcode_output)

This function iterates through each instruction in the bytecode and translates it to the corresponding opcode. In the case of PUSH instructions, it parses the number of bytes to include (let's call it x), and then assumes the following x instructions are the value passed to PUSH.

For each of these two byte chunks, it parses the bytes with hex(bytecode_sequence.popleft())[2:] and joins them together.

The problem is that for two bytes that begin with a leading 0 (such as 0x05), this simply appends the non-zero digit to the sequence. The result is a sequence that is not as long as expected by the PUSH instruction, and therefore is prepended (or appended, depending on the type) with 0s in order to reach the expected length.

Proof of Concept

Consider the following Vyper contract, with a single function that returns a bytes4 value of 0x350f872d:

@external
def f1() -> bytes4:
    return 0x350f872d

Because the second byte of the return value starts with a 0, the translation will return 0xf instead of 0x0f.

The result is are these incorrect opcodes returned from the compiler (see the PUSH32 instruction in the middle):

PUSH0 CALLDATALOAD PUSH1 0xE0 SHR PUSH4 0xC27FC35 DUP2 XOR PUSH2 0x03E JUMPI CALLVALUE PUSH2 0x042 JUMPI PUSH32 0x35F872D0000000000000000000000000000 PUSH1 0x40 MSTORE PUSH1 0x20 PUSH1 0x40 RETURN JUMPDEST PUSH0 PUSH0 REVERT JUMPDEST PUSH0 DUP1 REVERT

Impact

The compiler will return incorrect values when run in opcode mode and there is any PUSH instruction that includes bytes with leading zeros.

Tools Used

Manual Review

Recommendations

Ensure that the push_values value is padded to be two digits before being joined into a bytestring.

trocher commented 6 months ago

I think it has already been fixed: https://github.com/vyperlang/vyper/pull/3735