When the compiler is run in -f opcodes or -f opcodes_runtime mode, it translates the final bytecode into opcodes. However, due to incorrect padding placed on PUSH values, the return values will be incorrect for any bytes with leading zeros.
Vulnerability Details
When the compiler is run with a target output of opcodes, we run the final bytecode through the following function:
def _build_opcodes(bytecode: bytes) -> str:
bytecode_sequence = deque(bytecode)
opcode_map = dict((v[0], k) for k, v in opcodes.get_opcodes().items())
opcode_output = []
while bytecode_sequence:
op = bytecode_sequence.popleft()
opcode_output.append(opcode_map.get(op, f"VERBATIM_{hex(op)}"))
if "PUSH" in opcode_output[-1] and opcode_output[-1] != "PUSH0":
push_len = int(opcode_map[op][4:])
# we can have push_len > len(bytecode_sequence) when there is data
# (instead of code) at end of contract
# CMC 2023-07-13 maybe just strip known data segments?
push_len = min(push_len, len(bytecode_sequence))
push_values = [hex(bytecode_sequence.popleft())[2:] for i in range(push_len)]
opcode_output.append(f"0x{''.join(push_values).upper()}")
print(opcode_output)
return " ".join(opcode_output)
This function iterates through each instruction in the bytecode and translates it to the corresponding opcode. In the case of PUSH instructions, it parses the number of bytes to include (let's call it x), and then assumes the following x instructions are the value passed to PUSH.
For each of these two byte chunks, it parses the bytes with hex(bytecode_sequence.popleft())[2:] and joins them together.
The problem is that for two bytes that begin with a leading 0 (such as 0x05), this simply appends the non-zero digit to the sequence. The result is a sequence that is not as long as expected by the PUSH instruction, and therefore is prepended (or appended, depending on the type) with 0s in order to reach the expected length.
Proof of Concept
Consider the following Vyper contract, with a single function that returns a bytes4 value of 0x350f872d:
@external
def f1() -> bytes4:
return 0x350f872d
Because the second byte of the return value starts with a 0, the translation will return 0xf instead of 0x0f.
The result is are these incorrect opcodes returned from the compiler (see the PUSH32 instruction in the middle):
Submitted by obront.
Relevant GitHub Links
https://github.com/vyperlang/vyper/blob/52dc413c684532d5c4d6cdd91e3b058957cfcba0/vyper/compiler/output.py#L294-L312
Summary
When the compiler is run in
-f opcodes
or-f opcodes_runtime
mode, it translates the final bytecode into opcodes. However, due to incorrect padding placed onPUSH
values, the return values will be incorrect for any bytes with leading zeros.Vulnerability Details
When the compiler is run with a target output of opcodes, we run the final bytecode through the following function:
This function iterates through each instruction in the bytecode and translates it to the corresponding opcode. In the case of
PUSH
instructions, it parses the number of bytes to include (let's call itx
), and then assumes the followingx
instructions are the value passed toPUSH
.For each of these two byte chunks, it parses the bytes with
hex(bytecode_sequence.popleft())[2:]
and joins them together.The problem is that for two bytes that begin with a leading
0
(such as0x05
), this simply appends the non-zero digit to the sequence. The result is a sequence that is not as long as expected by the PUSH instruction, and therefore is prepended (or appended, depending on the type) with 0s in order to reach the expected length.Proof of Concept
Consider the following Vyper contract, with a single function that returns a bytes4 value of
0x350f872d
:Because the second byte of the return value starts with a 0, the translation will return
0xf
instead of0x0f
.The result is are these incorrect opcodes returned from the compiler (see the
PUSH32
instruction in the middle):Impact
The compiler will return incorrect values when run in opcode mode and there is any PUSH instruction that includes bytes with leading zeros.
Tools Used
Manual Review
Recommendations
Ensure that the
push_values
value is padded to be two digits before being joined into a bytestring.