shazow / whatsabi

Extract the ABI (and resolve proxies, and get other metadata) from Ethereum bytecode, even without source code.
https://shazow.github.io/whatsabi/
MIT License
1.04k stars 71 forks

disasm: Return values #13

Open shazow opened 1 year ago

shazow commented 1 year ago

Unfortunately selector hashes don't include the return value, so none of the 4byte databases include return types.

Questions:

  1. How do we detect whether a function has a return value at all?
  2. If it does, can we do anything to guess the type or size?

What we have:

Updated challenges:

shazow commented 1 year ago

I'm assuming the RETURN opcode with non-zero size will indicate if a function returns a value, but relying on that means we'd need to construct instruction ranges for each function (should be possible assuming the selector table yields back-to-back functions). [Update: This looks fine]

On the upside, that should be sufficient to give us the return size, which is often a good proxy for guessing what the type is (e.g. 160 bits -> probably address). [Update: This is false]
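For illustration, here is a minimal sketch of the RETURN-based idea described above. It assumes a plain linear disassembly and only recognizes the trivial shape `PUSH size, PUSH offset, RETURN`; every name in it (`disasm`, `literalReturnSizes`, `Instr`) is hypothetical and not part of whatsabi. It also ignores PUSH0 and any size that is computed at runtime.

```typescript
// Hypothetical helpers for illustration; none of these names come from whatsabi.
interface Instr {
  pc: number;         // byte offset of the opcode
  op: number;         // opcode byte
  imm: number | null; // PUSH immediate if it is 4 bytes or fewer, else null
}

const RETURN = 0xf3;
const PUSH1 = 0x60;
const PUSH32 = 0x7f;

function hexToBytes(hex: string): number[] {
  const clean = hex.startsWith("0x") ? hex.slice(2) : hex;
  const out: number[] = [];
  for (let i = 0; i + 1 < clean.length; i += 2) {
    out.push(parseInt(clean.slice(i, i + 2), 16));
  }
  return out;
}

// Linear disassembly: PUSH immediates are skipped so data is not read as opcodes.
function disasm(bytecode: string): Instr[] {
  const code = hexToBytes(bytecode);
  const out: Instr[] = [];
  let pc = 0;
  while (pc < code.length) {
    const op = code[pc]!;
    const width = op >= PUSH1 && op <= PUSH32 ? op - PUSH1 + 1 : 0;
    let imm: number | null = null;
    if (width > 0 && width <= 4) {
      let value = 0;
      for (let i = 1; i <= width && pc + i < code.length; i++) {
        value = value * 256 + code[pc + i]!;
      }
      imm = value;
    }
    out.push({ pc, op, imm });
    pc += 1 + width;
  }
  return out;
}

// RETURN pops (offset, size), so the size is pushed before the offset.
// Only the trivial "PUSH size, PUSH offset, RETURN" shape is recognized;
// anything else is reported with size null (unknown).
function literalReturnSizes(bytecode: string): { pc: number; size: number | null }[] {
  const ins = disasm(bytecode);
  const found: { pc: number; size: number | null }[] = [];
  ins.forEach((cur, i) => {
    if (cur.op !== RETURN) return;
    let size: number | null = null;
    if (i >= 2) {
      const offsetPush = ins[i - 1]!;
      const sizePush = ins[i - 2]!;
      if (offsetPush.imm !== null) size = sizePush.imm;
    }
    found.push({ pc: cur.pc, size });
  });
  return found;
}
```

For example, `literalReturnSizes("0x60f360206000f3")` skips the `0xf3` byte that is only PUSH data and reports the real RETURN at offset 6 with a literal size of 32. A `null` size means "unknown", not "no return value".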

peetzweg commented 1 year ago

Using the dummy output value of [{type:"bytes32"}] seems to work to get at least a "readable" value for uint256 and address types. A string gets butchered, and probably tuples etc. as well.
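A small sketch of why a single 32-byte dummy output reads fine for static types but not for strings, assuming standard ABI encoding of the return data. `splitWords` is a hypothetical helper, not a whatsabi function, and the sample payload is hand-built for illustration.

```typescript
// Hypothetical helper: split raw return data into 32-byte words (as hex strings).
function splitWords(data: string): string[] {
  const hex = data.startsWith("0x") ? data.slice(2) : data;
  const words: string[] = [];
  for (let i = 0; i < hex.length; i += 64) {
    words.push("0x" + hex.slice(i, i + 64));
  }
  return words;
}

const zeros = (n: number): string => "0".repeat(n);

// Hand-built return data for a function returning the string "hi":
//   word 0: offset to the string data (0x20)
//   word 1: length of the string in bytes (2)
//   word 2: the bytes "hi" (0x6869), right-padded to 32 bytes
const encodedHi =
  "0x" + zeros(62) + "20" + zeros(62) + "02" + "6869" + zeros(60);

const words = splitWords(encodedHi);
// Decoding only the first word as a 32-byte value yields the offset 0x20,
// not the string contents, which is why dynamic types look "butchered".
const firstWord = words[0];
```

A uint256 or address return is a single word, so reading that one word as a 32-byte value shows the actual value (zero-padded), while dynamic types need the offset and length words interpreted first.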

shazow commented 1 year ago

If a function returns a size that is larger than bytes32, what's a good strategy for returning an undecoded type to fit it? Like say it's 32+16+32 = 80 bytes (but we don't know the layout, we just see 80 bytes). Naive approach feels like returning 32,32,16 (basically binpacking from largest to smallest). Is there something better we could do?

Or maybe it's better to just use string type for anything >32?
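The naive "largest to smallest" approach could be sketched like this. It is only an illustration of the idea in the comment above: `chunkSizes` and `placeholderOutputs` are hypothetical names, and the resulting `bytesN` entries are placeholders for an unknown layout, not a claim about the real types.

```typescript
// Hypothetical helper: greedily split a byte count into 32-byte words
// plus one smaller trailing chunk, e.g. 80 -> [32, 32, 16].
function chunkSizes(totalBytes: number): number[] {
  const sizes: number[] = [];
  let remaining = Math.max(0, Math.floor(totalBytes));
  while (remaining >= 32) {
    sizes.push(32);
    remaining -= 32;
  }
  if (remaining > 0) sizes.push(remaining);
  return sizes;
}

// Turn the chunk sizes into placeholder ABI outputs (bytes1..bytes32).
function placeholderOutputs(totalBytes: number): { type: string; name: string }[] {
  return chunkSizes(totalBytes).map((n) => ({ type: `bytes${n}`, name: "" }));
}
```

With this sketch, `placeholderOutputs(80)` yields `bytes32`, `bytes32`, `bytes16`. One caveat: a standard ABI decoder expects each static value to occupy a full 32-byte slot, so a trailing `bytes16` placeholder would not line up with 16 raw bytes unless the data is decoded manually.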

shazow commented 1 year ago

Started a WIP PR in #14, here are the vibes so far (from PR):

Still in the research phase, trying to find a way to detect output sizes but that's looking harder than I hoped.

It looks like modern solidity wraps most outputs through a chain of jumps that prepares the data. It's going to be quite hard to do this with a single-pass static analysis.

Older Solidity (e.g. the WETH contract with v0.4.x) does a simpler return macro per function window. Those aren't hard to detect, but extracting sizing reliably still seems hard.

Also I thought it'd be easier to detect address type outputs because they're 20 bytes rather than the usual 32, but I forgot that things get padded so it still ends up being 32 bytes.
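To make the padding point concrete, here is a sketch of how a 20-byte address ends up as a 32-byte word, plus a weak value-based heuristic. Both function names are hypothetical, and the heuristic only works on observed return data (not static bytecode); it also matches any integer below 2^160, so it can suggest "maybe an address" at best.

```typescript
// Hypothetical helper: left-pad a 20-byte address to a 32-byte ABI word.
function padAddress(addr: string): string {
  const hex = (addr.startsWith("0x") ? addr.slice(2) : addr).toLowerCase();
  if (hex.length !== 40) throw new Error("expected a 20-byte address");
  return "0x" + "0".repeat(24) + hex;
}

// Hypothetical heuristic: a 32-byte word whose top 12 bytes are zero and whose
// low 20 bytes are not all zero *could* be an address. Small uint256 values
// match too, so treat a true result as a hint only.
function couldBeAddress(word: string): boolean {
  const hex = word.startsWith("0x") ? word.slice(2) : word;
  if (hex.length !== 64) return false;
  const high = hex.slice(0, 24);
  const low = hex.slice(24);
  return /^0+$/.test(high) && !/^0+$/.test(low);
}
```

The padded word is 32 bytes long regardless of the original type, which is why the return size alone cannot separate `address` from `uint256`.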

I probably need to sleep on this in case there are other clever solutions, but it's not looking great for single-pass static analysis right now. 😅

shazow commented 1 year ago

Updated the current state and challenges in the issue description, going to pass it around to some folks to see if anyone else has ideas. Feel free to re-share. :)

shazow commented 1 year ago

I just merged a branch which does more advanced static analysis into master, haven't done a release yet.

In some cases, it manages to successfully guess whether there are inputs or outputs (not super reliable, I'd say like... 60%?), but there have been major changes behind the scenes with how the static analysis works so we can do more advanced things moving forward.

Also we now have stateMutability included in the ABI, which is reliable in detecting payable functions, but not reliable in distinguishing nonpayable vs view yet.

Would appreciate some testing and feedback before I do a proper release. :)

shazow commented 1 year ago

Next release issue is here: https://github.com/shazow/whatsabi/issues/18