Reisen commented 4 years ago

Attack Diary: Attempting to Bypass Sigverify

I was asked to try and document some of my attempts to break Solana. Here's the first! A lot of this was a little while ago so it may be missing something as my memory is pretty crappy. I'll try and write these when they're fresher in mind from now on though.

After the initial fix to signatures was added for the first SOL theft, I tried to break it in other ways. Here's a log of where I attempted to break sigverify and why my attempts didn't work.

Goal

If I can convince the sigverify step to approve transactions that don't verify I can fabricate transactions.

Attempts

Approach 1

This was the most time consuming because I ended up reviewing bincode's source in my code dive.

Looking at the code for sigverify I notice that it does not deserialize the packet, instead it manually decodes each bit of data using offsets and calculated lengths. I figure if I can find a way to create data that deserializes differently through serde then the hope is I can find a situation where I can trick sigverify into passing data that then deserializes to something unexpected, such as a larger num_required_signatures.

Bincode goes through a lot of effort to directly encode data as close to "as-is" as possible so actually sigverifies manual reading ends up being really solid. I doubt that I will find anything, but I dig anyway.

Bincode defaults to using varint encoding for numbers. I wonder if, if I can get a varint encoding in my packet, could I get sigverify to read the num_required_signatures as smaller than the deserialized number? If I can then sigverify would ignore a signature that the MessageProcessor correctly reads.

Answer: Nope, all the integers in MessageHeader are u8's and therefore are written directly as single bytes. Dead end. If the value was a u16 then I might have had some luck here if sigverify was negligent about it.

To be extra sure though, I went through the source of bincode to see if there was anything in the way it encodes that could be manipulated to convince it to decode something different during the bank pipelines. Perhaps skip a byte or some kind of metadata that it would ignore while deserializing but trip up sigverify. No luck. Not surprised though. On the plus side learnt a lot about how serde works.

Approach 2

What if I can convince sigverify to see a smaller list of signatures than the bank pipeline? This was a quick one, both pieces of code decode the signature list length with the same ShortVec deserializer and there's no real trickery there to be done that I can see. I wondered if I could convince ShortVec to return less bytes than it actually read to get a weird offset to propogate during the offset calculation but there's nothing really here.

Approach 3

What if I make the signature list long enough that it reads data from the next transaction in memory? Transactions (Packets) are stored in a PinnedVec in contiguous memory so perhaps there's a way to abuse the boundary between received transactions.

In the end this was another relatively quick dead-end for several reasons: limited_deserialize is the function used to pull out transactions from a Packet and this uses bincodes limit set to the size of the data field. While sigverify won't care, it does mean creating a signature list long enough to overrun the end of the packet will get me nothing as the data will never enter the bank pipeline. sigverify also checks that the max size of the packet is not smaller than the sum of the signature list so any large list will be thwarted pretty fast.

In the end I can't spot any way to get sigverify to read memory out of bounds of the packet. I only checked the CPU path, though the same offset calculations are performed and sent to the GPU from what I can tell.

Approach 4

Do any of the instructions have any weaknesses in how they check signatures? I look through each of the native functions and notice there are two different methods currently used:

1) KeyedAccount::signer_key().is_none() 2) Address::is_signer(&HashSet<PubKey>)

The first checks if the KeyedAccount is flagged as a signer, this is set by code in MessageProcessor which I haven't found a way to break.

The second works by checking if the key is part of the set of signers, which is populated by a call to get_signers defined in the sdk. That method just loops over accounts and also calls .signer_key() so the result is the same as 1).

_Side Note: Also noticed here that limited_deserialize is used to parse instruction data. Thought this was a little odd as the limit is set to the packet size. Unsure if this should be a new deserialize with a smaller limit or not._

Side Note 2: While digging through this code I noticed u128 is used for the pre/post checks for lamport balances after an instruction is processed, so the hope for an overflow trick to convince it that no money had been moved when actually moving billions was off the table.

There were other things I tried but I can't remember them well enough now to write up what my dead ends were.

aeyakovenko commented 4 years ago

@garious @mvines Should convert this into comments in the code that mark the critical sections where we do the checks, or a doc that is part of the book?

ryoqun commented 4 years ago

@Reisen That's was fun read. :) Thanks for sharing. Btw, did you use some tool to dig the code? rust-analyzer is pretty decent these days. It's really better alternative than rls. That tool provides smooth code navigation and it helps a ton when reading non-familiar code for me. :)

Reisen commented 4 years ago

Just RLS and generated ctags for vim. I've not tried rust-analyzer though I definitely shall. Thanks for the tip :)

ryoqun commented 2 years ago

https://medium.com/chorus-one/reisen-hood-prince-of-crypto-thieves-72e3bb5866d1

our pr: https://github.com/solana-labs/solana/pull/8596

solana-labs / solana