BPF Loader Parameter Alignment incorrect on aarch64/M1 Processors

Problem

If you natively compile Solana (read: if you don't run everything under rosetta) on an M1-based Mac and run the test validator it will boot and process basic transactions. If you then try to deploy a BPF program and run it, depending on the account sizes passed in, the program_id and instruction data will be incorrectly deserialized (to the wrong values), i.e. lots of this:

Program 11111111111111111111111111111111 failed: incorrect program id for instruction

After a fair amount of surgery, I've narrowed this down to the following line: https://github.com/solana-labs/solana/blob/d9b0fc0e3eec67dfe4a97d9298b15969b2804fab/programs/bpf_loader/src/serialization.rs#L245

This is attempting to align the next parameter (the Rent Epoch for an account) to an 8 bite boundary but align_of::<u128> is ABI specific and thus on an x86_64 machine this call will return 8, on an aarch64 machine will return 16 and on [whatever target BPF is] returns 8.

Take this code in a simple rust project:

use std::mem::align_of;

pub fn main() {
    println!("Alignment: {}", align_of::<u128>())
}

Run it under both ABIs

ben@Bens-MacBook-Pro alignment % cargo run --target x86_64-apple-darwin
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/x86_64-apple-darwin/debug/alignment`
Alignment: 8
ben@Bens-MacBook-Pro alignment % cargo run                             
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/alignment`
Alignment: 16

So to summarize: the BPF interface assumes everything is running on a certain set of implicit intrinsic alignment rules which vary across architectures. When those architectures change, padding and alignment changes and things get very confused.

Proposed Solution

I'll work on a simple diff which extracts this alignment parameter into a const (defined as 8) so that it's stable across boundaries. This will fix the BPF Loader parameter serialization.

Looking elsewhere in the codebase for align_of it appears it is used to align memory passed in syscalls. So this code will also likely fail (once a sufficiently complicated eBPF calls a syscall the right "wrong" way) because eBPF will align something to (for example) 8 whereas aarch64 will expect 16 and explode on this line:

https://github.com/solana-labs/solana/blob/7200c5106e0bb77b11ace5e0faffd926081535c8/programs/bpf_loader/src/syscalls.rs#L471

It's less clear what the right answer here is given that the align_of template parameter is different all over the place (although mostly confined to this one syscall file). It feels like the syscall interface should be binary stable regardless of host platform which would necessitate a way to derive alignments that is either based on x86_64 alignment in all scenarios (since that's what the de facto rule is) or explicitly defined elsewhere.

I don't see an easy way to derive the x86_64 ABI alignment from an aarch64 target environment so one could either (1) go look it up and either code a lookup table or the logic that defines it in a separate file that will work the same everywhere or (2) just disable the syscall alignment check conditionally when running on an unsupported host target.

Thoughts welcome.

solana-labs / solana

BPF Loader Parameter Alignment incorrect on aarch64/M1 Processors #21269

Problem

Proposed Solution