immunant / IA2-Phase2

Add support for ARM64 using MTE #314

Open ayrtonm opened 9 months ago

ayrtonm commented 9 months ago

To add support for ARM64 we can use MTE and the x18 register in place of the PKRU on x64. Here's what we need to do

Testing on QEMU

Port libia2 to ARM

Rewriter and call gate generation

Modify LLVM to instrument memory accesses

We also need to modify LLVM to combine the tag in x18 with pointers before they're dereferenced. LLVM has two pass managers (legacy and new) and I'm not sure how well the new pass manager supports modifying the codegen pipeline, so it may be best to just make in-tree changes to LLVM.

LLVM also has some support for MTE-based hardening, but tagging stack variables is the main thing that seems to be implemented and it's not what we need. Still, AArch64StackTagging.cpp may be a good place to start for seeing how to insert the instrumentation we'll need.

It seems both FastISel and GlobalISel can emit loads and stores, so we'll probably need to modify both. I think masking the address is conceptually similar to zero/sign-extending registers, so we can grep for ZExt for an idea of how to fix up addresses.

For testing we can start by looking at the resulting assembly for a small test file (foo.c here).

# can use stock clang for this
clang --target=aarch64-linux-gnu -S -emit-llvm foo.c
# needs to be built from our fork
llc -debug foo.ll

For building llvm, run the cmake step with -DLLVM_TARGETS_TO_BUILD=AArch64. I think we just need to build the llc target, though the opt target may also come in handy if we need to implement our changes as an optimization pass (grep INITIALIZE_PASS_BEGIN for examples). As for the fork, our llvm-project repo is out of date since it was forked from UC Irvine's repo for mcad, so make sure to build from upstream's tip of tree. Otherwise clang's LLVM IR may not be compatible with the llc fork.

Open questions

ayrtonm commented 6 months ago

Running the binaries built from PR #323 showed that libia2.a tries to call pkey_alloc during initialization so it seems I'm missing a few things in the list for porting libia2 code to ARM.

ayrtonm commented 6 months ago

#311 inadvertently turned into a stress test for the tracer (found another case of a procfs race fixed in #318 and the mmap issue in #324) so I'll clean up a few things and get it merged in spite of the remaining transient x86 CI issues. As for my next steps I think porting main to ARM properly might make the most sense.

ayrtonm commented 6 months ago

Note: PROT_MTE is only supported on MAP_ANONYMOUS and RAM-based file mappings (tmpfs, memfd). Passing it to other types of mapping will result in -EINVAL returned by these system calls.

https://docs.kernel.org/arch/arm64/memory-tagging-extension.html

I tried checking that accessing static data w/o a tagged ptr fails after enabling MTE with prctl, but I was unable to get a segfault. I also tweaked the prototype @sim-immunant made and it seems that mmap'ing with PROT_MTE works, mprotecting memory previously mmap'ed w/o PROT_MTE using the flag on the second syscall works, but I don't think mprotecting the loaded ELF segments works. mprotect actually returns 0 instead of -EINVAL but that's probably because linux is ignoring errors similar to this.

It might be possible to protect static data using some memfd tricks (or I guess mmaping over the ELF with MAP_FIXED...), but I'll have to think about what the simplest option is for getting a test that triggers a fault running.

ayrtonm commented 6 months ago

TODO: add estimates. We should also consider

ayrtonm commented 6 months ago

Since we're already rebuilding glibc, it might be easier to modify this tunable instead of using PartitionAlloc again.

ayrtonm commented 6 months ago

can we put tag in x18 top byte to stay compatible with shadow call stack

I implemented some x18 switching in #333 and decided to just keep things simple for now and store the tag in bits 56-59 instead of the upper 4 bits. We can revisit this after getting the LLVM fork.
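
A minimal sketch in C of the layout described above, assuming a 4-bit tag kept in bits 56-59 of x18. The names X18_TAG_SHIFT, X18_TAG_MASK, x18_set_tag and x18_get_tag are made up for illustration and are not taken from libia2 or #333.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the layout (tag in bits 56-59) is from the thread. */
#define X18_TAG_SHIFT 56
#define X18_TAG_MASK  (UINT64_C(0xf) << X18_TAG_SHIFT)

/* Return x18 with its tag field replaced by `tag`, leaving all other bits alone. */
static uint64_t x18_set_tag(uint64_t x18, unsigned tag) {
    assert(tag < 16); /* MTE tags are 4 bits wide */
    return (x18 & ~X18_TAG_MASK) | ((uint64_t)tag << X18_TAG_SHIFT);
}

/* Read the 4-bit tag back out of x18. */
static unsigned x18_get_tag(uint64_t x18) {
    return (unsigned)((x18 & X18_TAG_MASK) >> X18_TAG_SHIFT);
}
```

With this layout the tag in x18 sits at the same bit positions as the logical tag of an MTE pointer, so it can be copied into an address without shifting.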

ayrtonm commented 6 months ago

So enabling MTE and testing a lot of functionality is blocked on the llvm fork since we expect all accesses to fail w/o it. However, with permissive mode we should be able to see where programs fault w/o instrumenting memory accesses. sp-relative accesses won't fault, the heap isn't protected and file-backed mappings (i.e. .data) won't be protected, so I'm not sure which accesses will fault first other than the one in the PR enabling MTE and probably TLS. Regardless, this means permissive mode is a bit more important than I realized since it'll let us make better use of the test suite (even if we need to change test_fault_handler.h a bit).

ayrtonm commented 5 months ago

Running the minimal test in QEMU w/ MTE violations patched out pointed out that the main wrapper asm doesn't actually tag registers before memory accesses. This also applies to the exit wrapper and ALLOCATE_STACK.

Since we're using fixed tags in this case we can use addg (note that MTE must be enabled). In other cases where we actually do need to use the x18 tag (i.e. the LLVM instrumentation) we can use

extr R, R, x18, #56
extr R, R, R, #8

where R is the register used for the memory access and x18 has the tag in the lower 4 bits of the highest byte.
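
To check what those two instructions do, here is a small C model of extr (a sketch; the function names extr and tag_with_x18 are mine). It treats `extr Xd, Xn, Xm, #lsb` as taking 64 bits starting at bit lsb of the 128-bit value Xn:Xm.

```c
#include <assert.h>
#include <stdint.h>

/* Model of AArch64 `extr Xd, Xn, Xm, #lsb`: 64 bits starting at bit `lsb`
 * of the 128-bit concatenation Xn:Xm (Xn is the high half). */
static uint64_t extr(uint64_t xn, uint64_t xm, unsigned lsb) {
    assert(lsb > 0 && lsb < 64); /* lsb == 0 would need a special case in C */
    return (xn << (64 - lsb)) | (xm >> lsb);
}

/* The two-instruction sequence from the comment above. */
static uint64_t tag_with_x18(uint64_t r, uint64_t x18) {
    r = extr(r, x18, 56); /* r = (r << 8) | top byte of x18 */
    r = extr(r, r, 8);    /* rotate right by 8: that byte moves to bits 56-63 */
    return r;
}
```

The net effect is that the low 56 bits of R are kept and its top byte is replaced by the top byte of x18. Note that the whole byte is copied, so in this model bits 60-63 of x18 also end up in the pointer.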

ayrtonm commented 4 months ago

Here I implemented tagging for str, ldr, stp and ldp which covers the most common instructions that access memory.

To see the remaining instructions run that repo's build.sh and grep for "MayStore|MayLoad" in build/lib/Target/AArch64/AArch64GenInstrInfo.inc. Each instruction has "def" operands followed by "use" operands and for each instruction we need to find the "use" operand that corresponds to the address being used. We should not need to tag "def" operands representing addresses because those registers should be tagged before their use. To find the operand number look for the GPR64sp (64-bit general-purpose register or stack pointer) operand in the instruction's entry in OpcodeOperandTypes in that .inc file. If it has two GPR64sps it's probably a post/pre-index increment so we tag the latter. If it has more, look at the .td files that generate that code (there are no generic patterns here).
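
The operand-selection rule above can be sketched in C as follows. This is only an illustration of the heuristic: enum op_type and address_operand are hypothetical and do not correspond to LLVM's generated tables or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the operand types listed in OpcodeOperandTypes. */
enum op_type { OP_GPR64, OP_GPR64SP, OP_IMM, OP_OTHER };

/* Return the index of the operand to tag, or -1 when the instruction has no
 * GPR64sp operand or more than two (those need a manual look at the .td files). */
static int address_operand(const enum op_type *ops, size_t n) {
    int count = 0, last = -1;
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_GPR64SP) {
            count++;
            last = (int)i;
        }
    }
    /* One GPR64sp: that is the address. Two: probably a pre/post-index form,
     * so the latter one (the "use") is the operand to tag. */
    return (count == 1 || count == 2) ? last : -1;
}
```

A result of -1 covers both the load-literal style instructions with no GPR64sp operand and the cases with three or more, which would still need to be checked by hand.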

Also we need to

  1. Move our changes out of the dead register pass and into their own pass.
  2. Possibly omit tagging when the reg is sp. It may be possible to tag sp in the call gates although I'm not sure if stack increments/decrements will clobber the tag.
  3. Double check instructions marked as MayStore|MayLoad w/o a GPR64sp operand. So far I found load literals (LDRDl and friends) that do this but I think they don't actually need to be tagged.

ayrtonm commented 4 months ago

I tried building glibc with clang and shadow call stack but I don't think it's worth investing time into. scs requires -fno-exceptions, while a lot of glibc explicitly requires exceptions. Even though gcc does support -fsanitize=shadow-call-stack it'll be an uphill battle. Since CFI isn't needed for MTE tagging to function (only to make sure it's secure), I think we should keep using the glibc prebuilt w/o CFI on QEMU and try to test on Android w/ bionic as soon as possible to see what problems crop up when scs is enabled. AFAICT the way we use x18 should be compatible with scs but it's hard to be sure.

fw-immunant commented 1 month ago

We also need to recompile libgcc where e.g. __do_global_dtors_aux accesses statics (e.g. completed).