Implement Intel HLE and RTM intrinsics

gnzlbg commented 5 years ago

@mtak- mentioned _xbegin recently, we should probably expose them all via core::arch.

The Intel RTM (Restricted Transactional Memory) intrinsics (Intel Intrinsics Guide, clang header; CPUID feature RTM: EAX=7, ECX=0 (Extended Features), EBX bit 11):

void _xabort (const unsigned int imm8) (assert_instr(xabort), llvm.x86.xabort)
unsigned int _xbegin (void) (assert_instr(xbegin), llvm.x86.xbegin, note: can return multiple times)
void _xend (void) (assert_instr(xend), llvm.x86.xend)
unsigned char _xtest (void) (assert_instr(xtest), llvm.x86.xtest)

We'd need to whitelist the rtm feature in rust-lang/rust as part of this.
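For reference, here is a minimal sketch of checking the CPUID bit mentioned above at run time. It assumes the stable `__cpuid` / `__cpuid_count` intrinsics from `core::arch::x86_64`; the helper names `ebx_has_rtm` and `cpu_has_rtm` are made up for illustration and are not part of stdarch.

```rust
/// Bit position of the RTM flag in EBX for CPUID leaf 7, sub-leaf 0.
const RTM_EBX_BIT: u32 = 11;

/// Pure helper (hypothetical name): does this EBX value, as returned by
/// CPUID with EAX=7 and ECX=0, advertise RTM?
fn ebx_has_rtm(ebx: u32) -> bool {
    ebx & (1 << RTM_EBX_BIT) != 0
}

/// Hypothetical helper: queries the running CPU on x86_64.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // newer toolchains may expose these intrinsics as safe
fn cpu_has_rtm() -> bool {
    use core::arch::x86_64::{__cpuid, __cpuid_count};
    // Leaf 0 reports the highest supported basic leaf in EAX.
    let max_leaf = unsafe { __cpuid(0) }.eax;
    if max_leaf < 7 {
        return false;
    }
    // Leaf 7, sub-leaf 0: structured extended feature flags.
    let leaf7 = unsafe { __cpuid_count(7, 0) };
    ebx_has_rtm(leaf7.ebx)
}

/// On other architectures RTM is never available.
#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_rtm() -> bool {
    false
}
```

A caller would gate any use of the RTM intrinsics on `cpu_has_rtm()` returning true and take a lock-based path otherwise.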

I've asked on the LLVM bugzilla whether xbegin needs to be marked returns_twice: https://bugs.llvm.org/show_bug.cgi?id=41493. Craig Topper suggested on IRC submitting an LLVM patch to mark this intrinsic / clang intrinsic as returns_twice (cc @ctopper - hope I got the GitHub id right), but @tnorthover mentioned that this might be neither necessary nor sufficient. We don't have to figure this out for the initial implementation since, as @mtak- mentions, clang does not do this, but we should not forget about this issue.

TNorthover commented 5 years ago

I've done some more thinking, and I now don't think returns_twice is the right model at all. The only way the processor is getting back to the xbegin itself is via normal control flow that LLVM can see.

I think what we actually have is something akin to a call that might throw an exception. From LLVM's perspective, either the transaction eventually succeeds, in which case the xbegin has acted like a normal intrinsic call, or it fails, in which case execution proceeds from the landingpad as if nothing had happened.

Someone would need to investigate how well the existing landingpad actually fits in with what happens, though. Key questions to answer will be:

Current landingpads in use provide two values to the handler. One is usually (but not always) in rax so should be good; is the other just harmlessly undef, or will its presence break things? Is it even needed?
What registers are preserved through to the landingpad and how does LLVM know? It could either be special logic, or inherited from the assumptions about the preceding call.
I think landingpads need a personality function right now, but there isn't really one here. As far as I know it's only used for unwinding metadata, which we also don't need. So perhaps the real issue here is to make sure we don't try to generate that metadata for RTM invokes.

xabort should probably be noreturn, though I don't think that would affect correctness; it'll just allow more dead code elimination.

gnzlbg commented 5 years ago

cc @amanieu

---

For the abort intrinsic to be noreturn, it has to return the never type in Rust (fn() -> !).

Amanieu commented 5 years ago

If _xabort is used outside of a transaction then it acts as a no-op, so it shouldn't return !.

gnzlbg commented 5 years ago

Indeed, then it can't be noreturn.

mtak- commented 5 years ago

In case it's helpful here's swym-htm's llvm bindings: https://github.com/mtak-/swym/blob/b854eed11cc99b8551168934aeeb1f05ee1e04b2/swym-htm/src/x86_64.rs#L16

The signatures don't match the list above. Taken from llvm here (is there a better source?): https://github.com/llvm-mirror/llvm/blob/993ef0ca960f8ffd107c33bfbf1fd603bcf5c66c/test/CodeGen/X86/rtm.ll https://github.com/llvm-mirror/llvm/blob/993ef0ca960f8ffd107c33bfbf1fd603bcf5c66c/test/CodeGen/X86/xtest.ll

TNorthover commented 5 years ago

Oops, it looks like I didn't properly investigate what @llvm.x86.xbegin actually does. It's a slightly higher-level wrapper over the xbegin instruction that does seem to re-merge both success and failure paths into its return value.

So returns_twice is looking a lot more sensible again for that particular intrinsic now (though I still think an invoke-like interface would be more powerful and natural).
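To make the "both paths merged into one return value" point concrete, here is a rough sketch of decoding that value. The constant values mirror the `_XBEGIN_STARTED`, `_XABORT_*` and `_XABORT_CODE` macros in Intel's header; the Rust names, the `TxStatus` enum and `decode_status` are assumptions made for this example, not an existing API.

```rust
// Values mirror the macros in Intel's RTM header; the names are this sketch's own.
const XBEGIN_STARTED: u32 = !0; // transaction started successfully
const XABORT_EXPLICIT: u32 = 1 << 0; // aborted by an explicit xabort
const XABORT_RETRY: u32 = 1 << 1; // a retry might succeed
const XABORT_CONFLICT: u32 = 1 << 2; // memory conflict with another thread
const XABORT_CAPACITY: u32 = 1 << 3; // internal buffer overflowed

/// Hypothetical decoded view of the value returned by `_xbegin`.
#[derive(Debug, PartialEq, Eq)]
enum TxStatus {
    Started,
    Aborted {
        /// The imm8 passed to `_xabort`, present only for explicit aborts.
        explicit_code: Option<u8>,
        may_retry: bool,
        conflict: bool,
        capacity: bool,
    },
}

/// Splits the single status word back into its success and failure cases.
fn decode_status(status: u32) -> TxStatus {
    if status == XBEGIN_STARTED {
        return TxStatus::Started;
    }
    let explicit = status & XABORT_EXPLICIT != 0;
    TxStatus::Aborted {
        // Bits 31:24 carry the xabort immediate when the explicit bit is set.
        explicit_code: if explicit { Some((status >> 24) as u8) } else { None },
        may_retry: status & XABORT_RETRY != 0,
        conflict: status & XABORT_CONFLICT != 0,
        capacity: status & XABORT_CAPACITY != 0,
    }
}
```

Code following the intrinsic call sees either `Started` or one `Aborted` value, which is the merged view of the two hardware paths.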

mtak- commented 5 years ago

@TNorthover What is special from an LLVM perspective about _xbegin?

IIUC xbegin is a lot like a read from volatile memory and then a branch based on the value read (was the transaction started or aborted). Even though we know it might take both branches, it's "as-if" only one were taken for any single call to _xbegin. Very much the same as the CPU speculating that it might take a certain branch, and then rolling that back when it realizes that it mispredicted the branch.
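A rough sketch of that "read a status, then branch" view, written against stand-in closures so it runs without RTM hardware. `with_transaction`, `begin` and `end` are hypothetical names standing in for `_xbegin` and `_xend`; a mock cannot roll anything back, so this only illustrates the control flow the caller observes.

```rust
const STARTED: u32 = !0; // same value as _XBEGIN_STARTED
const RETRY_HINT: u32 = 1 << 1; // same bit as _XABORT_RETRY

/// Runs `body` inside a (mocked) transaction, retrying while the abort
/// status hints that a retry may succeed, then falls back.
/// `begin` and `end` stand in for `_xbegin` and `_xend`.
fn with_transaction<T>(
    mut begin: impl FnMut() -> u32,
    mut end: impl FnMut(),
    max_attempts: usize,
    mut body: impl FnMut() -> T,
    fallback: impl FnOnce() -> T,
) -> T {
    for _ in 0..max_attempts {
        // From the caller's point of view this is a single value to branch on.
        let status = begin();
        if status == STARTED {
            let out = body();
            end(); // commit
            return out;
        }
        if status & RETRY_HINT == 0 {
            break; // the hardware says retrying is unlikely to help
        }
    }
    // Typically a lock-based slow path.
    fallback()
}
```

For any single call to `begin`, exactly one of the two branches is observed, which matches the "as-if" argument above.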

rust-lang / stdarch

Implement Intel HLE and RTM intrinsics #718