Tracking issue for `asm` (inline assembly)

aturon commented 8 years ago

This issue tracks stabilization of inline assembly. The current feature has not gone through the RFC process, and will probably need to do so prior to stabilization.

bstrie commented 8 years ago

Will there be any difficulties with ensuring the backward-compatibility of inline assembly in stable code?

bstrie commented 8 years ago

@main-- has a great comment at https://github.com/rust-lang/rfcs/pull/1471#issuecomment-173982852 that I'm reproducing here for posterity:

With all the open bugs and instabilities surrounding asm!() (there's a lot), I really don't think it's ready for stabilization - even though I'd love to have stable inline asm in Rust.

We should also discuss whether today's asm!() really is the best solution or if something along the lines of RFC #129 or even D would be better. One important point to consider here is that asm() does not support the same set of constraints as gcc. Therefore, we can either:

Stick to the LLVM behavior and write docs for that (because I've been unable to find any). Nice because it avoids complexity in rustc. Bad because it will confuse programmers coming from C/C++ and because some constraints might be hard to emulate in Rust code.

Emulate gcc and just link to their docs: Nice because many programmers already know this and there's plenty of examples one can just copy-paste with little modifications. Bad because it's a nontrivial extension to the compiler.

Do something else (like D does): A lot of work that may or may not pay off. If done right, this could be vastly superior to gcc-style in terms of ergonomics while possibly integrating more nicely with language and compiler than just an opaque blob (lots of handwaving here as I'm not familiar enough with compiler internals to assess this).

Finally, another thing to consider is #1201 which in its current design (I think) depends quite heavily on inline asm - or inline asm done right, for that matter.

briansmith commented 8 years ago

I personally think it would be better to do what Microsoft did in MSVC x64: define a (nearly-)comprehensive set of intrinsic functions, for each asm instruction, and do "inline asm" exclusively through those intrinsics. Otherwise, it's very difficult to optimize the code surrounding inline asm, which is ironic since many uses of inline asm are intended to be performance optimizations.

One advantage of the instrinsic-based approach is that it doesn't need to be an all-or-nothing thing. You can define the most needed intrinsics first, and build the set out incrementally. For example, for crypto, having _addcarry_u64, _addcarry_u32. Note that the work to do the instrinsics seems to have been done quite thoroughly already: https://github.com/huonw/llvmint.

Further, the intrinsics would be a good idea to add even if it were ultimately decided to support inline asm, as they are much more convenient to use (based on my experience using them in C and C++), so starting with the intrinsics and seeing how far we get seems like a zero-risk-of-being-wrong thing.

cuviper commented 8 years ago

Intrinsics are good, but asm! can be used for more than just inserting instructions. For example, see the way I'm generating ELF notes in my probe crate. https://github.com/cuviper/rust-libprobe/blob/master/src/platform/systemtap.rs

I expect that kind of hackery will be rare, but I think it's still a useful thing to support.

arielb1 commented 8 years ago

@briansmith

Inline asm is also useful for code that wants to do its own register/stack allocation (e.g. naked functions).

Ericson2314 commented 8 years ago

@briansmith yeah those are some excellent reasons to use intrinsics where possible. But it's nice to have inline assembly as the ultimate excape hatch.

main-- commented 8 years ago

@briansmith Note that asm!() is kind of a superset of intrinsics as you can build the latter using the former. (The common argument against this reasoning is that the compiler could theoretically optimize through intrinsics, e.g. hoist them out of loops, run CSE on them, etc. However, it's a pretty strong counterpoint that anyone writing asm for optimization purposes would do a better job at that than the compiler anyways.) See also https://github.com/rust-lang/rust/issues/29722#issuecomment-207628164 and https://github.com/rust-lang/rust/issues/29722#issuecomment-207823543 for cases where inline asm works but intrinsics don't.

On the other hand, intrinsics critically depend on a "sufficiently smart compiler" to achieve at least the performance one would get with a hand-rolled asm implementation. My knowledge on this is outdated but unless there has been significant progress, intrinsics-based implementations are still measurably inferior in many - if not most - cases. Of course they're much more convenient to use but I'd say that programmers really don't care much about that when they're willing to descend into the world of specific CPU instructions.

Now another interesting consideration is that intrinsics could be coupled with fallback code on architectures where they're not supported. This gives you the best of both worlds: Your code is still portable - it can just employ some hardware accelerated operations where the hardware supports them. Of course this only really pays off for either very common instructions or if the application has one obvious target architecture. Now the reason why I'm mentioning this is that while one could argue that this may potentially even be undesirable with compiler-provided intrinsics (as you'd probably care about whether you actually get the accelerated versions plus compiler complexity is never good) I'd say that it's a different story if the intrinsics are provided by a library (and only implemented using inline asm). In fact, this is the big picture I'd prefer even though I can see myself using intrinsics more than inline asm.

(I consider the intrinsics from RFC #1199 somewhat orthogonal to this discussion as they exist mostly to make SIMD work.)

jimblandy commented 8 years ago

@briansmith

Otherwise, it's very difficult to optimize the code surrounding inline asm, which is ironic since many uses of inline asm are intended to be performance optimizations.

I'm not sure what you mean here. It's true that the compiler can't break down the asm into its individual operations to do strength reduction or peephole optimizations on it. But in the GCC model, at least, the compiler can allocate the registers it uses, copy it when it replicates code paths, delete it if it's never used, and so on. If the asm isn't volatile, GCC has enough information to treat it like any other opaque operation like, say, fsin. The whole motivation for the weird design is to make inline asm something the optimizer can mess with.

But I haven't used it a whole lot, especially not recently. And I have no experience with LLVM's rendition of the feature. So I'm wondering what's changed, or what I've misunderstood all this time.

alexcrichton commented 7 years ago

We discussed this issue at the recent work week as @japaric's survey of the no_std ecosystem has the asm! macro as one of the more commonly used features. Unfortunately we didn't see an easy way forward for stabilizing this feature, but I wanted to jot down the notes we had to ensure we don't forget all this.

First, we don't currently have a great specification of the syntax accepted in the asm! macro. Right now it typically ends up being "look at LLVM" which says "look at clang" which says "look at gcc" which doesn't have great docs. In the end this typically bottoms out at "go read someone else's example and adapt it" or "read LLVM's source code". For stabilization a bare minimum is that we need to have a specification of the syntax and documentation.
Right now, as far as we know, there's no stability guarantee from LLVM. The asm! macro is a direct binding to what LLVM does right now. Does this mean that we can still freely upgrade LLVM when we'd like? Does LLVM guarantee it'll never ever break this syntax? A way to alleviate this concern would be to have our own layer that compiles to LLVM's syntax. That way we can change LLVM whenever we like and if the implementation of inline assembly in LLVM changes we can just update our translation to LLVM's syntax. If asm! is to become stable we basically need some mechanism of guaranteeing stability in Rust.
Right now there are quite a few bugs related to inline assembly. The A-inline-assembly tag is a good starting point, and it's currently littered with ICEs, segfaults in LLVM, etc. Overall this feature, as implemented today, doesn't seem to live up to the quality guarantees others expect from a stable feature in Rust.
Stabilizing inline assembly may make an implementation of an alternate backend very difficult. For example backends such as miri or cranelift may take a very long time to reach feature parity with the LLVM backend, depending on the implementation. This may mean that there's a smaller slice of what can be done here, but it's something important to keep in mind when considering stabilizing inline assembly.

Despite the issues listed above we wanted to be sure to at least come away with some ability to move this issue forward! To that end we brainstormed a few strategies of how we can nudge inline assembly towards stabilization. The primary way forward would be to investigate what clang does. Presumably clang and C have effectively stable inline assembly syntax and it may be likely that we can just mirror whatever clang does (especially wrt LLVM). It would be great to understand in greater depth how clang implements inline assembly. Does clang have its own translation layer? Does it validate any input parameters? (etc)

Another possibility for moving forward is to see if there's an assembler we can just take off the shelf from elsewhere that's already stable. Some ideas here were nasm or the plan9 assembler. Using LLVM's assembler has the same problems about stability guarantees as the inline assembly instruction in the IR. (it's a possibility, but we need a stability guarantee before using it)

Amanieu commented 7 years ago

I would like to point out that LLVM's inline asm syntax is different from the one used by clang/gcc. Differences include:

LLVM uses $0 instead of %0.
LLVM doesn't support named asm operands %[name].
LLVM supports different register constraint types: for example "{eax}" instead of "a" on x86.
LLVM support explicit register constraints ("{r11}"). In C you must instead use register asm variables to bind a value to a register (register asm("r11") int x).
LLVM "m" and "=m" constraints are basically broken. Clang translates these into indirect memory constraints "*m" and "=*m" and pass the address of the variable to LLVM instead of the variable itself.
etc...

Clang will convert inline asm from the gcc format into the LLVM format before passing it on to LLVM. It also performs some validation of the constraints: for example it ensures that "i" operands are compile-time constants,

In light of this I think that we should implement the same translation and validation that clang does and support proper gcc inline asm syntax instead of the weird LLVM one.

alexcrichton commented 7 years ago

There's an excellent video about summaries with D, MSVC, gcc, LLVM, and Rust with slides online

jcranmer commented 7 years ago

As someone who'd love to be able to use inline ASM in stable Rust, and with more experience than I want trying to access some of the LLVM MC APIs from Rust, some thoughts:

Inline ASM is basically a copy-paste of a snippet of code into the output .s file for assembling, after some string substitution. It also has attachments of input and output registers as well as clobbered registers. This basic framework is unlikely to ever really change in LLVM (although some of the details might vary slightly), and I suspect that this is a fairly framework-independent representation.
Constructing a translation from a Rust-facing specification to an LLVM-facing IR format isn't hard. And it might be advisable--the rust {} syntax for formatting doesn't interfere with assembly language, unlike LLVM's $ and GCCs % notation.
LLVM does a surprisingly bad job in practice of actually identifying which registers get clobbered, particularly in instructions not generated by LLVM. This means it's pretty much necessary for the user to manually specify which registers get clobbered.
Trying to parse the assembly yourself is likely to be a nightmare. The LLVM-C API doesn't expose the MCAsmParser logic, and these classes are quite annoying to get working with bindgen (I've done it).
For portability to other backends, as long as you keep the inline assembly mostly on the level of "copy-paste this string with a bit of register allocation and string substitution", it shouldn't inhibit backends all that much. Dropping the integer constant and memory constraints and keeping just register bank constraints shouldn't pose any problems.

parched commented 7 years ago

I've been having a bit of play to see what can be done with procedural macros. I've written one that converts GCC style inline assembly to rust style https://github.com/parched/gcc-asm-rs. I've also started working on one that uses a DSL where the user doesn't have to understand the constraints and they're all handled automatically.

So I've come to the conclusion that I think rust should just stabilise the bare building blocks, then the community can iterate out of tree with macros to come up with best solutions. Basically, just stabilise the llvm style we have now with only "r" and "i" and maybe "m" constraints, and no clobbers. Other constraints and clobbers can be stabilised later with their own mini rfc type things.

bstrie commented 6 years ago

Personally I'm starting to feel as though stabilizing this feature is the sort of massive task that will never get done unless somehow someone hires a full-time expert contractor to push on this for a whole year. I want to believe that @parched's suggestion of stabilizing asm! piecemeal will make this tractable. I hope someone picks it up and runs with it. But if it isn't, then we need to stop trying to reach for the satisfactory solution that will never arrive and reach for the unsatisfactory solution that will: stabilize asm! as-is, warts, ICEs, bugs and all, with bright bold warnings in the docs advertising the jank and nonportability, and with the intent to deprecate someday if a satisfactory implementation should ever miraculously descend, God-sent, on its heavenly host. IOW, we should do exactly what we did for macro_rules! (and of course, just like for macro_rules!, we can have a brief period of frantic band-aiding and leaky future-proofing). I'm sad at the ramifications for alternative backends, but it's shameful for a systems language to relegate inline assembly to such a limbo, and we can't let the hypothetical possibility of multiple backends continue to obstruct the existence of one actually usable backend. I beg of you, prove me wrong!

main-- commented 6 years ago

it's shameful for a systems language to relegate inline assembly to such a limbo

As a data point, I happen to be working on a crate right now that depends on gcc for the sole purpose of emitting some asm with stable Rust: https://github.com/main--/unwind-rs/blob/266e0f26b6423f4a2b8a8c72442b319b5c33b658/src/unwind_helper.c

While it certainly has its advantages, I'm a bit wary of the "stabilize building blocks and leave the rest to proc-macros"-approach. It essentially outsources the design, RFC and implementation process to whoever wants to do the job, potentially no one. Of course having weaker stability/quality guarantees is the entire point (the tradeoff is that having something imperfect is already much better than having nothing at all), I understand that.

At least the building blocks should be well-designed - and in my opinion, "expr" : foo : bar : baz definitely isn't. I can't remember ever getting the order right on the first try, I always have to look it up. "Magic categories separated by colons where you specify constant strings with magic characters that end up doing magic things to the variable names that you also just mash in there somehow" is just bad.

nbp commented 6 years ago

One idea, …

Today, there is already a project, named dynasm, which can help you generate assembly code with a plugin used to pre-process the assembly with one flavor of x64 code.

This project does not answer the problem of inline assembly, but it can certainly help, if rustc were to provide a way to map variables to registers, and accept to insert set of bytes in the code, such project could also be used to fill-up these set of bytes.

This way, the only standardization part needed from rustc point of view, is the ability to inject any byte sequence in the generated code, and to enforce specific register allocations. This removes all the choice for specific languages flavors.

Even without dynasm, this can also be used as a way to make macros for the cpuid / rtdsc instructions, which would just be translated into the raw sequence of bytes.

I guess the next question might be if we want to add additional properties/constraints to the byte-sequences.

jimblandy commented 6 years ago

[EDIT: I don't think anything I said in this comment is correct.]

If we want to continue to use LLVM's integrated assembler (I assume this is faster than spawning an external assembler), then stabilization means stabilizing on exactly what LLVM's inline assembly expressions and integrated assembler support—and compensating for changes to those, should any occur.

If we're willing to spawn an external assembler, then we can use any syntax we want, but we're then foregoing the advantages of the integrated assembler, and exposed to changes in whatever external assembler we're calling.

cuviper commented 6 years ago

I think it would be strange to stabilize on LLVM's format when even Clang doesn't do that. Presumably it does use LLVM's support internally, but it presents an interface more like GCC.

bstrie commented 6 years ago

I'm 100% fine with saying "Rust supports exactly what Clang supports" and calling it a day, especially since AFAIK Clang's stance is "Clang supports exactly what GCC supports". If we ever have a real Rust spec, we can soften the language to "inline assembly is implementation-defined". Precedence and de-facto standardization are powerful tools. If we can repurpose Clang's own code for translating GCC syntax to LLVM, all the better. The alternative backend concerns don't go away, but theoretically a Rust frontend to GCC wouldn't be much vexed. Less for us to design, less for us to endlessly bikeshed, less for us to teach, less for us to maintain.

jimblandy commented 6 years ago

If we stabilize something defined in terms of what clang supports, then we should call it clang_asm!. The asm! name should be reserved for something that's been designed through a full RFC process, like other major Rust features. #bikeshed

There are a few things I'd like to see in Rust inline assembly:

The template-with-substitutions pattern is ugly. I'm always jumping back and forth between the assembly text and the constraint list. Brevity encourages people to use positional parameters, which makes legibility worse. Symbolic names often mean you have the same name repeated three times: in the template, naming the operand, and in the expression being bound to the operand. The slides mentioned in Alex's comment show that D and MSVC let you simply reference variables in the code, which seems much nicer.
Constraints are both hard to understand, and (mostly) redundant with the assembly code. If Rust had an integrated assembler with a sufficiently detailed model of the instructions, it could infer the constraints on the operands, removing a source of error and confusion. If the programmer needs a specific encoding of the instruction, then they would need to supply an explicit constraint, but this would usually not be necessary.

Norman Ramsey and Mary Fernández wrote some papers about the New Jersey Machine Code Toolkit way back when that have excellent ideas for describing assembly/machine language pairs in a compact way. They tackle (Pentium Pro-era) iA-32 instruction encodings; it is not at all limited to neat RISC ISAs.

alexcrichton commented 6 years ago

I'd like to reiterate again the conclusions from the most recent work week:

Today, as far as we know, there's basically no documentation for this feature. This includes LLVM internals and all.
We have, as far as we know, no guarantee of stability from LLVM. For all we know the implementation of inline assembly in LLVM could change any day.
This is, currently, a very buggy feature in rustc. It's chock full of (at compile time) segfaults, ICEs, and weird LLVM errors.
Without a specification it's nigh impossible to even imagine an alternate backend for this.

To me this is the definition of "if we stabilize this now we will guarantee to regret it in the future", and not only "regret it" but seems very likely for "causes serious problems to implement any new system".

At the absolute bare minimum I'd firmly believe that bullet (2) cannot be compromised on (aka the definition of stable in "stable channel"). The other bullets would be quite sad into forgo as it erodes the expected quality of the Rust compiler which is currently quite high.

jimblandy commented 6 years ago

@jcranmer wrote:

LLVM does a surprisingly bad job in practice of actually identifying which registers get clobbered, particularly in instructions not generated by LLVM. This means it's pretty much necessary for the user to manually specify which registers get clobbered.

I would think that, in practice, it would be quite difficult to infer clobber lists. Just because a machine-language fragment uses a register doesn't mean it clobbers it; perhaps it saves it and restores it. Conservative approaches could discourage the code generator from using registers that would be fine to use.

jcranmer commented 6 years ago

@alexcrichton wrote:

We have, as far as we know, no guarantee of stability from LLVM. For all we know the implementation of inline assembly in LLVM could change any day.

The LLVM docs guarantee "Newer releases can ignore features from older releases, but they cannot miscompile them." (with respect to IR compatibility). That rather constrains how much they can change inline assembly, and, as I argued above, there's not really any viable replacement at LLVM level that would radically change semantics from the current situation (unlike, say, the ongoing issues around poison and undef). Saying that its prospective instability precludes using it as a base for a Rust asm! block is therefore somewhat dishonest. Now that's not to say there are other problems with it (poor documentation, although that has improved; constraint suckiness; poor diagnostics; and bugginess in less-common scenarios are ones that come to mind).

My biggest worry in reading the thread is that we make the perfect be the enemy of the good. In particular, I worry that searching for some magic DSL intermediary is going to take a few years to try to wrangle into usable form for inline-asm as people discover that integrating asm parsers and trying to get them to work with LLVM's cause more problems in edge cases.

jimblandy commented 6 years ago

Are LLVM really guaranteeing that they'll never miscompile a feature whose behavior they've never specified? How would they even decide if a change was a miscompilation or not? I could see it for the other parts of the IR, but this seems like a lot to expect.

parched commented 6 years ago

I think it would be strange to stabilize on LLVM's format when even Clang doesn't do that.

Clang doesn't do that because it aims to be able to compile code that was written for GCC. rustc doesn't have that aim. The GCC format isn't very ergonomic so ultimately I think we don't want that, but whether that would be better to go with for now I'm unsure. There's a lot of (nightly) code out there using the current Rust format that would break if we changed to GCC style so it's probably only worth changing if we can come up with something notably better.

At least the building blocks should be well-designed - and in my opinion, "expr" : foo : bar : baz definitely isn't.

Agreed. At the very least I prefer the raw LLVM format where the constraints and clobbers are all in one list. Currently there is a redundancy having to specify "=" prefix and put it in the output list. I also think how LLVM treats it more like a function call where the outputs are the result of the expression, AFAIK the current asm! implementation is the only part of rust that has "out" parameters.

LLVM does a surprisingly bad job in practice of actually identifying which registers get clobbered

AFAIK LLVM doesn't event try to do this as the main reason for inline assembly is to include some code that LLVM doesn't understand. It only does register allocation and template substitution without looking at the actual assembly. (Obviously it parses the actually assembly at some stage to generate the machine code, but I think that happens later)

If we're willing to spawn an external assembler

I'm not sure there can ever be an alternative to using the integrated inline assembler because some how you would have to get LLVM to allocate registers for it. For global assembly though, an external assembler would be workable.

Regarding breaking changes in the LLVM inline assembler, we are in the same boat as Clang. That is, if they make some changes, we just have to work around them when they happen.

bstrie commented 6 years ago

If we stabilize something defined in terms of what clang supports, then we should call it clang_asm!. The asm! name should be reserved for something that's been designed through a full RFC process, like other major Rust features. #bikeshed

I'm all for it. +1

There's a lot of (nightly) code out there using the current Rust format that would break if we changed to GCC style so it's probably only worth changing if we can come up with something notably better.

@parched Going by @jimblandy 's suggestion quoted above, anyone using asm! will happily be able to still use it.

Today, as far as we know, there's basically no documentation for this feature. This includes LLVM internals and all.

If GCC's assembly syntax is truly not specified or documented after 30 years, then it seems safe to assume that either producing a documented assembly sublanguage is a task that is either so difficult that it is beyond Rust's ability to accomplish given our limited resources, or that people who want to use assembly simply don't care.

We have, as far as we know, no guarantee of stability from LLVM. For all we know the implementation of inline assembly in LLVM could change any day.

It seems unlikely that GCC/Clang's implementation of inline assembly will ever change, since that would break all C code written since the 90s.

Without a specification it's nigh impossible to even imagine an alternate backend for this.

At the risk of being callous, the prospect of alternative backends is moot if Rust as a language does not survive due to its embarrassing inability to drop into assembly. Nightly does not suffice, unless one wants to tacitly endorse the idea that Nightly is Rust, which does more to undermine Rust's stability guarantee than the prospect of LLVM changes.

The other bullets would be quite sad into forgo as it erodes the expected quality of the Rust compiler which is currently quite high.

I'm not lying when I say that every day I am thankful for the attitude of the Rust developers and the enormous standard of quality that they hold themselves to (in fact sometimes I wish y'all would slow down so you can maintain that quality without burning yourselves out like Brian did). However, speaking as someone who was here when luqmana added the asm! macro four years ago, and who has observed no progress since then at getting this stabilized, and who is sad that crypto in Rust is still impossible and that SIMD in Rust doesn't even have a workaround while the cross-platform interface is being slowly determined, I feel despondent. If I seem emphatic here it's because I view this issue as existential to the survival of the project. It may not be a crisis right this moment, but it will take time to stabilize anything at all, and we don't have the years that it will take to design and implement a world-class assembly dialect from scratch (proven by the fact that we have made no progress towards this in the last four years). Rust needs stable inline assembly sometime in 2018. We need prior art to pull that off. The macro_rules! situation acknowledged that sometimes worse is better. Once again, I'm begging someone to prove me wrong.

gnzlbg commented 6 years ago

FWIW and coming late to the party I like what @florob 's the cologne talk proposed. For those that haven't watched it, this is the gist of it:

// Add 5 to variable:
let mut var = 0;
unsafe {
    asm!("add $5, {}", inout(reg) var);
}

// Get L1 cache size
let ebx: i32;
let ecx: i32;
unsafe {
    asm!(r"
        mov $$4, %eax;
        xor %ecx, %ecx;
        cpuid;
        mov %ebx, {};",
        out(reg) ebx, out(ecx) ecx, clobber(eax, ebx, edx)
    );
}
println!("L1 Cache: {}", ((ebx >> 22) + 1)
    * (((ebx >> 12) & 0x3ff) + 1)
    * ((ebx & 0xfff) + 1) * (ecx + 1));

newpavlov commented 6 years ago

How about the following strategy: rename current asm to llvm_asm (plus maybe some minor changes) and state that it's behavior is implementation detail of LLVM, thus Rust stability guarantee does not fully extend to it? Problem of different backends should be more or less solved with target_feature like functionality for conditional compilation depending on the used backend. Yes, such approach will blur Rust stability a bit, but keeping assembly in limbo like this is damaging for Rust in its own way.

Florob commented 6 years ago

I've posted a pre-RFC with an alternative syntax proposal to the internals forum: https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443. Feedback welcome.

BartMassey commented 6 years ago

It looks to me like the best is definitely the enemy of the kind-of-ok here. I fully support sticking a gcc_asm! or clang_asm! or llvm_asm! macro (or any proper subset thereof) into stable with compatible syntax and semantics for now, while a better solution is worked out. I don't see supporting such a thing forever as a huge maintenance burden: the more sophisticated systems proposed above look like they'd pretty easily support just turning the old-style macros into syntactic saccharine for the new one.

I have a binary program http://git@github.com/BartMassey/popcount which requires inline assembly for the x86_64 popcntl instruction. This inline assembly is the only thing keeping this code in nightly. The code was derived from a 12-year-old C program.

Right now, my assembly is conditioned on

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]

and then gets the cpuid info to see if popcnt is present. It would be nice to have something in Rust similar to the recent Google cpu_features library https://opensource.googleblog.com/2018/02/cpu-features-library.html in Rust, but c'est la vie.

Because this is a demo program as much as anything, I'd like to keep the inline assembly. For real programs, the count_ones() intrinsic would be sufficient — except that getting it to use popcntl requires passing "-C target-cpu=native" to Cargo, probably through RUSTFLAGS (see issue #1137 and several related issues) since distributing a .cargo/config with my source doesn't seem like a great idea, which means that right now I've got a Makefile calling Cargo.

In short, it would be nice if one could use Intel's and others' fancy popcount instruction in real applications, but it seems harder than it needs to be. Intrinsics aren't entirely the answer. Current asm! is an ok answer were it available in stable. It would be great to have a better syntax and semantics for inline assembly, but I don't really need it. It would be great to be able to specify target-cpu=native directly in Cargo.toml, but it wouldn't really solve my problem.

Sorry, rambling. Just thought I'd share why I care about this.

main-- commented 6 years ago

@BartMassey I don‘t understand, why do you so desperately need to compile to popcnt? The only reason I can see is performance and IMO you should definitely just use count_ones() in that case. What you‘re looking for is not inline asm but target_feature (rust-lang/rfcs#2045) so you can tel the compiler that it‘s allowed to emit popcnt.

gnzlbg commented 6 years ago

@BartMassey you don't even need to use inline assembly for this, just use coresimd cfg_feature_enabled!("popcnt") to query whether the cpu your binary runs on supports the popcnt instruction (it will resolve this at compile-time if its possible to do so).

coresimd also provides a popcnt intrinsic that is guaranteed to use the popcnt instruction.

newpavlov commented 6 years ago

@gnzlbg

coresimd also provides a popcnt intrinsic that is guaranteed to use the popcnt instruction.

It's a bit off-topic, but this statement is not strictly true. _popcnt64 uses leading_zeros under the hood, thus if popcnt feature will not be enabled by crate user and crate author will forget to use #![cfg(target_feature = "popcnt")] this intrinsic will get compiled into ineffective assembly and there is no safeguards against it.

gnzlbg commented 6 years ago

thus if popcnt feature will not be enabled by crate user

This is incorrect since the intrinsic uses the #[target_feature(enable = "popcnt")] attribute to enable the popcnt feature for the intrinsic unconditionally, independently of what the crate user enables or disables. Also, the assert_instr(popcnt) attribute ensures that the intrinsic disassembles into popcnt on all x86 platforms that Rust supports.

If one is using Rust on an x86 platform that Rust does not currently support, then its up to whoever is porting core to ensure that these intrinsics generate popcnt on that target.

EDIT: @newpavlov

thus if popcnt feature will not be enabled by crate user and crate author will forget to use #![cfg(target_feature = "popcnt")] this intrinsic will get compiled into ineffective assembly and there is no safeguards against it.

At least in the example you mentioned in the issue, doing this is introduces undefined behavior into the program, and in this case, the compiler is allowed to do anything. Bad codegen that works is one of the multiple outcomes one can get.

BartMassey commented 6 years ago

First of all, apologies for the derailing the discussion. Just wanted to reiterate my main point, which was "I fully support sticking a gcc_asm! or clang_asm! or llvm_asm! macro (or any proper subset thereof) into stable with compatible syntax and semantics for now, while a better solution is worked out. "

The point of the inline assembly is that this is a popcount benchmark / demo. I want a true guaranteed popcntl instruction when possible both as a baseline and to illustrate how to use inline assembly. I also want to guarantee that count_ones() uses a popcount instruction when possible so that Rustc doesn't look terrible compared to GCC and Clang.

Thanks for pointing out target_feature=popcnt. I'll think about how to use it here. I think I want to bench count_ones() regardless of what CPU the user is compiling for and regardless of whether it has a popcount instruction. I just want to make sure that if the target CPU has popcount count_ones() uses it.

The stdsimd/coresimd crates looks nice, and should probably be enabled for these benchmarks. Thanks! For this app I'd prefer to use as little outside the standard language features as possible (I'm already feeling guilty about lazy_static). However, these facilities look too good to ignore, and it looks like they're well on their way to becoming "official".

nagisa commented 6 years ago

There’s an idea floated around by @nbp where there could be some implementation which goes from some representation of code to machine bytes (could be a proc-macro crate or something?) and then those bytes are included directly into the particular location in code.

Splicing arbitrary code bytes into arbitrary places within a function seems like a much easier problem to solve (although ability to specify inputs, outputs and their constraints as ell as clobbers would still be necessary).

cc @eddyb

simias commented 6 years ago

@nagisa it's a little more than just a chuck of machine code though, you also have to be careful about input, output and clobber registers. If the ASM chunk says that it wants a certain variable in %rax and that it will clobber %esi you need to make sure that the surrounding code plays nice. Also if the developer lets the compiler allocate the registers you'll probably want to optimize the allocation to avoid spilling and moving values around.

nbp commented 6 years ago

@simias , indeed you will have to specify how variables are associated to specific registers, and which registers are clobbered, but all of these is smaller than standardizing any assembly language, or any LLVM assembly language.

Standardizing on byte sequences, is probably the easiest way forward by moving the assembly flavor to a driver / proc-macro.

One issue of having verbatim bytes instead of proper inline assembly, is that the compiler would have no option for doing register alpha-renaming, which I do not expect people writing inline assembly are expecting either.

simias commented 6 years ago

But how would that work with register allocation if I want to let the compiler handle that? For instance, using GCC's (atrocious) syntax:

asm ("leal (%1, %1, 4), %0"
     : "=r" (five_times_x)
     : "r" (x));

In something like this I let the compiler allocate the registers, expecting that it will give me whatever is the most convenient and efficient. For instance on x86 64 if five_time_x is the return value then the compiler might allocate eax and if x is a function parameter it might already be available in some register. Of course the compiler only knows exactly how it will allocate registers pretty late in the compilation sequence (especially if it's not as trivial as simply function params and return values).

Would your proposed solution work with something like that?

Florob commented 6 years ago

@nbp I have to say I'm a bit confused by this proposal. First of all, standardizing assembly language was never something we wanted to achieve with inline assembly. At least to me, the premise was always that the assembly language used by the system assembler would be accepted. The problem is not getting the assembly parsed/assembled, we can pass that off to LLVM easily. The problem is with filling templated assembly (or giving LLVM the required information to do so), and specifying inputs, outputs and clobbers. The later problem is not actually solved by your proposal. It is however alleviated, because you wouldn't/couldn't support classes of registers (which @simias asks about), but just concrete registers. At the point where constraints are simplified to that extend, it is actually just as easy to support "real" inline assembly. The first argument is an string containing (non-templated) assembly, the other arguments are the constraints. This is somewhat easily mapped to LLVM's inline assembler expressions. Inserting raw bytes on the other hand is not as far as I know (or can tell from the LLVM IR Reference Manual) supported by LLVM. So we'd basically be extending LLVM IR, and reimplementing a feature (assembling system assembly) that is already present in LLVM using separate crates.

gnzlbg commented 6 years ago

@nbp

indeed you will have to specify how variables are associated to specific registers, and which registers are clobbered, but all of these is smaller than standardizing any assembly language, or any LLVM assembly language.

So how would that be done? I have a sequence of bytes with hardcoded registers with basically means that the input/out registers, clobbers, etc. are all hardcoded inside this sequence of bytes.

Now I inject this bytes somewhere in my rust binary. How do I tell rustc which registers are input/output, which registers got clobbered, etc.? How is this a smaller problem to solve than stabilizing inline assembly ? It looks to me that this is exactly what inline assembly does, just maybe a little harder because now one needs to specify input/outputs clobbers twice, in the assembly written, and in whatever way we pass this information to rustc. Also, rustc would not have an easy time validating this, because for that it would need to be able to parse the sequence of bytes into assembly, and then inspecting that. What am I missing?

nbp commented 6 years ago

@simias

asm ("leal (%1, %1, 4), %0"
     : "=r" (five_times_x)
     : "r" (x));

This would not be possible, as the raw of bytes does not allow alpha renaming of registers, and the registers would have to be enforced by the code sequence ahead.

@Florob

At least to me, the premise was always that the assembly language used by the system assembler would be accepted.

My understanding, is that relying on the system assembler is not something we want to rely on, but more an accepted flaw as part of the asm! macro. Also relying on asm! being the LLVM syntax would be painful for the development of additional backend.

@gnzlbg

So how would that be done? I have a sequence of bytes with hardcoded registers with basically means that the input/out registers, clobbers, etc. are all hardcoded inside this sequence of bytes.

The idea would be to have a list of inputs, outputs, and clobbered registers, where the inputs would be a tuple of the register name associated with a (mutable) reference or copy, the clobbered register would be a list of register names, and the output would be a list of output register would form a tuple of named register to which are associated types.

fn swap(a: u32, b: u32) -> (u32, u32) {
  unsafe{
    asm_raw!{
       bytes: [0x91],
       inputs: [(std::asm::eax, a), (std::asm::ecx, b)],
       clobbered: [],
       outputs: (std::asm::eax, std::asm::ecx),
    }
  }
}

This code sequence might be the output of some compiler procedural macro, which might look like:

fn swap(a: u32, b: u32) -> (u32, u32) {
  unsafe{
    asm_x64!{
       ; <-- (eax, a), (ecx, b)
       xchg eax, ecx
       ; --> (eax, ecx)
    }
  }
}

These sequences, will not be able to directly embedde any symbol or addresses and they would have to be computed and given as registers. I am sure we can figure out how do add the ability to insert some symbol addresses within the byte sequence later on.

The advantage of this approach is that only the list of registers and constraints have to be standardized, and this is something that would easily be supported by any future backend.

jcranmer commented 6 years ago

@nbp

My understanding, is that relying on the system assembler is not something we want to rely on, but more an accepted flaw as part of the asm! macro. Also relying on asm! being the LLVM syntax would be painful for the development of additional backend.

I don't think that's an accurate assessment? With the minor exception of the two different syntaxes for x86 assembly, assembly syntax is largely standard and portable. The only issue with the system assembler might be that it lacks newer instructions, but that's a niche situation not worth optimizing for.

The actual problem is the glue into register allocation. But, as far as the actual assembly string itself is concerned, this merely means someone has to do some string substitution stuff and maybe some parsing--and this kind of substitution should be trivially available for any putative backend.

I agree that LLVM's (or gcc's) syntax for this stuff is crap, but moving to precompiled bytes means that any asm crate now needs to install a full assembler and possibly a full register allocator (or make programmers hand-allocate registers), or attempt to use the system assembler. At that point, it doesn't seem like it's actually really adding much value.

nbp commented 6 years ago

@jcranmer

... but moving to precompiled bytes means that any asm crate now needs to install a full assembler and possibly a full register allocator (or make programmers hand-allocate registers), or attempt to use the system assembler

https://github.com/CensoredUsername/dynasm-rs

This crate uses macro handled by a plugin to assemble the assembly code and generate vectors of raw assembly code to be concatenated at runtime.

simias commented 6 years ago

@nbp maybe my uses cases are peculiar but lack of register renaming and letting the compiler allocate registers for me would be a bit of a deal-breaker because it either means that I need to be very lucky with my choice of registers and happen to "hit right" or the compiler will have to emit non-optimal code to shuffle registers around to match my arbitrary conventions.

If the assembly blob doesn't integrate nicely with the surrounding compiler-emitted assembly I might as well just factor the ASM stub in an external C-style method in a stand-alone .s assembly file since function calls have the same type of register-allocation constraints. This already works today, although I suppose having it built into rustc might simplify the build system compared to having a standalone assembly file. I guess what I'm saying is that IMO your proposal doesn't get us very far compared to the current situation.

And what if the ASM code calls external symbols that would be resolved by the linker? You need to pass that info around since you can't possibly resolve those until late in the compilation process. You'd have to pass there reference alongside your byte array and let the linker resolve them much later.

@jcranmer

With the minor exception of the two different syntaxes for x86 assembly, assembly syntax is largely standard and portable.

I'm not sure I understand what you mean by that, obviously ASM syntax is not portable across architectures. And even within the same architecture there are often variations and options that change the way the language is assembled.

I can give MIPS as an example, there are two important configuration flags that tweak the assembler behaviour: at and reorder. at says whether the assembler is allowed to implicitly use the AT (assembler temporary) register when assembling certain pseudo-instructions. Code that explicitly uses AT to store data must be assembled with at or it'll break.

reorder defines if the coder manually handles branch delay slots or if they trust the assembler to deal with them. Assembling code with the wrong reorder setting will almost certainly generate bogus machine code. When you write MIPS assembly you must be aware of the current mode at all times if it contains any branching instruction. For instance it's impossible to know the meaning of this MIPS listing if you don't know if reorder is enabled:

    addui   $a0, 4
    jal     some_func
    addui   $a1, $s0, 3

32bit ARM assembly has the Thumb/ARM variations, it's important to know which instruction set you're targeting (and you can change on the fly across function calls). Mixing both sets needs to be done very carefully. ARM code also typically loads large immediate values using an implicit PC-relative load, if you pre-assemble your code you'll have to be careful about how you pass these values around since they have to remain close by but are not actual instructions with a well-defined location. I'm talking about pseudo-instructions like:

   ldr   r2, =0x89abcdef

MIPS on the other hand tends to split the immediate value in two 16bit values and use a lui/ori or lui/andi combo. It's usually hidden behind the li/la pseudo-instructions but if you're writing code with noreorder and don't want to waste the delay slot sometimes you have to handle it by hand which results in funny looking code:

.set noreorder

   /* Display a message using printf */
   lui $a0, %hi(hello)
   jal printf
   ori $a0, %lo(hello)

.data

hello:
.string "Hello, world!\n"

The %hi and %lo constructs are a way to tell the assembly to generate a reference to the high and low 16bits of the hello symbol respectively.

Some code needs very peculiar alignment constraints (common when you're dealing with cache invalidation code for instance, you need to be sure that you don't take a saw to the branch you're sitting on). And there's the problem of handling external symbols that can't be resolved at this point in the compilation process as I mentioned earlier.

I'm sure I could come up with peculiarities for a bunch of other architectures I'm less familiar with. For these reasons I'm not sure I'm very optimistic for the macro/DSL approach. I understand that having a random opaque string literal in the middle of the code isn't super elegant but I don't really see what integrating the full ASM syntax into rust one way or an other would give us except additional headaches when adding support for a new architecture.

Writing an assembler is something that may seem trivial at a glance but could turn out to be very tricky if you want to support all the bells, whistles and quirks of all the architectures out there.

On the other hand having a good way to specify bindings and clobbers would be extremely valuable (compared to gcc's... perfectible syntax).

joseincandenza commented 6 years ago

Hi guys,

Sorry for bothering you, I only wanted to drop my two cents, because I'm just an user, and a very shy/quiet one indeed, oh, and a newcomer, I have just recently landed in Rust, but I'm already in love with it.

But this assembly thing is just crazy, I mean, it's a three years span conversation, with a bunch of ideas and complains, but nothing that seems like a minimum consensus. Three years and not a RFC, it seems a little like a death end. I'm developing a humble math library (that hopefully will materialize in two or three crates), and for me (and I suspect that for any other fellow interested in write assembly in rust), the most important thing is to actually being able to do it! with a minimum guarantee that everything is not going to change the next day (that's what the unstable channel, and specially this conversation, makes me feel).

I understand that everyone here wants the best solution, and maybe one day someone comes out with that one, but as for today I believe that the current macro is just fine (well, maybe a little restricting in some ways, but hopefully nothing that cannot be addressed in an incremental way). To write assembly is like the most important thing in a systems language, a very very necessary feature, and although I'm ok relying on cpp_build until this is fixed, I'm very afraid that if it takes a lot more time it will become a forever dependency. I don't know why, call it an irrational idea, but I find that having to call cpp to call assembly is a little sad, I want a pure rust solution.

main-- commented 6 years ago

FWIW Rust is not that special here, MSVC doesn‘t have inline asm for x86_64 either. They do have that really weird implementation where you can use variables as operands but that works for x86 only.

eddyb commented 6 years ago

@josevalaad Could you talk more about what you're using inline assembly for?

We typically only see it used in OS-like situations, which are typically stuck on nightly for other reasons as well, and even then they barely use asm!, so stabilizing asm! hasn't been a high-enough priority to design & develop something that can properly survive outside LLVM and please everyone.

shepmaster commented 6 years ago

Additionally, most things can be done using the exposed platform intrinsics. x86 and x86_64 have been stabilized and other platforms are in progress. It's most people's expectation that these are going to accomplish 95-99% of the goals. You can see my own crate jetscii as an example of using some of the intrinsics.

gnzlbg commented 6 years ago

We just merged a jemalloc PR that uses inline assembly to work around code generation bugs in LLVM - https://github.com/jemalloc/jemalloc/pull/1303 . Somebody used inline assembly in this issue (https://github.com/rust-lang/rust/issues/53232#issue-349262078) to work around a code generation bug in Rust (LLVM) that happened in the jetscii crate. Both happened in the last two weeks, and in both cases the users tried with intrinsics but the compiler failed them.

When code generation for a C compiler happens to be unacceptable, worst case the user can use inline assembly and continue working in C.

When this happens in stable Rust, right now we have to tell people to use a different programming language or wait an indeterminate amount of time (often in the order of years). That's not nice.

rust-lang / rust

Tracking issue for `asm` (inline assembly) #29722