mmcloughlin / avo

Generate x86 Assembly with Go
BSD 3-Clause "New" or "Revised" License
2.73k stars 89 forks source link

Emitting ifdefs and friends #235

Open klauspost opened 2 years ago

klauspost commented 2 years ago

Hi!

Thanks for the great work. To support GOAMD64 I would like to emit:

#ifdef GOAMD64_v3
    TZCNTQ R11, R11
#else
    BSFQ R11, R11
#endif

There are no build tags for GOAMD64, and in this case it would duplicate 15K lines of assembly, if I was to have runtime detection, so an ifdef would be fine in this case.

Unless I am missing something that is currently not possible.

klauspost commented 2 years ago

As a temp workaround I made a temp script that uses comments to contain the ifdefs.

mmcloughlin commented 10 months ago

Thanks for posting this issue, and thank you @pelletier for the PR #406. We should find a way to support GOAMD64 and perhaps preprocessor macros more generally.

I am uneasy about the approach taken in #406 since it allows emitting preprocessor text into an avo program without accounting for the semantics. Specifically, an assembly program with preprocessor conditionals actually represents multiple possible assembly programs, whereas avo would treat it as one.

The following pathological avo program is written using the proposed implementation in #406. The point is to generate n live registers in each of the possible generated code blocks, dependent on the CONDITION macro. If we were to properly account for the semantics of the ifdef preprocessor macros, it would be possible to compile this for n up to 15. But as written avo needs to allocate registers for both code blocks, and register allocation fails when n is increased to 8.

//go:build ignore

package main

import (
    . "github.com/mmcloughlin/avo/build"
    . "github.com/mmcloughlin/avo/operand"
    . "github.com/mmcloughlin/avo/reg"
)

func main() {
    const n = 7

    TEXT("Preprocessor", NOSPLIT, "func() uint64")

    // Allocate n registers with values 1, 2, ..., n.
    Preprocessor("ifdef CONDITION")
    x := make([]GPVirtual, n)
    for i := 0; i < n; i++ {
        x[i] = GP64()
        MOVQ(U32(1+i), x[i])
    }
    Preprocessor("else")
    y := make([]GPVirtual, n)
    for i := 0; i < n; i++ {
        y[i] = GP64()
        MOVQ(U32(1+i), y[i])
    }
    Preprocessor("endif")

    // Sum them up.
    Preprocessor("ifdef CONDITION")
    for i := 1; i < n; i++ {
        ADDQ(x[i], x[0])
    }
    Store(x[0], ReturnIndex(0))
    Preprocessor("else")
    for i := 1; i < n; i++ {
        ADDQ(y[i], y[0])
    }
    Store(y[0], ReturnIndex(0))
    Preprocessor("endif")

    RET()

    Generate()
}

The generated assembly is as follows. Note how avo has allocated distinct sets of registers in the two branches of the #ifdef when in fact there's no conflict.

// Code generated by command: go run asm.go -out preprocessor.s -stubs stub.go. DO NOT EDIT.

#include "textflag.h"

// func Preprocessor() uint64
TEXT ·Preprocessor(SB), NOSPLIT, $0-8
#ifdef CONDITION
    MOVQ $0x00000001, AX
    MOVQ $0x00000002, CX
    MOVQ $0x00000003, DX
    MOVQ $0x00000004, BX
    MOVQ $0x00000005, SI
    MOVQ $0x00000006, DI
    MOVQ $0x00000007, R8
#else
    MOVQ $0x00000001, R9
    MOVQ $0x00000002, R10
    MOVQ $0x00000003, R11
    MOVQ $0x00000004, R12
    MOVQ $0x00000005, R13
    MOVQ $0x00000006, R14
    MOVQ $0x00000007, R15
#endif
#ifdef CONDITION
    ADDQ CX, AX
    ADDQ DX, AX
    ADDQ BX, AX
    ADDQ SI, AX
    ADDQ DI, AX
    ADDQ R8, AX
    MOVQ AX, ret+0(FP)
#else
    ADDQ R10, R9
    ADDQ R11, R9
    ADDQ R12, R9
    ADDQ R13, R9
    ADDQ R14, R9
    ADDQ R15, R9
    MOVQ R9, ret+0(FP)
#endif
    RET

I'm not sure how frequent this kind of problem would be in reality, but I just wanted to demonstrate why I felt uncomfortable about an approach that essentially just emits text into the program. I'm also not quite sure yet what the right approach to this would be.

klauspost commented 10 months ago

@mmcloughlin Thanks for taking a look! For context I mainly use it for GOAMD64_v shortcuts. Two examples:

#ifdef GOAMD64_v3
    TZCNTQ R11, R11
#else
    BSFQ R11, R11
#endif

This is fairly straightforward and I "emit" it like this:

    Comment("#ifdef GOAMD64_v3")
    TZCNTQ(tmp, tmp)
    Comment("#else")
    BSFQ(tmp, tmp)
    Comment("#endif")

... and the script removes // from comments that start with #

My second use case is a bit more complicated... And a lot more hackish...

#ifdef GOAMD64_v4
#define XOR3WAY(ignore, a, b, dst) \
    VPTERNLOGD $0x96, a, b, dst
#else
#define XOR3WAY(ignore, a, b, dst) \
    VPXOR a, dst, dst \
    VPXOR b, dst, dst
#endif

...
    XOR3WAY( $0x00, Y5, Y6, Y2)

So I emit a dummy function with the ifdefs as comments and overload and existing operation with a similar signature...

        data = bytes.ReplaceAll(data, []byte("\t// #"), []byte("#"))
        data = bytes.ReplaceAll(data, []byte("\t// @"), []byte(""))
        data = bytes.ReplaceAll(data, []byte("VPTERNLOGQ"), []byte("XOR3WAY("))

        /// Patch up missing `)`

I could just emit 2 versions of the code with build tags, but I already have ~5MB of assembly output - so I am not really looking to make it any more if it can be avoided. So not having these hacks would be great.

pelletier commented 10 months ago

Thank you for the feedback!

I wonder if we could use Go functions to represent the multiple versions of the program. For example:

func main() {
    const n = 7

    TEXT("Preprocessor", NOSPLIT, "func() uint64")

    Preprocessor("ifdef CONDITION")
    Guard("CONDITION", func(c *avo.Context) { // if
        x := make([]GPVirtual, n)
        for i := 0; i < n; i++ {
            x[i] = c.GP64()
            MOVQ(U32(1+i), x[i])
        }

        for i := 1; i < n; i++ {
            ADDQ(x[i], x[0])
        }
        c.Store(x[0], c.ReturnIndex(0))
    }, func(c *avo.Context) { // else
        y := make([]GPVirtual, n)
        for i := 0; i < n; i++ {
            y[i] = c.GP64()
            MOVQ(U32(1+i), y[i])
        }

        for i := 1; i < n; i++ {
            ADDQ(y[i], y[0])
        }
        c.Store(y[0], c.ReturnIndex(0))
    })

    // global context is left untouched.

    RET()

    Generate()
}

But that feels error-prone to mistakenly use the global context instead of the local context in those functions.

Or maybe the solution needs to be more targeted if only register allocations is the issue? something like:

func main() {
    const n = 7

    TEXT("Preprocessor", NOSPLIT, "func() uint64")

    // Allocate n registers with values 1, 2, ..., n.
    Preprocessor("ifdef CONDITION")

    // BranchAllocator would be a new function that creates a new register
    // allocator that starts with the same state as the global allocator,
    // but effectively "branches out".

    x := make([]GPVirtual, n)
    allocatorX := BranchAllocator()
    for i := 0; i < n; i++ {
        x[i] = allocatorX.GP64()
        MOVQ(U32(1+i), x[i])
    }
    Preprocessor("else")
    y := make([]GPVirtual, n)
    allocatorY := BranchAllocator()
    for i := 0; i < n; i++ {
        y[i] = allocatorY.GP64()
        MOVQ(U32(1+i), y[i])
    }
    Preprocessor("endif")

    // Sum them up.
    Preprocessor("ifdef CONDITION")
    for i := 1; i < n; i++ {
        ADDQ(x[i], x[0])
    }
    Store(x[0], ReturnIndex(0))
    Preprocessor("else")
    for i := 1; i < n; i++ {
        ADDQ(y[i], y[0])
    }
    Store(y[0], ReturnIndex(0))
    Preprocessor("endif")

    // At any point later, if you need to allocate you have two options:

    // Option 1: pick the allocator that represents the version of the
    // program you care about:
    reg := allocatorX.GP64()

    // Option 2: forget about those allocations and use the global
    // allocator.
    reg := GP64()

    RET()

    Generate()
}

In any case, it will probably take quite a bit to get to a satisfying solution to this problem. In the meantime, how would you feel about adding an escape-hatch function that allows emitting arbitrary text? Something like RawText("#ifndef hasAVX2"). It would come with no gloves and maybe would need a scarier name, or put in a different package like avo/danger so that people know it should be used with care. It would unlock both @klauspost and I, while we figure out a higher-level construct for pre-processor functions?

mmcloughlin commented 10 months ago

Thanks both!

@pelletier I have not fully thought through the implications of having an avo program represent multiple actual assembly programs, but right now I think register allocation will be the main issue. I think I might be content to defer fixing those issues right now since I expect it would almost never come up in practice. But in order to make it possible to handle register allocation correctly in future, we need a solution where avo can recover the semantics of the preprocessor statements. For that reason a text-based Preprocessor function is a bit ugly, since avo would have to parse the text. I agree that your Guard proposal is error-prone.

At this point I'm leaning towards just adding functions Ifdef, Ifndef, Else and Endif. I think this has the benefit of at least somewhat looking like Go assembly (unless we can think of some nice characters to mimic the #).

Ifdef("CONDITION")
// ...
Else()
// ...
Endif()

Some alternatives are things like hash.Ifdef, pound.Ifdef, preproc.Ifdef, or even following the Go assembler and making use of obscure unicode #Ifdef.

With a design like this it will also be fairly easy for avo to determine the multiple possible output programs based on the CONDITION macro. So if I'm motivated enough I can account for it in the register allocator.

@klauspost Your first example seems like the simple case, but I'm not sure about the second. I'm hesitant to add support for #define in avo though you could imagine it as a method of defining "custom instructions" in an avo program. In this case the simplest thing to do I think would be a XOR3WAY function in your avo program that emits the Ifdef("GOAMD64_v4"). That would have the downside that your output assembly would contain the fully expanded 6 line #ifdef ... #else ... #endif block instead of the concise XOR3WAY one line instruction. Would that be acceptable?

pelletier commented 10 months ago

Sounds good to me. I'll submit a new PR that adds Ifdef and friends. I'm in favor of a package named pp, preproc, or preprocessor to avoid typing non-a-zA-Z0-9 package or function names.

mmcloughlin commented 10 months ago

My only other question about #406 is whether we should accept Include anywhere?

I'm wondering because #include in the middle of Go assembly is perfectly valid. Now in an avo program actually doing that would put you in an "all bets are off" situation, so would almost always be unadvisable. Since we strive to have avo code look somewhat like assembly, I wonder about allowing Include anywhere but with the behavior of "include in header" rather than "include here".

Possible alternatives:

Thoughts?

pelletier commented 10 months ago

I think IncludeHeader makes the most sense to me. It does what it says in the name, as opposed to get an error at first run. If people end up needing #include outside of header, this could become a candidate for preproc.Include for example.