mmcloughlin / avo

Generate x86 Assembly with Go
BSD 3-Clause "New" or "Revised" License

proposal: generate cpu feature-checks and runtime dispatch helpers #168

Open mmcloughlin opened 3 years ago

mmcloughlin commented 3 years ago

@vsivsi suggested that avo could generate helpers for selecting function implementations based on runtime CPU feature checks (see https://github.com/mmcloughlin/avo/issues/20#issuecomment-767259226).

This seems like a great idea, but I think there are some open questions about the details.

At a minimum, avo could generate a boolean variable for each function indicating whether it is supported on the current CPU. This would be fairly easy: avo already generates a comment for each function listing the ISAs it needs, and that is enough to generate a boolean from the feature flags exported by x/sys/cpu (such as cpu.X86.HasAVX2).
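
As a rough illustration of the boolean-variable idea, the sketch below turns a function's ISA list (as it appears in avo's generated comments) into a Go declaration that reads the matching `cpu.X86.Has*` fields from golang.org/x/sys/cpu. The `supports<Name>` naming scheme, the `isaField` table, and the `featureVar` and `exportName` helpers are all hypothetical. Only the ISAs that appear in this thread are mapped; any other name is reported as an error rather than guessed.

```go
package main

import (
	"fmt"
	"strings"
)

// isaField maps avo ISA names to fields of cpu.X86 in golang.org/x/sys/cpu.
// Hypothetical, partial table covering only the ISAs seen in this thread.
var isaField = map[string]string{
	"SSE2":     "HasSSE2",
	"AVX":      "HasAVX",
	"AVX2":     "HasAVX2",
	"AVX512F":  "HasAVX512F",
	"AVX512DQ": "HasAVX512DQ",
	"AVX512VL": "HasAVX512VL",
}

// exportName upper-cases the first letter so the name can follow "supports".
// Names are assumed to start with an ASCII letter, as avo function names do here.
func exportName(name string) string {
	if name == "" {
		return name
	}
	return strings.ToUpper(name[:1]) + name[1:]
}

// featureVar returns Go source declaring a boolean that is true when every
// ISA required by the named function is available at runtime.
func featureVar(name string, isa []string) (string, error) {
	if len(isa) == 0 {
		return fmt.Sprintf("var supports%s = true\n", exportName(name)), nil
	}
	terms := make([]string, 0, len(isa))
	for _, x := range isa {
		field, ok := isaField[x]
		if !ok {
			return "", fmt.Errorf("no x/sys/cpu field known for ISA %q", x)
		}
		terms = append(terms, "cpu.X86."+field)
	}
	return fmt.Sprintf("var supports%s = %s\n", exportName(name), strings.Join(terms, " && ")), nil
}

func main() {
	src, err := featureVar("varLenWriteAVX2_4", []string{"AVX", "AVX2", "SSE2"})
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
	// var supportsVarLenWriteAVX2_4 = cpu.X86.HasAVX && cpu.X86.HasAVX2 && cpu.X86.HasSSE2
}
```

The generated file would then need to import golang.org/x/sys/cpu itself; the generator above only emits text and has no dependency on it.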

Generating runtime dispatch or function selection code might take a bit more thought, but also sounds doable.
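
One possible shape for generated dispatch code, sketched under assumptions: each implementation is listed with the ISAs it needs, most specialized first, and the first candidate whose requirements are all met is chosen. A plain `map[string]bool` stands in for x/sys/cpu so the sketch runs anywhere; `candidate`, `selectImpl`, and the `sum*` functions are hypothetical placeholders, with ordinary Go standing in for the assembly bodies.

```go
package main

import "fmt"

// candidate is one implementation of a function plus the ISAs it requires.
type candidate struct {
	name     string
	requires []string
	impl     func([]uint64) uint64
}

// selectImpl returns the first candidate whose requirements are all present
// in have. Candidates must be ordered from most to least preferred, and the
// last one should have no requirements so that something is always chosen.
func selectImpl(have map[string]bool, cands []candidate) candidate {
	for _, c := range cands {
		ok := true
		for _, r := range c.requires {
			if !have[r] {
				ok = false
				break
			}
		}
		if ok {
			return c
		}
	}
	panic("no implementation supported on this CPU")
}

// Placeholder bodies; in real use these would be avo-generated assembly stubs.
func sumGeneric(xs []uint64) uint64 {
	var s uint64
	for _, x := range xs {
		s += x
	}
	return s
}

func sumAVX2(xs []uint64) uint64   { return sumGeneric(xs) }
func sumAVX512(xs []uint64) uint64 { return sumGeneric(xs) }

var sumCandidates = []candidate{
	{"sumAVX512", []string{"AVX512F", "AVX512DQ"}, sumAVX512},
	{"sumAVX2", []string{"AVX", "AVX2"}, sumAVX2},
	{"sumGeneric", nil, sumGeneric},
}

func main() {
	// Pretend the CPU reports AVX and AVX2 but no AVX-512.
	have := map[string]bool{"AVX": true, "AVX2": true}
	c := selectImpl(have, sumCandidates)
	fmt.Println(c.name, c.impl([]uint64{1, 2, 3})) // sumAVX2 6
}
```

In generated code, `have` would presumably be filled from the cpu.X86 fields and the chosen `impl` stored in a package-level function variable during `init`, so the feature check runs once rather than on every call.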

Creating this issue for further discussion.

mmcloughlin commented 3 years ago

#171 added some manual CPU feature checks to the examples. This is a reminder to replace that code if we implement an auto-generated solution for feature checks.

mmcloughlin commented 3 years ago

ICC/Clang support this:

https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming-guide/processor-targeting.html
https://clang.llvm.org/docs/AttributeReference.html#cpu-dispatch
https://reviews.llvm.org/D47474

vsivsi commented 3 years ago

As a simpler half-measure, which also has the advantage of being able to drive test file generation, etc., I wonder about just adding a new exported function to build/global.go like:

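// irFile holds the IR from the most recent Generate() run; how it gets set
// is the open question below.
var irFile *ir.File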

type FuncInfo struct {
    Name      string
    Signature string
    ISA       []string
}

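// GeneratedFuncInfo returns the name, signature and required ISAs of each
// generated function, or nil if nothing has been generated yet.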
func GeneratedFuncInfo() (retVal []FuncInfo) {
    if irFile == nil {
        return
    }
    for _, f := range irFile.Functions() {
        retVal = append(retVal, FuncInfo{f.Name, f.Signature.String(), f.ISA})
    }
    return
}

Questions:

What is the preferred way to set irFile?

  1. call ctx.Result() directly within GeneratedFuncInfo() and hope for the best.
  2. set it globally in Main() when it is called by Generate()
  3. modify Main() to return it to Generate() and store it there.
  4. define a new generator function GenerateWithFuncInfo() that does everything itself.

Option 1 could return incomplete or inconsistent results, depending on when GeneratedFuncInfo is called relative to Generate(). Option 2 adds a global side effect shared across files. Option 3 requires a change to the exported API of Main. Option 4 will unavoidably lead to some code redundancy, although Generate() could call GenerateWithFuncInfo() and simply discard the return value.
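
The redundancy in option 4 can be kept to a thin wrapper. A sketch, with hypothetical `irFunc`/`irFile` types standing in for avo's ir package and a hard-coded `build` function in place of the real generation step, might look like this:

```go
package main

import "fmt"

// FuncInfo mirrors the struct proposed above.
type FuncInfo struct {
	Name      string
	Signature string
	ISA       []string
}

// irFunc and irFile are hypothetical stand-ins for avo's ir types.
type irFunc struct {
	Name      string
	Signature string
	ISA       []string
}

type irFile struct{ funcs []irFunc }

// build is a placeholder for the real generation step (parsing flags,
// building the IR and running the printers).
func build() *irFile {
	return &irFile{funcs: []irFunc{
		{"addAVX2", "(x []uint64) uint64", []string{"AVX", "AVX2"}},
	}}
}

// GenerateWithFuncInfo runs generation and reports what was generated.
func GenerateWithFuncInfo() []FuncInfo {
	f := build()
	infos := make([]FuncInfo, 0, len(f.funcs))
	for _, fn := range f.funcs {
		infos = append(infos, FuncInfo{fn.Name, fn.Signature, fn.ISA})
	}
	return infos
}

// Generate keeps its existing signature and simply discards the result,
// so there is one code path rather than two copies.
func Generate() { _ = GenerateWithFuncInfo() }

func main() {
	for _, fi := range GenerateWithFuncInfo() {
		fmt.Printf("%s %s %v\n", fi.Name, fi.Signature, fi.ISA)
	}
}
```

Unlike option 2, the result is returned rather than stored in a package-level variable, so callers cannot observe it before generation has finished.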

Thoughts?

vsivsi commented 3 years ago

I quickly prototyped this using approach 2 above, and it works exactly as I need. For example:

    Generate()

    for _, f := range GeneratedFuncInfo() {
        fmt.Printf("Func: %s \t Sig: %s \t Reqs: %v\n", f.Name, f.Signature, f.ISA)
    }

Prints:

Func: varLenWriteAVX512_4        Sig: (in [][4]uint64, out *[4][]uint64, thresh uint64) byte     Reqs: [AVX2 AVX512DQ AVX512F AVX512VL]
Func: varLenWriteAVX2_4          Sig: (in [][4]uint64, out *[4][]uint64, thresh uint64) byte     Reqs: [AVX AVX2 SSE2]
Func: varLenWriteAVX512_8        Sig: (in [][8]uint64, out *[8][]uint64, thresh uint64) byte     Reqs: [AVX512DQ AVX512F]
Func: varLenWriteAVX2_8          Sig: (in [][8]uint64, out *[8][]uint64, thresh uint64) byte     Reqs: [AVX AVX2 SSE2]
Func: varLenWriteAVX512_16       Sig: (in [][16]uint64, out *[16][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
Func: varLenWriteAVX2_16         Sig: (in [][16]uint64, out *[16][]uint64, thresh uint64) byte   Reqs: [AVX AVX2 SSE2]
Func: varLenWriteAVX512_24       Sig: (in [][24]uint64, out *[24][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
Func: varLenWriteAVX512_32       Sig: (in [][32]uint64, out *[32][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
Func: varLenWriteAVX512_48       Sig: (in [][48]uint64, out *[48][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
Func: varLenWriteAVX512_64       Sig: (in [][64]uint64, out *[64][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
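
Since the same list could drive test generation, here is a hedged sketch of emitting a Go test file with one test per function that skips when the CPU lacks the required ISAs. It assumes feature booleans named `supports<Name>` exist in the target package (a hypothetical convention), and `genTests` is an illustrative helper whose test bodies are left as placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

type FuncInfo struct {
	Name      string
	Signature string
	ISA       []string
}

// genTests returns the source of a _test.go file for the given functions.
// Function names are assumed to be non-empty and start with an ASCII letter.
func genTests(pkg string, funcs []FuncInfo) string {
	var b strings.Builder
	fmt.Fprintf(&b, "package %s\n\nimport \"testing\"\n", pkg)
	for _, f := range funcs {
		exported := strings.ToUpper(f.Name[:1]) + f.Name[1:]
		fmt.Fprintf(&b, "\nfunc Test%s(t *testing.T) {\n", exported)
		// Skip rather than fail on machines without the needed ISAs.
		fmt.Fprintf(&b, "\tif !supports%s {\n", exported)
		fmt.Fprintf(&b, "\t\tt.Skip(%q)\n", "requires "+strings.Join(f.ISA, ", "))
		fmt.Fprintf(&b, "\t}\n\t// TODO: call %s and check its result.\n}\n", f.Name)
	}
	return b.String()
}

func main() {
	fmt.Print(genTests("prototype", []FuncInfo{
		{"varLenWriteAVX2_4", "(in [][4]uint64, out *[4][]uint64, thresh uint64) byte", []string{"AVX", "AVX2", "SSE2"}},
	}))
}
```
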
vsivsi commented 3 years ago

BTW, a much simpler way to enable all of this would be to just export the global context variable and let users go to town, buyer beware style. I suppose that's possible today by eschewing everything in build/global.go and setting everything up manually, but that's a pretty big inconvenience just to get at some basic info about the generated functions.

vsivsi commented 3 years ago

After sleeping on this, I've spent a bit of time this morning trying out another approach to this issue.

What I'm really trying to do here is gain hooks into (currently) internal Avo state that is needed to generate certain kinds of support code (runtime codepath selection, test coverage, automated benchmarking, etc.). My recent comments above articulate two possible approaches:

  1. Throw open the Avo state by exporting the global context. Pro: the ultimate in flexibility. Con: risky and hard to support.
  2. Enumerate the useful bits of internal state for this task and add API support for exporting just that. Pro: simple and safe. Con: not very flexible or extensible.

A third approach, which I've just prototyped, is to add an API call that registers new file "printers" through the existing internal hooks.

In global.go:

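// AddPrinter registers a custom printer along with a command-line flag that
// selects its output file; dflt is the output used when the flag is absent.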
func AddPrinter(flag, desc string, pB printer.Builder, dflt io.WriteCloser) {
    pV := newPrinterValue(pB, dflt)
    flagSet.Var(pV, flag, desc)
    flags.printers = append(flags.printers, pV)
}
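
AddPrinter relies on the standard flag.Value mechanism: each printer flag is a value type that records its output destination when the flags are parsed. The sketch below shows that mechanism in isolation with the standard library's flag package. `outputFlag` is a hypothetical stand-in for avo's internal printer value, not its actual implementation, and it only records the path instead of opening a file.

```go
package main

import (
	"flag"
	"fmt"
)

// outputFlag is a hypothetical flag.Value that remembers where a printer's
// output should go. An empty path means the printer is disabled.
type outputFlag struct {
	path string
}

// String reports the current value; it must tolerate a nil receiver.
func (o *outputFlag) String() string {
	if o == nil {
		return ""
	}
	return o.path
}

// Set is called by the flag package when the flag appears on the command line.
func (o *outputFlag) Set(s string) error {
	o.path = s
	return nil
}

func main() {
	fs := flag.NewFlagSet("generate", flag.ContinueOnError)
	myfile := &outputFlag{}
	fs.Var(myfile, "myfile", "produce file enumerating generated functions in comments")
	if err := fs.Parse([]string{"-myfile", "woot.go"}); err != nil {
		panic(err)
	}
	fmt.Println(myfile.path) // woot.go
}
```
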

Then in my code I can write, e.g.

type myGenerator struct {
    cfg printer.Config
    printer.Generator
}

// NewMyGenerator constructs a printer for writing a function comments file.
func NewMyGenerator(cfg printer.Config) printer.Printer {
    return &myGenerator{cfg: cfg}
}

func (gen *myGenerator) Print(f *ir.File) ([]byte, error) {
    gen.Comment(gen.cfg.GeneratedWarning())

    gen.NL()
    gen.Printf("package %s\n", gen.cfg.Pkg)

    for _, val := range f.Functions() {
        gen.Comment(fmt.Sprintf("Func: %s \t Sig: %s \t Reqs: %v\n", val.Name, val.Signature, val.ISA))
    }

    return gen.Result()
}

And in main():

    AddPrinter("myfile", "produce file enumerating generated functions in comments", NewMyGenerator, nil)
    Generate()

When run with the flag -myfile woot.go, this produces:

woot.go

// Code generated by command: go run generate_var_len_write.go -out var_len_write_amd64.s -stubs var_len_write_amd64.go -pkg prototype -myfile woot.go. DO NOT EDIT.

package prototype 
// Func: varLenWriteAVX512_4     Sig: (in [][4]uint64, out *[4][]uint64, thresh uint64) byte     Reqs: [AVX2 AVX512DQ AVX512F AVX512VL]
// Func: varLenWriteAVX2_4   Sig: (in [][4]uint64, out *[4][]uint64, thresh uint64) byte     Reqs: [AVX AVX2 SSE2]
// Func: varLenWriteAVX512_8     Sig: (in [][8]uint64, out *[8][]uint64, thresh uint64) byte     Reqs: [AVX512DQ AVX512F]
// Func: varLenWriteAVX2_8   Sig: (in [][8]uint64, out *[8][]uint64, thresh uint64) byte     Reqs: [AVX AVX2 SSE2]
// Func: varLenWriteAVX512_16    Sig: (in [][16]uint64, out *[16][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
// Func: varLenWriteAVX2_16      Sig: (in [][16]uint64, out *[16][]uint64, thresh uint64) byte   Reqs: [AVX AVX2 SSE2]
// Func: varLenWriteAVX512_24    Sig: (in [][24]uint64, out *[24][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
// Func: varLenWriteAVX512_32    Sig: (in [][32]uint64, out *[32][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
// Func: varLenWriteAVX512_48    Sig: (in [][48]uint64, out *[48][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]
// Func: varLenWriteAVX512_64    Sig: (in [][64]uint64, out *[64][]uint64, thresh uint64) byte   Reqs: [AVX512DQ AVX512F]

That was way simpler than I thought it would be, and it has the benefit of potentially preventing a lot of wheel reinvention around code generation.

The principal drawbacks are:

My takeaway is that this AddPrinter approach is the most congruent with avo's existing design. But it very much needs to be positioned in the API as an internal "plugin" interface, with different compatibility guarantees for user-supplied printers and for use of the formerly internal Generator API.

Thoughts?