Open klauspost opened 2 years ago
As a temp workaround I made a temp script that uses comments to contain the ifdefs.
Thanks for posting this issue, and thank you @pelletier for the PR #406. We should find a way to support GOAMD64
and perhaps preprocessor macros more generally.
I am uneasy about the approach taken in #406 since it allows emitting preprocessor text into an avo program without accounting for the semantics. Specifically, an assembly program with preprocessor conditionals actually represents multiple possible assembly programs, whereas avo
would treat it as one.
The following pathological avo
program is written using the proposed implementation in #406. The point is to generate n
live registers in each of the possible generated code blocks, dependent on the CONDITION
macro. If we were to properly account for the semantics of the ifdef
preprocessor macros, it would be possible to compile this for n
up to 15. But as written avo
needs to allocate registers for both code blocks, and register allocation fails when n
is increased to 8.
//go:build ignore
package main
import (
. "github.com/mmcloughlin/avo/build"
. "github.com/mmcloughlin/avo/operand"
. "github.com/mmcloughlin/avo/reg"
)
func main() {
const n = 7
TEXT("Preprocessor", NOSPLIT, "func() uint64")
// Allocate n registers with values 1, 2, ..., n.
Preprocessor("ifdef CONDITION")
x := make([]GPVirtual, n)
for i := 0; i < n; i++ {
x[i] = GP64()
MOVQ(U32(1+i), x[i])
}
Preprocessor("else")
y := make([]GPVirtual, n)
for i := 0; i < n; i++ {
y[i] = GP64()
MOVQ(U32(1+i), y[i])
}
Preprocessor("endif")
// Sum them up.
Preprocessor("ifdef CONDITION")
for i := 1; i < n; i++ {
ADDQ(x[i], x[0])
}
Store(x[0], ReturnIndex(0))
Preprocessor("else")
for i := 1; i < n; i++ {
ADDQ(y[i], y[0])
}
Store(y[0], ReturnIndex(0))
Preprocessor("endif")
RET()
Generate()
}
The generated assembly is as follows. Note how avo
has allocated distinct sets of registers in the two branches of the #ifdef
when in fact there's no conflict.
// Code generated by command: go run asm.go -out preprocessor.s -stubs stub.go. DO NOT EDIT.
#include "textflag.h"
// func Preprocessor() uint64
TEXT ·Preprocessor(SB), NOSPLIT, $0-8
#ifdef CONDITION
MOVQ $0x00000001, AX
MOVQ $0x00000002, CX
MOVQ $0x00000003, DX
MOVQ $0x00000004, BX
MOVQ $0x00000005, SI
MOVQ $0x00000006, DI
MOVQ $0x00000007, R8
#else
MOVQ $0x00000001, R9
MOVQ $0x00000002, R10
MOVQ $0x00000003, R11
MOVQ $0x00000004, R12
MOVQ $0x00000005, R13
MOVQ $0x00000006, R14
MOVQ $0x00000007, R15
#endif
#ifdef CONDITION
ADDQ CX, AX
ADDQ DX, AX
ADDQ BX, AX
ADDQ SI, AX
ADDQ DI, AX
ADDQ R8, AX
MOVQ AX, ret+0(FP)
#else
ADDQ R10, R9
ADDQ R11, R9
ADDQ R12, R9
ADDQ R13, R9
ADDQ R14, R9
ADDQ R15, R9
MOVQ R9, ret+0(FP)
#endif
RET
I'm not sure how frequent this kind of problem would be in reality, but I just wanted to demonstrate why I felt uncomfortable about an approach that essentially just emits text into the program. I'm also not quite sure yet what the right approach to this would be.
@mmcloughlin Thanks for taking a look! For context I mainly use it for GOAMD64_v
shortcuts. Two examples:
#ifdef GOAMD64_v3
TZCNTQ R11, R11
#else
BSFQ R11, R11
#endif
This is fairly straightforward and I "emit" it like this:
Comment("#ifdef GOAMD64_v3")
TZCNTQ(tmp, tmp)
Comment("#else")
BSFQ(tmp, tmp)
Comment("#endif")
... and the script removes //
from comments that start with #
My second use case is a bit more complicated... And a lot more hackish...
#ifdef GOAMD64_v4
#define XOR3WAY(ignore, a, b, dst) \
VPTERNLOGD $0x96, a, b, dst
#else
#define XOR3WAY(ignore, a, b, dst) \
VPXOR a, dst, dst \
VPXOR b, dst, dst
#endif
...
XOR3WAY( $0x00, Y5, Y6, Y2)
So I emit a dummy function with the ifdefs as comments and overload and existing operation with a similar signature...
data = bytes.ReplaceAll(data, []byte("\t// #"), []byte("#"))
data = bytes.ReplaceAll(data, []byte("\t// @"), []byte(""))
data = bytes.ReplaceAll(data, []byte("VPTERNLOGQ"), []byte("XOR3WAY("))
/// Patch up missing `)`
I could just emit 2 versions of the code with build tags, but I already have ~5MB of assembly output - so I am not really looking to make it any more if it can be avoided. So not having these hacks would be great.
Thank you for the feedback!
I wonder if we could use Go functions to represent the multiple versions of the program. For example:
func main() {
const n = 7
TEXT("Preprocessor", NOSPLIT, "func() uint64")
Preprocessor("ifdef CONDITION")
Guard("CONDITION", func(c *avo.Context) { // if
x := make([]GPVirtual, n)
for i := 0; i < n; i++ {
x[i] = c.GP64()
MOVQ(U32(1+i), x[i])
}
for i := 1; i < n; i++ {
ADDQ(x[i], x[0])
}
c.Store(x[0], c.ReturnIndex(0))
}, func(c *avo.Context) { // else
y := make([]GPVirtual, n)
for i := 0; i < n; i++ {
y[i] = c.GP64()
MOVQ(U32(1+i), y[i])
}
for i := 1; i < n; i++ {
ADDQ(y[i], y[0])
}
c.Store(y[0], c.ReturnIndex(0))
})
// global context is left untouched.
RET()
Generate()
}
But that feels error-prone to mistakenly use the global context instead of the local context in those functions.
Or maybe the solution needs to be more targeted if only register allocations is the issue? something like:
func main() {
const n = 7
TEXT("Preprocessor", NOSPLIT, "func() uint64")
// Allocate n registers with values 1, 2, ..., n.
Preprocessor("ifdef CONDITION")
// BranchAllocator would be a new function that creates a new register
// allocator that starts with the same state as the global allocator,
// but effectively "branches out".
x := make([]GPVirtual, n)
allocatorX := BranchAllocator()
for i := 0; i < n; i++ {
x[i] = allocatorX.GP64()
MOVQ(U32(1+i), x[i])
}
Preprocessor("else")
y := make([]GPVirtual, n)
allocatorY := BranchAllocator()
for i := 0; i < n; i++ {
y[i] = allocatorY.GP64()
MOVQ(U32(1+i), y[i])
}
Preprocessor("endif")
// Sum them up.
Preprocessor("ifdef CONDITION")
for i := 1; i < n; i++ {
ADDQ(x[i], x[0])
}
Store(x[0], ReturnIndex(0))
Preprocessor("else")
for i := 1; i < n; i++ {
ADDQ(y[i], y[0])
}
Store(y[0], ReturnIndex(0))
Preprocessor("endif")
// At any point later, if you need to allocate you have two options:
// Option 1: pick the allocator that represents the version of the
// program you care about:
reg := allocatorX.GP64()
// Option 2: forget about those allocations and use the global
// allocator.
reg := GP64()
RET()
Generate()
}
In any case, it will probably take quite a bit to get to a satisfying solution to this problem. In the meantime, how would you feel about adding an escape-hatch function that allows emitting arbitrary text? Something like RawText("#ifndef hasAVX2")
. It would come with no gloves and maybe would need a scarier name, or put in a different package like avo/danger
so that people know it should be used with care. It would unlock both @klauspost and I, while we figure out a higher-level construct for pre-processor functions?
Thanks both!
@pelletier I have not fully thought through the implications of having an avo
program represent multiple actual assembly programs, but right now I think register allocation will be the main issue. I think I might be content to defer fixing those issues right now since I expect it would almost never come up in practice. But in order to make it possible to handle register allocation correctly in future, we need a solution where avo
can recover the semantics of the preprocessor statements. For that reason a text-based Preprocessor
function is a bit ugly, since avo
would have to parse the text. I agree that your Guard
proposal is error-prone.
At this point I'm leaning towards just adding functions Ifdef
, Ifndef
, Else
and Endif
. I think this has the benefit of at least somewhat looking like Go assembly (unless we can think of some nice characters to mimic the #
).
Ifdef("CONDITION")
// ...
Else()
// ...
Endif()
Some alternatives are things like hash.Ifdef
, pound.Ifdef
, preproc.Ifdef
, or even following the Go assembler and making use of obscure unicode #Ifdef
.
With a design like this it will also be fairly easy for avo
to determine the multiple possible output programs based on the CONDITION
macro. So if I'm motivated enough I can account for it in the register allocator.
@klauspost Your first example seems like the simple case, but I'm not sure about the second. I'm hesitant to add support for #define
in avo
though you could imagine it as a method of defining "custom instructions" in an avo
program. In this case the simplest thing to do I think would be a XOR3WAY
function in your avo
program that emits the Ifdef("GOAMD64_v4")
. That would have the downside that your output assembly would contain the fully expanded 6 line #ifdef ... #else ... #endif
block instead of the concise XOR3WAY
one line instruction. Would that be acceptable?
Sounds good to me. I'll submit a new PR that adds Ifdef
and friends. I'm in favor of a package named pp
, preproc
, or preprocessor
to avoid typing non-a-zA-Z0-9
package or function names.
My only other question about #406 is whether we should accept Include
anywhere?
I'm wondering because #include
in the middle of Go assembly is perfectly valid. Now in an avo
program actually doing that would put you in an "all bets are off" situation, so would almost always be unadvisable. Since we strive to have avo
code look somewhat like assembly, I wonder about allowing Include
anywhere but with the behavior of "include in header" rather than "include here".
Possible alternatives:
Include
outside the header.Require
, IncludeHeader
, IncludeInHeader
, ...Thoughts?
I think IncludeHeader
makes the most sense to me. It does what it says in the name, as opposed to get an error at first run. If people end up needing #include
outside of header, this could become a candidate for preproc.Include
for example.
Hi!
Thanks for the great work. To support GOAMD64 I would like to emit:
There are no build tags for GOAMD64, and in this case it would duplicate 15K lines of assembly, if I was to have runtime detection, so an ifdef would be fine in this case.
Unless I am missing something that is currently not possible.