tinygo-org / tinygo

Go compiler for small places. Microcontrollers, WebAssembly (WASM/WASI), and command-line tools. Based on LLVM.
https://tinygo.org
Other
14.72k stars 858 forks source link

remove unused registers for x86_64 linux syscalls #4301

Closed leongross closed 1 week ago

leongross commented 1 week ago

A short analysis of my findings:

Except (obviously) for the addresses, the generated assembly code is the same.

1,10c1,10
<         002119ac 6a 2c           PUSH       0x2c
<         002119ae 59              POP        RCX
<         002119af 4c 89 f7        MOV        RDI,R14
<         002119b2 4c 89 e6        MOV        RSI,R12
<         002119b5 4c 89 fa        MOV        RDX,R15
<         002119b8 45 31 d2        XOR        R10D,R10D
<         002119bb 49 89 c0        MOV        R8,RAX
<         002119be 4d 89 e9        MOV        R9,R13
<         002119c1 48 89 c8        MOV        RAX,RCX
<         002119c4 0f 05           SYSCALL
---
>         00211934 6a 2c           PUSH       0x2c
>         00211936 59              POP        RCX
>         00211937 4c 89 f7        MOV        RDI,R14
>         0021193a 4c 89 e6        MOV        RSI,R12
>         0021193d 4c 89 fa        MOV        RDX,R15
>         00211940 45 31 d2        XOR        R10D,R10D
>         00211943 49 89 c0        MOV        R8,RAX
>         00211946 4d 89 e9        MOV        R9,R13
>         00211949 48 89 c8        MOV        RAX,RCX
>         0021194c 0f 05           SYSCALL

Two key points might be important:

leongross commented 1 week ago

@aykevl @deadprogram

leongross commented 1 week ago

ci failure looks like windows race condition again ...

aykevl commented 1 week ago

As discussed on Slack, this change looks correct.

ci failure looks like windows race condition again ...

I've restarted the build. Hopefully it'll pass now (also, this PR doesn't affect Windows so it'd be safe to merge even with the failure IMHO).

Except (obviously) for the addresses

That doesn't seem obvious to me? If the addresses are different, it means there is a difference in some other part of the binary. This could very well be because with this change, the compiler is allowed to keep a few more registers alive across the syscall, which probably means more efficient code. So entirely expected.

Also, to be pedantic, the fact that the code in one example is the same doesn't necessarily mean the change is correct: I've seen weird edge cases where code would work correctly in most cases but have a few cases where wrong code was generated due to a subtle bug in the inline assembly.

leongross commented 1 week ago

Also, to be pedantic, the fact that the code in one example is the same doesn't necessarily mean the change is correct: I've seen weird edge cases where code would work correctly in most cases but have a few cases where wrong code was generated due to a subtle bug in the inline assembly.

You are totally right with this. However, I am not quite sure how to systematically test the code generation behavior in an efficient way. If you have any suggestions I'd be glad to help you out with that, but atm I don't see any.

aykevl commented 1 week ago

I tried to compile your example program with and without this change on linux/amd64 and the produced binary is exactly identical. Are you sure about the diff you were seeing?

$ md5sum test*.elf
d1017e8996b9e4d08611cf06bc42933c  test1.elf
d1017e8996b9e4d08611cf06bc42933c  test2.elf

(I used llvm-objdump to look at the binary, but since the files are identical there won't be a difference).

My build command looks like this (because I'm on linux/arm64):

GOARCH=amd64 tinygo build -o test1.elf ./tmp/pr4301.go

In any case, I think this is a safe change.

deadprogram commented 1 week ago

Thank you for the PR @leongross and to @aykevl for review.