eliasnaur commented 2 years ago

The gioui.org/shader/piet package uses .syso files to embed pre-built object files into Go programs. It seems TinyGo ignores them:

tinygo flash -target stm32f469disco ./stm32f4
ld.lld: error: undefined hidden symbol: elements_coroutine_begin
>>> referenced by elements_abi.c
>>>               /Users/e/Library/Caches/tinygo/obj-920bd271d98c454017dee52698a38d15be7e3a20326ee84e434c6d2a.o:(elements_program_info)

ld.lld: error: undefined hidden symbol: elements_coroutine_await
>>> referenced by elements_abi.c
>>>               /Users/e/Library/Caches/tinygo/obj-920bd271d98c454017dee52698a38d15be7e3a20326ee84e434c6d2a.o:(elements_program_info)

ld.lld: error: undefined hidden symbol: elements_coroutine_destroy
>>> referenced by elements_abi.c
>>>               /Users/e/Library/Caches/tinygo/obj-920bd271d98c454017dee52698a38d15be7e3a20326ee84e434c6d2a.o:(elements_program_info)
error: failed to link /var/folders/3p/_lbqc_4n5t764s1c44sql13w0000gn/T/tinygo2157102769/main: exit status 1

aykevl commented 2 years ago

After a quick Google search, I didn't find much information about .syso files. They appear to be basically just object files. Is that correct? Do you have any further information on them? (If they are just object files, it should be relatively easy to add support for them in TinyGo by simply adding them to the linker command line).

aykevl commented 2 years ago

Oh but you're trying to use them on a Cortex-M4? The Cortex-M4 uses the Thumb2 instruction set, which is different from the ARM instruction set. You won't be able to use elements_linux_arm.syso on the stm32f469, for example.

If these are files compiled from C, it's probably a better idea to just compile those via CGo.

dgryski commented 2 years ago

@aykevl I believe they are just object files, and conveniently available via go list:

        SysoFiles       []string   // .syso object files to add to archive

eliasnaur commented 2 years ago

Yes, they're just object files with a distinct filename extension. Best resource I found was https://zchee.github.io/golang-wiki/GcToolchainTricks/.

The Thumb2 issue is certainly a problem, but since I'm using LLVM to generate the files, I assume it's not a dealbreaker.

aykevl commented 2 years ago

Can you please give me a bit more detail how this all works? I know very little about graphics.

It appears that the source files are the *.comp files, which look like GLSL.
These files seem to be compiled to object files using a forked version of SwiftShader.

...but I'm not sure about any of this and think I'm missing the big picture.

> The Thumb2 issue is certainly a problem, but since I'm using LLVM to generate the files, I assume it's not a dealbreaker.

I know nothing about SwiftShader, but it may be as simple as changing the target triple to thumbv7em-unknown-unknown-eabi. Thumb2 is very much like ARM (even the same LLVM backend), but with a different instruction encoding and some instructions added/removed.

For the absolutely best performance, it would be interesting to compile them to bitcode (or C) files and link them together via LTO/ThinLTO (#2638). That might even allow for inlining or other inter-procedural optimizations to happen across Go and GLSL. ...but this probably requires quite a bit of work. From the TinyGo POV, if these .comp files could somehow be compiled to (semi-portable) C that would be perfect because TinyGo could manage the compilation process and compile for the exact architecture that's used.

eliasnaur commented 2 years ago

> Can you please give me a bit more detail how this all works? I know very little about graphics.
>
> * It appears that the source files are the `*.comp` files, which look like GLSL.
> * These files seem to be compiled to object files using a forked version of SwiftShader.
>
> ...but I'm not sure about any of this and think I'm missing the big picture.

Your description is exactly right. The source files are GLSL compute shaders, whose execution model is very similar to a many-core CPU. As such, they're relatively easy to execute on CPUs.

> > The Thumb2 issue is certainly a problem, but since I'm using LLVM to generate the files, I assume it's not a dealbreaker.
>
> I know nothing about SwiftShader, but it may be as simple as changing the target triple to thumbv7em-unknown-unknown-eabi. Thumb2 is very much like ARM (even the same LLVM backend), but with a different instruction encoding and some instructions added/removed.

Thanks. I do hope a target change is enough.

> For the absolutely best performance, it would be interesting to compile them to bitcode (or C) files and link them together via LTO/ThinLTO (#2638). That might even allow for inlining or other inter-procedural optimizations to happen across Go and GLSL. ...but this probably requires quite a bit of work. From the TinyGo POV, if these .comp files could somehow be compiled to (semi-portable) C that would be perfect because TinyGo could manage the compilation process and compile for the exact architecture that's used.

Compiling to bitcode sounds very interesting. Can LLVM output bitcode with a simple target change, resulting in an object file ready to include? I'm not sure about semi-portable C; is that also supported as a target from LLVM?

Is there an easy way to hack TinyGo to include .syso files before spending too much time thinking about doing it properly? I have two worries that may result in me having to write a custom renderer for TinyGo, making the need for .syso files moot.

One of the compelling reasons to run compute programs on CPUs is that the compute execution model lends itself to using SIMD instructions to multiply the number of logical cores for each physical core. But I worry that a Cortex-M4 is simply never going to run SwiftShader generated code.

Another worry is that even if we get them to run, the compute programs are simply too inefficient running on a weak single core, even with FPU enabled.

eliasnaur commented 2 years ago

> Is there an easy way to hack TinyGo to include .syso files before spending too much time thinking about doing it properly? I have two worries that may result in me having to write a custom renderer for TinyGo, making the need for .syso files moot.
>
> One of the compelling reasons to run compute programs on CPUs is that the compute execution model lends itself to using SIMD instructions to multiply the number of logical cores for each physical core. But I worry that a Cortex-M4 is simply never going to run SwiftShader generated code.

It turns out changing the target triple to thumbv7em-unknown-unknown-eabi is enough to target thumb2, leaving intrinsics the next missing piece:

JIT session error: Symbols not found: [ __aeabi_f2uiz, __aeabi_fadd, __aeabi_fcmpeq, __aeabi_fcmpge, __aeabi_fcmpgt, __aeabi_fcmplt, __aeabi_fdiv, __aeabi_fmul, __aeabi_fsub, __aeabi_i2f, sqrtf ]
JIT session error: Symbols not found: [ __aeabi_fmul ]
JIT session error: Symbols not found: [ __aeabi_f2iz, __aeabi_fmul ]
JIT session error: Symbols not found: [ __aeabi_f2iz, __aeabi_f2uiz, __aeabi_fadd, __aeabi_fcmpeq, __aeabi_fcmpgt, __aeabi_fcmplt, __aeabi_fdiv, __aeabi_fmul, __aeabi_fsub, __aeabi_i2f, sqrtf ]
JIT session error: Symbols not found: [ __aeabi_f2iz, __aeabi_fmul ]
JIT session error: Symbols not found: [ __aeabi_fadd, __aeabi_fcmpge, __aeabi_fcmpgt, __aeabi_fcmple, __aeabi_fcmplt, __aeabi_fmul, __aeabi_fsub, sqrtf ]

I assume implementing those intrinsics will make the build succeed, and the result will likely be runnable on Cortex-M4. However, my hope for good performance is fading, so for now I'm going to investigate a custom renderer based on golang.org/x/image/vector. It has a fixed-point implementation and may run OK even without an FPU.

aykevl commented 2 years ago

> Compiling to bitcode sounds very interesting. Can LLVM output bitcode with a simple target change, resulting in an object file ready to include? I'm not sure about semi-portable C; is that also supported as a target from LLVM?

It's not that easy. It probably requires a code change to SwiftShader to write bitcode instead of generating object code. Probably better to shelve as a possible future enhancement. Emitting C is probably even harder. There used to be a C backend to LLVM, but it was removed unfortunately.

> One of the compelling reasons to run compute programs on CPUs is that the compute execution model lends itself to using SIMD instructions to multiply the number of logical cores for each physical core. But I worry that a Cortex-M4 is simply never going to run SwiftShader generated code.

I think it will be possible. The Cortex-M4 does have some SIMD-like instructions, but I believe they're mostly for integer math (2x16 and 4x8 bit, all in 32-bit registers). So it may be rather slow. These instructions could perhaps be useful, though, to speed up graphics rendering.
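
To illustrate what 4x8-bit packed arithmetic in a 32-bit register means, here is a minimal Go sketch that emulates a lane-wise byte addition in software. The function name `add4x8` is made up for this example; on a Cortex-M4 a single instruction such as UADD8 performs a comparable operation. This is not code from SwiftShader or TinyGo.

```go
package main

import "fmt"

// add4x8 adds four unsigned 8-bit lanes packed into 32-bit words.
// Each lane wraps around on overflow without carrying into its neighbour.
// (Illustrative helper, emulating in software what a packed-add
// instruction does in hardware.)
func add4x8(a, b uint32) uint32 {
	const low7 = 0x7f7f7f7f // lower 7 bits of every lane
	const top1 = 0x80808080 // top bit of every lane

	// Add the lower 7 bits of each lane; a carry can reach at most
	// the top bit of the same lane, never the next lane.
	sum := (a & low7) + (b & low7)
	// Fold the top bits back in with XOR, which discards the carry out.
	return sum ^ ((a ^ b) & top1)
}

func main() {
	// Lanes: 0x01+0x01, 0xFF+0x01, 0x7F+0x01, 0x80+0x80
	fmt.Printf("0x%08x\n", add4x8(0x01FF7F80, 0x01010180)) // prints 0x02008000
}
```

Float-heavy shader code would not benefit from this kind of integer packing, which is the concern raised above.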

> Another worry is that even if we get them to run, the compute programs are simply too inefficient running on a weak single core, even with FPU enabled.

The Cortex-M4 is a relatively weak processor. It has an FPU, but don't expect it to be very fast. Here is an overview of how long each instruction takes: https://developer.arm.com/documentation/ddi0439/b/BEHJADED. But of course, you only really know this for sure when testing in practice.

> It turns out changing the target triple to thumbv7em-unknown-unknown-eabi is enough to target thumb2, leaving intrinsics the next missing piece:

I have two answers to this:

* Those intrinsics should be available at link time. So this error message is bogus. You can probably try to convince SwiftShader to not check for these errors and emit the object file anyway.
* It appears that floating point instructions aren't enabled in this compilation. For example, __aeabi_fmul should be a floating point multiplication instruction, not a library call. (See #2672). You can set the target CPU and target features somewhere here: https://github.com/eliasnaur/swiftshader/blob/master/src/Reactor/LLVMJIT.cpp#L181-L184 (you can find the appropriate values in targets/cortex-m4.json).

eliasnaur commented 2 years ago

> > It turns out changing the target triple to thumbv7em-unknown-unknown-eabi is enough to target thumb2, leaving intrinsics the next missing piece:
>
> I have two answers to this:
>
> Those intrinsics should be available at link time. So this error message is bogus. You can probably try to convince SwiftShader to not check for these errors and emit the object file anyway.

Right. I had completely forgotten that I worked around similar errors when building for Android targets. Thanks, the warnings are now gone.

> It appears that floating point instructions aren't enabled in this compilation. For example, __aeabi_fmul should be a floating point multiplication instruction, not a library call. (See #2672). You can set the target CPU and target features somewhere here: https://github.com/eliasnaur/swiftshader/blob/master/src/Reactor/LLVMJIT.cpp#L181-L184 (you can find the appropriate values in targets/cortex-m4.json).

Thanks, I'll adjust the target features if/when I get something running on the device.

With the compute programs now building for thumb2, is there a quick and dirty way to convince TinyGo to include .syso files in my build?

eliasnaur commented 2 years ago

#2686 seems to do the trick for me.

aykevl commented 2 years ago

> With the compute programs now building for thumb2, is there a quick and dirty way to convince TinyGo to include .syso files in my build?

Actually, there is. I realized that you can simply add them to the linker using #cgo LDFLAGS.

eliasnaur commented 2 years ago

> > With the compute programs now building for thumb2, is there a quick and dirty way to convince TinyGo to include .syso files in my build?
>
> Actually, there is. I realized that you can simply add them to the linker using #cgo LDFLAGS.

How? I tried

#cgo LDFLAGS: elements_linux_arm.syso

but got

# gioui.org/shader/piet
../gio-shader/piet/elements_abi.go:14:14: invalid flag: elements_linux_arm.syso

aykevl commented 2 years ago

With some testing, I found that the file has to have the .o file extension; otherwise you get that error. So rename the file and use #cgo LDFLAGS: elements_linux_arm.o.

tinygo-org / tinygo

add support for .syso files #2670
