thought-machine / please

High-performance extensible build system for reproducible multi-language builds.
https://please.build
Apache License 2.0

Remote execution architecture woes #2825

Open peterebden opened 1 year ago

peterebden commented 1 year ago

This is a little bit of a stress test / awkward case, which didn't go so well. Test setup:

Machine 1: linux_amd64, running a set of remote execution servers
Machine 2: linux_arm64, the client

Building anything fails almost immediately with an exec format error from arcat (which just happens to be the first thing to hit it; there's nothing specifically wrong with arcat itself).

Things seem to generally be set up correctly. What happens is:

I'm not really certain what is meant to happen here. There are Platform messages which we should maybe be doing more with. They are currently just a config setting, but we should probably attach the arch / ISA automatically, which at least in this case should let the server reject the action, saying that it can't fulfil the requested architecture.

The arbitrarily nice situation is that Please finds out the server is on a different architecture and switches itself to match (essentially pretending to be linux_amd64 and cross-compiling to linux_arm64). I haven't thought much about that but I bet it's going to be hard :( I also don't think the rex protocol gives the client any way to discover the server's architecture, but it seems reasonable for this to be specified in the config (presumably you know something about the remote execution setup and it's not a total black box).
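To make the "attach the arch / ISA and let the server reject" idea concrete, here is a minimal sketch in Go. It is not Please's implementation: the `Property` type is redeclared locally so the example stands alone, the property names `OSFamily` and `ISA` are assumptions for illustration, and `checkPlatform` is a hypothetical worker-side check.

```go
package main

import (
	"fmt"
	"runtime"
)

// Property mirrors the name/value shape of a platform property.
// It is redeclared here so the sketch is self-contained.
type Property struct {
	Name, Value string
}

// platformFor returns the properties a client might attach to every action.
// The names "OSFamily" and "ISA" are assumptions for illustration.
func platformFor(goos, goarch string) []Property {
	return []Property{
		{Name: "OSFamily", Value: goos},
		{Name: "ISA", Value: goarch},
	}
}

// checkPlatform is a hypothetical worker-side check: it rejects an action
// whose requested properties differ from what the worker offers.
func checkPlatform(requested, offered []Property) error {
	have := make(map[string]string, len(offered))
	for _, p := range offered {
		have[p.Name] = p.Value
	}
	for _, p := range requested {
		if v, ok := have[p.Name]; !ok || v != p.Value {
			return fmt.Errorf("cannot fulfil platform property %s=%s (worker has %q)", p.Name, p.Value, v)
		}
	}
	return nil
}

func main() {
	// The client describes itself; the worker here is assumed to be linux/amd64.
	client := platformFor(runtime.GOOS, runtime.GOARCH)
	worker := platformFor("linux", "amd64")
	if err := checkPlatform(client, worker); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("accepted")
}
```

With a check like this, the linux_arm64 client in the setup above would get an explicit rejection up front rather than an exec format error from whichever tool happens to run first.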

This might also be fairly doomed when it interacts with local = True, which would also imply an architecture switch that we're not really set up for. Arguably this would all work more nicely if 'local' and 'remote' were architectures that could be arbitrarily reconfigured.

Tatskaari commented 1 year ago

We have HOST_OS, TARGET_OS and OS. The HOST_OS and TARGET_OS options were useful when trying to cross-compile the go_toolchain rule, which is depended on as a tool when we're cross-compiling. Perhaps HOST_OS should be set to the OS of the workers when building with rex, rather than that of the current machine. The tools mechanism should use that too.
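A rough sketch of that suggestion, in Go: pick the architecture that tools are built for from the configured worker architecture when rex is in use, and fall back to the current machine otherwise or for targets marked local = True. The `Arch` type and `toolArch` function are hypothetical names for illustration, not existing Please code, and the worker architecture is assumed to come from config.

```go
package main

import "fmt"

// Arch is an OS/CPU pair such as linux_amd64.
type Arch struct{ OS, CPU string }

func (a Arch) String() string { return a.OS + "_" + a.CPU }

// toolArch picks the architecture that tools should be built for.
// remote is the worker architecture taken from config (nil when rex is off);
// local marks a target with local = True, which always runs on this machine.
func toolArch(machine Arch, remote *Arch, local bool) Arch {
	if remote == nil || local {
		return machine
	}
	return *remote
}

func main() {
	machine := Arch{OS: "linux", CPU: "arm64"}
	workers := Arch{OS: "linux", CPU: "amd64"}

	fmt.Println("remote action tools:", toolArch(machine, &workers, false))
	fmt.Println("local = True tools: ", toolArch(machine, &workers, true))
	fmt.Println("no rex configured:  ", toolArch(machine, nil, false))
}
```

This only covers the selection step. A tool shared between a local and a remote target would still have to be built once per architecture, which is the awkward part raised above.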

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any recent activity in the past 90 days. It will be closed if no further activity occurs. If you require additional support, please reply to this message. Thank you for your contributions.