monogon-dev / monogon

The Monogon Monorepo. May contain traces of peanuts and a ✨pure Go Linux userland✨. Work in progress!
https://monogon.tech
Apache License 2.0

Fix building directly on NixOS #175

Closed leoluk closed 1 year ago

leoluk commented 1 year ago

NixOS does not come with a dynamic linker in a standard location. We mostly do not care because we bring our own rootfs, but we need working Bazel, Go and Protobuf toolchains to build the rootfs in the first place.
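A quick way to see the problem on a given host is to check for the ELF interpreter that prebuilt binaries expect. This is a minimal sketch: the default path is the usual x86_64 glibc location and is an assumption for illustration, and `has_standard_loader` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether an ELF interpreter exists at the conventional path.
# The default path is the usual x86_64 glibc location (an assumption here);
# pass a different path as the first argument to check another one.
has_standard_loader() {
    loader="${1:-/lib64/ld-linux-x86-64.so.2}"
    [ -e "$loader" ]
}

if has_standard_loader; then
    echo "dynamic linker found at the conventional path"
else
    echo "no dynamic linker at the conventional path; prebuilt binaries will likely fail to start"
fi
```

Some NixOS setups may provide a stub at that path, so a positive result does not guarantee that prebuilt toolchains actually run.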

(A) Swap out prebuilt binaries for Nix-built binaries

Scope for bootstrapping is limited, so we might be able to get away with host toolchains. All we need is Bazel and a rules_go toolchain to build the pure-Go bazeldnf binary and, as it turns out, a Protobuf toolchain, since rules_go depends on bazel_tools, which depends on rules_proto.

Bazel and Go are easy (pin nixpkgs to whatever has the right bazel_5 release, use the host toolchain for Go). Haven't looked at rules_proto yet. We'd somehow have to detect the Nix environment and swap out the toolchains for local ones, which is tricky: repository rules cannot be nested, and the environment outside of a repository rule is pure. One silly solution involves a repository rule which generates a .bzl file in a repository to be included in WORKSPACE.
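To make the "silly solution" concrete, here is a rough shell sketch of what such a generated file could contain. In Bazel the file would be written by the repository rule itself (for example via `repository_ctx.file`) and then loaded from WORKSPACE; the shell version only illustrates the detection and the generated content. Using `/etc/NIXOS` as the marker, and the names `write_host_bzl`, `host.bzl` and `IS_NIXOS`, are all assumptions made up for this example.

```shell
# Sketch: emit a .bzl file recording whether the host looks like NixOS.
# /etc/NIXOS is used as the marker file; the function name, the IS_NIXOS
# constant and the host.bzl file name are hypothetical.
write_host_bzl() {
    out_dir="$1"
    marker="${2:-/etc/NIXOS}"
    if [ -e "$marker" ]; then
        value=True
    else
        value=False
    fi
    # The generated file holds a single Starlark constant.
    printf 'IS_NIXOS = %s\n' "$value" > "$out_dir/host.bzl"
}
```

WORKSPACE could then load the constant and pick local or prebuilt toolchains accordingly, which sidesteps the restriction that repository rules cannot be nested.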

(B) Magic FHS environment

Nix has a pkgs.buildFHSUserEnv helper, which creates an unprivileged mount namespace that simulates a "normal" FHS environment.

This should just work™ without the extra complexity in (A). The downside is that it is magic and might interfere with Bazel's own sandbox or otherwise result in confusing behavior.

(C) Bring a container

We did this previously. It wasn't ideal, but it did work on NixOS.

leoluk commented 1 year ago

As for (B), I tried with this simple flake.nix:

# Monogon's monorepo does not require Nix. This flake is provided as a convenience
# for Nix/NixOS users. Please raise an issue if you encounter any trouble.
{
    description = "Monogon monorepo build environment";

    inputs = { nixpkgs.url = "github:nixos/nixpkgs"; };

    outputs = { self, nixpkgs }:
    let pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      devShell.x86_64-linux = (pkgs.buildFHSUserEnv {
        name = "monogon-bazel";
        targetPkgs = _: [
          pkgs.bazel_5
          pkgs.git
        ];
        runScript = "bash";
      }).env;
    };
}

...but the magic does not permeate Bazel's own sandbox:

_proto:api_proto_proto [for host] failed: (Exit 127): protoc.exe failed: error executing command 
  (cd /home/leoluk/.cache/bazel/_bazel_leoluk/ea09550584ad64933c457c8fa53a71e0/sandbox/linux-sandbox/8/execroot/dev_source_monogon && \
  exec env - \
    PATH=/nix/store/pj1hnyxhcsw1krmhnbb9rjvqssbzliw8-bash-5.2-p15/bin:/nix/store/r1lybmy0jzjsfcmbg24h26aknbgpfvad-coreutils-9.1/bin:/nix/store/mgf57ijj980ycqadf8g6jppj17x976qi-file-5.43/bin:/nix/store/mlvjrnjdjj7p2dnsayy848yk7djgd5xk-findutils-4.9.0/bin:/nix/store/ycw1sij5z1kjcxdclz7gqlnf9hdxchlb-gawk-5.2.1/bin:/nix/store/qrqwd1ji31vmas9gax819j11w5ickgz1-gnugrep-3.7/bin:/nix/store/4cjf65g6bacv9f279j5mv04iapzjx6m6-gnused-4.9/bin:/nix/store/g4qrlj5kr7iwcr81jsm1p55vk0z9rh0z-gnutar-1.34/bin:/nix/store/hcy4vanjxs86cynwn98hcagqyj3ihns4-gzip-1.12/bin:/nix/store/abax98471z8fshv4b9p46bkh3lxmpy0z-python3-3.10.9/bin:/nix/store/2d32x1lxx1vnywscc3jg1nh9pc3yxikr-unzip-6.0/bin:/nix/store/iy8lcy6q7g9mgbnwsa4n7jljx6jk6jk4-which-2.21/bin:/nix/store/krj7lwx6m1y07qc51dbi386falnz628y-zip-3.0/bin \
  bazel-out/host/bin/external/com_google_protobuf_protoc_linux_x86_64/protoc.exe '--proto_path=external/com_github_bazelbuild_buildtools' '--descriptor_set_out=bazel-out/host/bin/external/com_github_bazelbuild_buildtools/api_proto/api_proto_proto-descriptor-set.proto.bin' '-Iapi_proto/api.proto=external/com_github_bazelbuild_buildtools/api_proto/api.proto' --direct_dependencies api_proto/api.proto '--direct_dependencies_violation_msg=%s is imported, but @com_github_bazelbuild_buildtools//api_proto:api_proto_proto doesn'\''t directly depend on a proto_library that '\''srcs'\'' it.' external/com_github_bazelbuild_buildtools/api_proto/api.proto)
# Configuration: 54774918b148f18957f40d35dd29cf183d0f5a10c179de6ba1bf4df61599fa98
# Execution platform: @io_bazel_rules_go//go/toolchain:linux_amd64

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
bazel-out/host/bin/external/com_google_protobuf_protoc_linux_x86_64/protoc.exe: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
Target //third_party/sandboxroot:sandboxroot failed to build
INFO: Elapsed time: 50.652s, Critical Path: 0.15s
INFO: 34 processes: 7 disk cache hit, 27 internal.
MONOGON_SYSROOT_REBUILD=1 bazel --noworkspace_rc --bazelrc /home/leoluk/monogon/tools/../.bazelrc.sandboxroot run //third_party/sandboxroot --sandbox_debug --verbose_failures

This is due to the sandbox clearing LD_LIBRARY_PATH:

bash-5.2$ LD_LIBRARY_PATH=/usr/lib:/usr/lib32 bazel-out/k8-opt-exec-54A0E8D9/bin/external/com_google_protobuf_protoc_linux_x86_64/protoc.exe
Usage: bazel-out/k8-opt-exec-54A0E8D9/bin/external/com_google_protobuf_protoc_linux_x86_64/protoc.exe [OPTION] PROTO_FILES
Parse PROTO_FILES and generate output based on the options given:
  -IPATH, --proto_path=PATH   Specify the directory in which to search for

Looks like --host_action_env=LD_LIBRARY_PATH fixes this.
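One way to apply this without affecting other hosts is to add the flag only when LD_LIBRARY_PATH is actually set. The sketch below assumes that is the desired behavior; `nixos_bazel_flags` is a hypothetical helper name, and the only Bazel flag used is the `--host_action_env` mentioned above.

```shell
# Sketch: print the extra flag only when LD_LIBRARY_PATH is set, so the
# Bazel invocation stays unchanged outside the FHS environment.
# nixos_bazel_flags is a hypothetical helper name.
nixos_bazel_flags() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        printf '%s\n' '--host_action_env=LD_LIBRARY_PATH'
    fi
}

# Example use (not executed here):
#   bazel run $(nixos_bazel_flags) //third_party/sandboxroot
```

Passing the variable by name rather than by value makes Bazel take it from the invoking environment, so the FHS environment's library path is what host actions see.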

Next up... Can't find the Java toolchain inside the sandbox! Guess why:

monogon-bazel-chrootenv:leoluk@nix2:~/monogon$ cat /nix/store/hwdxkdykzqpnrg1kyq296fbs8sxx8xh7-bazel-rc
startup --server_javabase=/nix/store/8a4ja2ihvgx7xnvgrx9qm404xvbj0n0l-openjdk-headless-11.0.17+8

# Can't use 'common'; https://github.com/bazelbuild/bazel/issues/3054
# Most commands inherit from 'build' anyway.
build --distdir=/nix/store/6n670swmiwyyffcq9n1r4byzb2x5f1sq-bazel-deps
fetch --distdir=/nix/store/6n670swmiwyyffcq9n1r4byzb2x5f1sq-bazel-deps
query --distdir=/nix/store/6n670swmiwyyffcq9n1r4byzb2x5f1sq-bazel-deps

build --extra_toolchains=@bazel_tools//tools/jdk:nonprebuilt_toolchain_definition
build --tool_java_runtime_version=local_jdk_11
build --java_runtime_version=local_jdk_11

# load default location for the system wide configuration
try-import /etc/bazel.bazelrc

(╯°□°)╯︵ ┻━┻

leoluk commented 1 year ago

# Java toolchain resolution inside the sandbox is broken on NixOS, hardcode it.
# This is a no-op everywhere else since this is what Bazel would select anyways.
# https://github.com/hakuch/nixpkgs/blob/0100c5e564462ca83aed241c58a3427783737a26/pkgs/development/tools/build-managers/bazel/bazel_5/default.nix#L463-L470
build --java_language_version=11
build --java_runtime_version=remotejdk_11
build --extra_toolchains=@bazel_tools//tools/jdk:toolchain_java11_definition

Java is fixed! Onwards:

src/main/tools/linux-sandbox-pid1.cc:304: "mount(/tmp/nix-shell.PT7XD6, /tmp/nix-shell.PT7XD6, nullptr, MS_BIND | MS_REC, nullptr)": No such file or directory

... which is worked around for now by unsetting TMP, TEMP, TMPDIR and TEMPDIR. But now this fails:

monogon-bazel-chrootenv:leoluk@nix2:~/monogon$ bazel test //...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: ec1af4ce-6195-4afd-8628-3ca9722842b1
WARNING: Streamed test output requested. All tests will be run locally, without sharding, one at a time
INFO: Analyzed 335 targets (4263 packages loaded, 160509 targets configured).
INFO: Found 289 targets and 46 test targets...
WARNING: cleared convenience symlink(s) bazel-bin, bazel-testlogs because their destinations would be ambiguous
ERROR: /home/leoluk/.cache/bazel/_bazel_leoluk/ea09550584ad64933c457c8fa53a71e0/external/gnuefi/BUILD.bazel:3:11: Compiling lib/data.c failed: undeclared inclusion(s) in rule '@gnuefi//:gnuefi':
this rule is missing dependency declarations for the following files included by 'lib/data.c':
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdint.h'
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdarg.h'
ERROR: /home/leoluk/.cache/bazel/_bazel_leoluk/ea09550584ad64933c457c8fa53a71e0/external/gnuefi/BUILD.bazel:3:11: Compiling lib/console.c failed: undeclared inclusion(s) in rule '@gnuefi//:gnuefi':
this rule is missing dependency declarations for the following files included by 'lib/console.c':
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdint.h'
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdarg.h'
ERROR: /home/leoluk/.cache/bazel/_bazel_leoluk/ea09550584ad64933c457c8fa53a71e0/external/gnuefi/BUILD.bazel:3:11: Compiling lib/boxdraw.c failed: undeclared inclusion(s) in rule '@gnuefi//:gnuefi':
this rule is missing dependency declarations for the following files included by 'lib/boxdraw.c':
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdint.h'
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdarg.h'
ERROR: /home/leoluk/.cache/bazel/_bazel_leoluk/ea09550584ad64933c457c8fa53a71e0/external/gnuefi/BUILD.bazel:3:11: Compiling lib/cmdline.c failed: undeclared inclusion(s) in rule '@gnuefi//:gnuefi':
this rule is missing dependency declarations for the following files included by 'lib/cmdline.c':
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdint.h'
  '/nix/store/z6bnqfkj620xmymrhp4xxb0sf3xdc4kw-monogon-bazel-fhs/usr/lib64/clang/15.0.7/include/stdarg.h'
INFO: Elapsed time: 85.168s, Critical Path: 0.65s

Huh! Where is this coming from? CC toolchain resolution is completely disabled.

Also this, WHY:

src/main/tools/linux-sandbox-pid1.cc:487: "execvp(/nix/store/qqa28hmysc23yy081d178jfd9a1yk8aw-bash-5.2-p15/bin/bash, 0x1086d10)": No such file or directory
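
For reference, the workaround for the earlier mount(/tmp/nix-shell...) failure (unsetting TMP, TEMP, TMPDIR and TEMPDIR) can be wrapped so that it does not leak into the rest of the shell session. This is a minimal sketch; `run_without_tmp_vars` is a hypothetical helper name, and it assumes the variables only need to be absent for the wrapped command.

```shell
# Sketch: run a command with the temp-dir variables removed.
# The function body is a subshell, so the caller's environment is untouched.
# run_without_tmp_vars is a hypothetical helper name.
run_without_tmp_vars() (
    unset TMP TEMP TMPDIR TEMPDIR
    "$@"
)

# Example use (not executed here):
#   run_without_tmp_vars bazel test //...
```

If a Bazel server is already running with the old environment, it may need a restart before the change has any effect.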