thoughtpolice / buck2-nix

Do not taunt happy fun ball
Remote Execution should be supported (work in progress) #12

Open thoughtpolice opened 1 year ago

thoughtpolice commented 1 year ago

As of sometime late last night, there is experimental support for running a local instance of buildbarn, with the goal of being able to use buck2's remote execution facilities to provide hermetic builds. You can go into the shell environment (i.e. cd into this directory) and type this:

start-buildbarn-vm

This will start a QEMU virtual machine, running NixOS. You can SSH into it:

$ ssh-keygen -f "$HOME/.ssh/known_hosts" -R "[localhost]:2222" && ssh -p 2222 -o "StrictHostKeyChecking no" root@localhost

# Host [localhost]:2222 found: line 6
/home/austin/.ssh/known_hosts updated.
Original contents retained as /home/austin/.ssh/known_hosts.old
Warning: Permanently added '[localhost]:2222' (ED25519) to the list of known hosts.
(root@localhost) Password: 
Last login: Fri Apr 14 05:42:17 2023 from 10.0.2.2

[root@nixos:~]# 

Then, once you're inside, start running buildbarn. This uses docker-compose (which is officially supported by them):

[root@nixos:~]# run-buildbarn

Wait a while for the containers to start. You can then visit http://localhost:7982/ or http://localhost:7984/ to view the scheduler and blob storage web UIs for buildbarn, respectively. The local gRPC API for Remote Execution is available at http://127.0.0.1:8980; you can use grpcurl -plaintext to talk to it, and this is where we point buck.
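
Since "wait a while" is a bit vague, here is a minimal Python sketch of polling until the three ports mentioned above accept TCP connections. The function names (port_open, wait_for_ports) and the timeout values are made up for illustration, and an open port only shows that something is listening, not that buildbarn is healthy.

```python
import socket
import time

# Ports taken from the comment above: scheduler UI, blob storage UI, gRPC API.
BUILDBARN_PORTS = {"scheduler UI": 7982, "storage UI": 7984, "gRPC API": 8980}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, deadline_s=120.0, poll_s=2.0):
    """Poll until every port accepts connections.

    Returns the sorted names of the ports still closed at the deadline
    (an empty list means everything came up).
    """
    end = time.monotonic() + deadline_s
    pending = dict(ports)
    while pending:
        pending = {name: p for name, p in pending.items() if not port_open(host, p)}
        if not pending or time.monotonic() >= end:
            break
        time.sleep(poll_s)
    return sorted(pending)


# Usage (hypothetical): wait_for_ports("127.0.0.1", BUILDBARN_PORTS)
```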

Now, set the following to true in .buckconfig, and restart buck. You can leave the API addresses as they are; they aren't used unless re_enabled is set to true:

[buck2_re_client]
re_enabled = true

Then clear out the existing build output:

buck clean; rm -rf buck-out
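
For a quick sanity check of what the [buck2_re_client] section currently says, something like the following Python sketch could work. It leans on the standard configparser module, which is not buck2's own parser, so a real .buckconfig may use syntax it does not understand; the re_settings helper is hypothetical, and the address keys are the ones that appear in the patch later in this thread.

```python
import configparser

# A trimmed stand-in for the relevant part of .buckconfig.
SAMPLE = """\
[buck2_re_client]
re_enabled = true
engine_address = http://127.0.0.1:8980
action_cache_address = http://127.0.0.1:8980
cas_address = http://127.0.0.1:8980
"""


def re_settings(text):
    """Return the remote execution settings found in INI-style config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("buck2_re_client"):
        return {"enabled": False, "addresses": {}}
    section = parser["buck2_re_client"]
    addresses = {key: value for key, value in section.items() if key.endswith("_address")}
    return {
        "enabled": section.getboolean("re_enabled", fallback=False),
        "addresses": addresses,
    }


settings = re_settings(SAMPLE)
print("remote execution enabled:", settings["enabled"])
for key, value in settings["addresses"].items():
    print(f"  {key} = {value}")
```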

This will use the gRPC API and the BuildBarn containers! You can see the action logs in the output of the run-buildbarn command (which tails the output of every docker container). But it currently fails:

austin@GANON:~/src/buck2-nix.sl$ buck build ...
watchman fresh instance event, clearing cache
Action failed: root//src/renode:hifive-unleashed-vmlinux.bin (download_file https://dl.antmicro.com/projects/renode/hifive-unleashed--vmlinux.elf-s_80421976-46788813c50dc7eb1a1a33c1730ca633616f75f5)
Remote command returned non-zero exit code 127
stdout:
env dump
PWD=/worker/build/3d5ea13ec9268cfc/root
SHLVL=1
_=/usr/bin/env
ok
stderr:
buck-out/v2/gen/root/632fe5438d4aecc1/src/renode/__hifive-unleashed-vmlinux.bin__/download_hifive-unleashed-vmlinux.bin.sh: line 6: curl: command not found
Action failed: prelude//toolchains:nixpkgs-overlay-rust (download_tarball https://github.com/oxalica/rust-overlay/archive/1373567ffd13719f6b7522737b010bfc514d49b4.tar.gz)
Remote command returned non-zero exit code 127
stdout:
stderr:
buck-out/v2/gen/prelude/632fe5438d4aecc1/toolchains/__nixpkgs-overlay-rust__/download_nixpkgs-overlay-rust.sh: line 3: curl: command not found
Action failed: root//src/renode:hifive-unleashed-bbl.bin (download_file https://dl.antmicro.com/projects/renode/hifive-unleashed--bbl.elf-s_17219640-c7e1b920bf81be4062f467d9ecf689dbf7f29c7a)
Remote command returned non-zero exit code 127
stdout:
env dump
PWD=/worker/build/db6d3b8b248de7b5/root
SHLVL=1
_=/usr/bin/env
ok
stderr:
buck-out/v2/gen/root/632fe5438d4aecc1/src/renode/__hifive-unleashed-bbl.bin__/download_hifive-unleashed-bbl.bin.sh: line 6: curl: command not found
Action failed: root//src/renode:hifive-unleashed.dtb (download_file https://dl.antmicro.com/projects/renode/hifive-unleashed--devicetree.dtb-s_10532-70cd4fc9f3b4df929eba6e6f22d02e6ce4c17bd1)
Remote command returned non-zero exit code 127
stdout:
env dump
PWD=/worker/build/ab50ec19dbbe693a/root
SHLVL=1
_=/usr/bin/env
ok
stderr:
buck-out/v2/gen/root/632fe5438d4aecc1/src/renode/__hifive-unleashed.dtb__/download_hifive-unleashed.dtb.sh: line 6: curl: command not found
Build ID: 4da18795-4801-4db3-b09d-53e7bc7259e2
RE: GRPC-SESSION-ID
Jobs completed: 103. Time elapsed: 3.5s. Cache hits: 0%. Commands: 4 (cached: 0, remote: 4, local: 0)
BUILD FAILED
Failed to build 'prelude//toolchains:nixpkgs-overlay-rust (prelude//platform:default#632fe5438d4aecc1)'
Failed to build 'root//src/renode:hifive-unleashed-bbl.bin (prelude//platform:default#632fe5438d4aecc1)'
Failed to build 'root//src/renode:hifive-unleashed-vmlinux.bin (prelude//platform:default#632fe5438d4aecc1)'
Failed to build 'root//src/renode:hifive-unleashed.dtb (prelude//platform:default#632fe5438d4aecc1)'

This is because the built container image doesn't have a default PATH that includes curl or anything like it. You can see this in the output above: a patch I added (but didn't commit) uses env to dump the environment, and only PWD and SHLVL are defined (apart from _, which the shell sets itself). Exit code 127 is the shell's "command not found". We need to fix this so that a minimal environment works for the bash scripts. Then maybe we can get somewhere, though it will probably hit several more bumps.
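
To make the failure mode concrete, here is a small Python sketch that checks which tools a given environment can resolve. The missing_tools helper is hypothetical, and the environment dict is copied from the dump in the log above. It is a simplification: a real shell may fall back to a built-in default search path when PATH is unset, whereas this check treats a missing PATH as "nothing can be found".

```python
import shutil


def missing_tools(env, tools):
    """Return the tools that cannot be found on the PATH given in env."""
    search_path = env.get("PATH", "")
    # shutil.which returns None when the command is not found on search_path;
    # an empty search path never matches anything.
    return [tool for tool in tools if shutil.which(tool, path=search_path) is None]


# Environment as dumped by the worker in the log above: no PATH at all.
WORKER_ENV = {
    "PWD": "/worker/build/3d5ea13ec9268cfc/root",
    "SHLVL": "1",
    "_": "/usr/bin/env",
}

print(missing_tools(WORKER_ENV, ["curl", "tar"]))  # ['curl', 'tar']
```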

thoughtpolice commented 1 year ago

I've just about got this working and ironed out most of the immediate bugs, including performance, but I think I've now run into buck problems, which may be expected.

As of a3eb8b70ce6d43a81887b382e575fd3876330f43, if you follow the above steps to start buildbarn, apply the following patch, run buck kill, and then try something like buck build src/hello:, it will almost work:

diff --git a/.buckconfig b/.buckconfig
--- a/.buckconfig
+++ b/.buckconfig
@@ -20,7 +20,7 @@
 digest_algorithms = SHA1

 [buck2_re_client]
-re_enabled = false
+re_enabled = true
 engine_address = http://127.0.0.1:8980
 action_cache_address = http://127.0.0.1:8980
 cas_address = http://127.0.0.1:8980
diff --git a/buck/prelude/basics/download.bzl b/buck/prelude/basics/download.bzl
--- a/buck/prelude/basics/download.bzl
+++ b/buck/prelude/basics/download.bzl
@@ -18,6 +18,7 @@
         [
             "#!/usr/bin/env bash",
             "set -xeuo pipefail",
+            "source /root/.bashrc",
             "curl -Lo \"$1\" {}".format(ctx.attrs.url),
             "mkdir -p \"$2\"",
             "tar xf \"$1\" -C \"$2\" --no-same-owner --strip-components=1",
@@ -71,6 +72,7 @@
         [
             "#!/usr/bin/env bash",
             "set -xeuo pipefail",
+            "source /root/.bashrc",
             "curl -Lo \"$1\" {}".format(ctx.attrs.url),
             "hash=$(nix hash path --type sha256 \"$1\")",
             "if ! [ \"$hash\" = \"{}\" ]; then".format(ctx.attrs.hash),

But alas:

austin@GANON:~/src/buck2-nix.sl$ buck build src/hello:
File changed: root//buildbarn-vm.qcow2
Action failed: prelude//toolchains:nixpkgs-overlay-rust (download_tarball https://github.com/oxalica/rust-overlay/archive/1373567ffd13719f6b7522737b010bfc514d49b4.tar.gz)
Internal error (stage: download): action_digest=c8b86439a7e637e7c7d98cf2addf516a771ed82f:93: Failed to declare in materializer: status: InvalidArgument, message: "Attempted to read a total of at least 23508808 bytes, while a maximum of 16777216 bytes is permitted", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
stdout:
stderr:
Build ID: 80d8d279-b2f7-4378-a421-2e78d6416456
RE: GRPC-SESSION-ID
Jobs completed: 4. Time elapsed: 1.1s.
BUILD FAILED
Failed to build 'prelude//toolchains:nixpkgs-overlay-rust (prelude//platform:default#632fe5438d4aecc1)'

This is an ICE of sorts; the action cache in buildbarn shows it actually completed correctly and is cached properly, but it just can't be materialized as an output...
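
The numbers in that error are worth spelling out. The Python sketch below pulls the two byte counts out of the message and compares them; the regular expression is written against this one message only, and parse_read_limit is a hypothetical helper, not anything from buck2 or buildbarn.

```python
import re

# Message text copied from the error above.
MESSAGE = (
    "Attempted to read a total of at least 23508808 bytes, "
    "while a maximum of 16777216 bytes is permitted"
)


def parse_read_limit(message):
    """Return (attempted_bytes, max_bytes) from the error text, or None if it does not match."""
    match = re.search(r"at least (\d+) bytes, while a maximum of (\d+) bytes", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


attempted, limit = parse_read_limit(MESSAGE)
print(limit == 16 * 1024 * 1024)  # True: the limit is exactly 16 MiB
print(attempted - limit)          # 6731592 bytes over the limit
```

So the output is only about 6.4 MiB too large for a single read.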

thoughtpolice commented 1 year ago

There's another problem: on the concurrent rule for nixpkgs.tar.gz itself, we hit an open file limit at ~10000 files while running tar. This might be a docker or nixos or buildbarn issue:

{
  "result": {
    "outputFiles": [
      {
        "path": "buck-out/v2/gen/prelude/632fe5438d4aecc1/toolchains/__nixpkgs-src__/nixpkgs-src.tar.gz",
        "digest": {
          "hash": "32ef7900bf6bdd31c23769eb5f12cf90d6520129",
          "sizeBytes": "35080457"
        }
      }
    ],
    "outputDirectories": [
      {
        "path": "buck-out/v2/gen/prelude/632fe5438d4aecc1/toolchains/__nixpkgs-src__/nixpkgs-src",
        "treeDigest": {
          "hash": "0640c7392e2cc1234b069a00db6ba51aff0b7c99",
          "sizeBytes": "952521"
        },
        "isTopologicallySorted": true
      }
    ],
    "stderrDigest": {
      "hash": "12132e76cb2b0c53069397c56cbbcc9069944d13",
      "sizeBytes": "2095"
    },
    "executionMetadata": {
      "worker": "{\"datacenter\":\"amsterdam\",\"hostname\":\"ubuntu-worker.example.com\",\"rack\":\"3\",\"slot\":\"10\",\"thread\":\"5\"}",
      "queuedTimestamp": "2023-04-15T16:08:54.839610596Z",
      "workerStartTimestamp": "2023-04-15T16:08:54.851005101Z",
      "workerCompletedTimestamp": "2023-04-15T16:09:12.418768492Z",
      "inputFetchStartTimestamp": "2023-04-15T16:08:54.857069763Z",
      "inputFetchCompletedTimestamp": "2023-04-15T16:08:54.858770100Z",
      "executionStartTimestamp": "2023-04-15T16:08:54.858770100Z",
      "executionCompletedTimestamp": "2023-04-15T16:09:11.005769425Z",
      "virtualExecutionDuration": "16.143940304s",
      "outputUploadStartTimestamp": "2023-04-15T16:09:11.005769425Z",
      "outputUploadCompletedTimestamp": "2023-04-15T16:09:12.418768492Z",
      "auxiliaryMetadata": [
        {
          "@type": "type.googleapis.com/buildbarn.resourceusage.FilePoolResourceUsage",
          "filesCreated": "10000",
          "filesCountPeak": "10000",
          "filesSizeBytesPeak": "69668839",
          "readsCount": "11478",
          "readsSizeBytes": "80058390",
          "writesCount": "24493",
          "writesSizeBytes": "69672827"
        }
      ]
    }
  },
  "status": {
    "code": 3,
    "message": "I/O error while running command: Failed to create new file: File count quota reached"
  },
  "message": "Action details (uncached result): http://localhost:7984/blobs/sha1/historical_execute_response/4ae6ab0f936b905e94f4c89c259b8a4d180afe48-801/"
}
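
As a reading aid for the response above, here is a Python sketch that turns the timestamps into per-phase durations. The timestamps carry nanoseconds, so the hypothetical parse_ts helper truncates them to microseconds before subtracting. Execution takes roughly 16 seconds, in line with the reported virtualExecutionDuration, and the file counters sit at exactly 10000, which matches the "File count quota reached" message.

```python
from datetime import datetime, timedelta, timezone

# Timestamps copied from the executionMetadata above.
METADATA = {
    "queuedTimestamp": "2023-04-15T16:08:54.839610596Z",
    "workerStartTimestamp": "2023-04-15T16:08:54.851005101Z",
    "inputFetchStartTimestamp": "2023-04-15T16:08:54.857069763Z",
    "inputFetchCompletedTimestamp": "2023-04-15T16:08:54.858770100Z",
    "executionStartTimestamp": "2023-04-15T16:08:54.858770100Z",
    "executionCompletedTimestamp": "2023-04-15T16:09:11.005769425Z",
    "outputUploadStartTimestamp": "2023-04-15T16:09:11.005769425Z",
    "outputUploadCompletedTimestamp": "2023-04-15T16:09:12.418768492Z",
}

# Phase name -> (start key, end key).
PHASES = {
    "queue": ("queuedTimestamp", "workerStartTimestamp"),
    "input_fetch": ("inputFetchStartTimestamp", "inputFetchCompletedTimestamp"),
    "execution": ("executionStartTimestamp", "executionCompletedTimestamp"),
    "output_upload": ("outputUploadStartTimestamp", "outputUploadCompletedTimestamp"),
}


def parse_ts(ts):
    """Parse a UTC timestamp like 2023-04-15T16:08:54.839610596Z, truncated to microseconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return dt + timedelta(microseconds=int(frac[:6].ljust(6, "0")))


def phase_durations(metadata):
    """Return a dict mapping each phase name to its duration as a timedelta."""
    return {
        name: parse_ts(metadata[end]) - parse_ts(metadata[start])
        for name, (start, end) in PHASES.items()
    }


for name, duration in phase_durations(METADATA).items():
    print(f"{name}: {duration.total_seconds():.6f}s")
```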

thoughtpolice commented 1 year ago

I was able to work around the materializer failure with 1dbf737cb36e21730b19afe9640fc5f80ff2616e, and with that, we have a single action executing remotely, it looks like! So that's nice...

thoughtpolice commented 1 year ago

Things should be working better as of ebb5efdce856d1da0123537335c0e85b13deb72f, with some increases to the FUSE directory limits for buildbarn. But now we're thwarted by https://github.com/facebook/buck2/issues/170#issuecomment-1509924579, which seems to imply that buck2 needs to chunk larger files before uploading; the 160MB from the untar'd nixpkgs tarball is way too much right now to do in one gRPC call. :,)
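
For a sense of what "chunking" means here, this is a generic Python sketch of splitting a payload into bounded pieces. It is not how buck2 or buildbarn implement uploads; iter_chunks and chunk_count are hypothetical helpers, and reusing the 16777216-byte figure from the earlier read error as the per-call limit for uploads is an assumption.

```python
MAX_CHUNK = 16 * 1024 * 1024  # assumed per-call limit, borrowed from the earlier read error


def chunk_count(total_bytes, max_chunk=MAX_CHUNK):
    """Return how many chunks of at most max_chunk bytes are needed for total_bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    return (total_bytes + max_chunk - 1) // max_chunk


def iter_chunks(data, max_chunk=MAX_CHUNK):
    """Yield consecutive slices of data, each at most max_chunk bytes long."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    for offset in range(0, len(data), max_chunk):
        yield data[offset:offset + max_chunk]


print(chunk_count(160_000_000))        # 10 calls for roughly 160MB
print(list(iter_chunks(b"abcdefg", 3)))  # [b'abc', b'def', b'g']
```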

thoughtpolice commented 1 year ago

The next two holdups are open issues upstream:

There may still be more work TBD. That said, things do seem to be sort of working now to some limited extent.

yaoddao commented 7 months ago

I've just about got this working and wrinkled out most of the immediate bugs including performance, but I think I've now run into buck problems now, which may be expected.

As of a3eb8b7, if you follow the above steps to start buildbarn, apply the following patch, buck kill and then try something like buck build src/hello:, it will almost work:

diff --git a/.buckconfig b/.buckconfig
--- a/.buckconfig
+++ b/.buckconfig
@@ -20,7 +20,7 @@
 digest_algorithms = SHA1

 [buck2_re_client]
-re_enabled = false
+re_enabled = true
 engine_address = http://127.0.0.1:8980
 action_cache_address = http://127.0.0.1:8980
 cas_address = http://127.0.0.1:8980
diff --git a/buck/prelude/basics/download.bzl b/buck/prelude/basics/download.bzl
--- a/buck/prelude/basics/download.bzl
+++ b/buck/prelude/basics/download.bzl
@@ -18,6 +18,7 @@
         [
             "#!/usr/bin/env bash",
             "set -xeuo pipefail",
+            "source /root/.bashrc",
             "curl -Lo \"$1\" {}".format(ctx.attrs.url),
             "mkdir -p \"$2\"",
             "tar xf \"$1\" -C \"$2\" --no-same-owner --strip-components=1",
@@ -71,6 +72,7 @@
         [
             "#!/usr/bin/env bash",
             "set -xeuo pipefail",
+            "source /root/.bashrc",
             "curl -Lo \"$1\" {}".format(ctx.attrs.url),
             "hash=$(nix hash path --type sha256 \"$1\")",
             "if ! [ \"$hash\" = \"{}\" ]; then".format(ctx.attrs.hash),

But alas:

austin@GANON:~/src/buck2-nix.sl$ buck build src/hello:
File changed: root//buildbarn-vm.qcow2
Action failed: prelude//toolchains:nixpkgs-overlay-rust (download_tarball https://github.com/oxalica/rust-overlay/archive/1373567ffd13719f6b7522737b010bfc514d49b4.tar.gz)
Internal error (stage: download): action_digest=c8b86439a7e637e7c7d98cf2addf516a771ed82f:93: Failed to declare in materializer: status: InvalidArgument, message: "Attempted to read a total of at least 23508808 bytes, while a maximum of 16777216 bytes is permitted", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
stdout:
stderr:
Build ID: 80d8d279-b2f7-4378-a421-2e78d6416456
RE: GRPC-SESSION-ID
Jobs completed: 4. Time elapsed: 1.1s.
BUILD FAILED
Failed to build 'prelude//toolchains:nixpkgs-overlay-rust (prelude//platform:default#632fe5438d4aecc1)'

This is an ICE of sorts; the action cache in buildbarn shows it actually completed correctly and is cached properly, but it just can't be materialized as an output...

Hi @thoughtpolice, we are using Buck2 with Buildbarn for remote execution and ran into the same issue: the build container in Buildbarn does not have a PATH (possibly there is no PATH variable in the environment at all; see https://github.com/facebook/buck2/issues/532). I would like to use your patch above, but I cannot find download.bzl under 'buck/prelude/basics/' in the open-source Buck2 code at https://github.com/facebook/buck2/tree/main/prelude. Could you tell me where download.bzl is located? I do not see a 'basics' directory under 'prelude'. Thank you very much!