pantsbuild / scie-pants

Protects your Pants from the elements.
https://www.pantsbuild.org/docs/installation
Apache License 2.0

`SCIE_BOOT=update pants` results in installation error #263

Open jtilahun opened 1 year ago

jtilahun commented 1 year ago

Attempting to upgrade the pants launcher binary on my computer results in an installation error. Full output and log file can be found below.

jtilahun@JTN86G3:~/devel/monorepo$ SCIE_BOOT=update pants
Error: Isolates your Pants from the elements.

Please select from the following boot commands:

scie-pants
bootstrap-tools
pants
pants-debug
update

You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

ERROR: Failed to expand home dir in path ~/.nce
Install failed: Command '['/home/jtilahun/tools/bin/pants']' returned non-zero exit status 1.
More information can be found in the log at: /home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/bindings/logs/record-scie-pants-info.log

Error: Isolates your Pants from the elements.

Please select from the following boot commands:

scie-pants
bootstrap-tools
pants
pants-debug
update

You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

ERROR: Failed to establish atomic directory /home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/locks/scie-pants-info-6a8ddab0f3a22eb4476a6f164c2267a4abc996c084396b7b224489e041c4e369. Population of work directory failed: Boot binding command failed: exit status: 1

record-scie-pants-info.log

engnatha commented 1 year ago

In testing something unrelated, I was able to reproduce a similar behavior. By running pants in an uninitialized directory and responding n, I got a similar error.

~/devel$ pants
No Pants configuration was found at or above /home/nathanael/devel.
Would you like to configure /home/nathanael/devel as a Pants project? (Y/n): n
Error: Isolates your Pants from the elements.

Please select from the following boot commands:

scie-pants
bootstrap-tools
pants
pants-debug
update

You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

ERROR: Failed to establish atomic directory /home/nathanael/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/locks/configure-38caab2c120194c12f9617ad3a9ed1c094483156c068196f12097cc18bf6ac39. Population of work directory failed: Boot binding command failed: exit status: 1

huonw commented 1 year ago

Hm, the contents of the log aren't very insightful 🤔

2023-08-30 16:30:52,614 ERROR] root: Install failed: Command '['/home/jtilahun/tools/bin/pants']' returned non-zero exit status 1.
More information can be found in the log at: /home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/bindings/logs/record-scie-pants-info.log
Traceback (most recent call last):
  File "/home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/bindings/pex_root/venvs/557963c6782fafb82fc618ff05bb5998dafccd3c/f8df9e2cb55d2d123e1c6f4f3701f3010386f4bb/pex", line 284, in <module>
    sys.exit(func())
  File "/home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/bindings/pex_root/venvs/557963c6782fafb82fc618ff05bb5998dafccd3c/f8df9e2cb55d2d123e1c6f4f3701f3010386f4bb/lib/python3.9/site-packages/conscript/main.py", line 105, in main
    return ep.load()()
  File "/home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/bindings/pex_root/venvs/557963c6782fafb82fc618ff05bb5998dafccd3c/f8df9e2cb55d2d123e1c6f4f3701f3010386f4bb/lib/python3.9/site-packages/scie_pants/record_scie_pants_info.py", line 37, in main
    version = subprocess.run(
  File "/home/jtilahun/.cache/nce/2b6e146234a4ef2a8946081fc3fbfffe0765b80b690425a49ebe40b47c33445b/cpython-3.9.16+20230507-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/jtilahun/tools/bin/pants']' returned non-zero exit status 1.

What version of scie-pants do you have installed? You can check with PANTS_BOOTSTRAP_VERSION=report pants

jtilahun commented 1 year ago

Huh, it's unfortunate that the log isn't insightful.

I have scie-pants version 0.10.0 installed.

jtilahun@JTN86G3:~/devel/monorepo$ PANTS_BOOTSTRAP_VERSION=report pants
0.10.0

huonw commented 1 year ago

Thanks @jtilahun ... that's the latest, so there's definitely something to look at here.

Here are some general questions that might narrow things down somewhat:

  1. what operating system?
  2. how did you install scie-pants?
  3. does it fail if you run it outside your ~/devel/monorepo repo? (i.e. somewhere that doesn't have pants.toml)
  4. if you're on Linux, can you run SCIE_BOOT=update strace -v -e trace=execve -e verbose=execve --follow-forks --string-limit=300 pants 2> strace.log and upload the log? You may need to install https://strace.io (this traces all of the subprocess invocations by recording the execve syscalls, so we can hopefully narrow down exactly which part of the processing fails)

jtilahun commented 1 year ago

Here are answers to those questions:

  1. The operating system is Linux Ubuntu 20.04.
  2. I installed scie-pants by using the get-pants.sh script referenced in the Pants installation documentation (link). I've also attached the exact version of the get-pants.sh script I've been using for completeness: get-pants.sh
  3. Yes, it does fail if I run it outside my ~/devel/monorepo repo. For example, if I run it in ~/tools, which doesn't have pants.toml, it fails:

    jtilahun@JTN86G3:~/tools$ SCIE_BOOT=update pants
    Error: Isolates your Pants from the elements.

    Please select from the following boot commands:

    scie-pants
    bootstrap-tools
    pants
    pants-debug
    update

    You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

    ERROR: Failed to expand home dir in path ~/.nce
    Install failed: Command '['/home/jtilahun/bin/pants']' returned non-zero exit status 1.
    More information can be found in the log at: /home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/bindings/logs/record-scie-pants-info.log

    Error: Isolates your Pants from the elements.

    Please select from the following boot commands:

    scie-pants
    bootstrap-tools
    pants
    pants-debug
    update

    You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

    ERROR: Failed to establish atomic directory /home/jtilahun/.cache/nce/65aa4f2a6c1f9bac672c0df94ae34c7170e5c071cda35e9b725945831905c122/locks/scie-pants-info-a5078db971917d69fb5962395c17cd62b53c7a229697b2227627a3c28242f7d7. Population of work directory failed: Boot binding command failed: exit status: 1

  4. I ran SCIE_BOOT=update strace -v -e trace=execve -e verbose=execve -f --string-limit=300 pants 2> strace.log in ~/devel/monorepo. Notice that I replaced --follow-forks with -f because my strace does not recognize the --follow-forks option but does recognize the -f option. My man page for strace(1) seems to indicate that it traces child processes created by fork(2):

    -f          Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system calls.  Note that -p PID -f will attach all threads of process PID if it is multi-threaded, not only thread with thread_id = PID.

Here's the log that you requested I upload: strace.log (https://github.com/pantsbuild/scie-pants/files/12533687/strace.log)

huonw commented 1 year ago

Thanks.

It looks like the error is during a recursive invocation, https://github.com/pantsbuild/scie-pants/blob/4df586c25f8698a1734dedf0c8351af249c1f2d3/tools/src/scie_pants/record_scie_pants_info.py#L37-L43 which is invoked by scie / lift https://github.com/pantsbuild/scie-pants/blob/4df586c25f8698a1734dedf0c8351af249c1f2d3/package/scie-pants.toml#L152-L165

The error almost certainly comes from https://github.com/a-scie/jump/blob/b7b1efbc9ca276da759e1b2b74e3ecd7d5bbaffc/jump/src/context.rs#L37-L38. That error occurring suggests https://docs.rs/dirs/5.0.1/dirs/fn.home_dir.html is returning None, which seems like it can only happen in limited conditions on Linux: $HOME is unset and getpwuid_r doesn't return useful info.
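
To make the lookup order under discussion concrete, here is a minimal Python sketch of a home-directory resolver that prefers a non-empty HOME and otherwise falls back to the passwd database. It is an approximation for illustration only, not the dirs crate's actual code: the function name home_dir and its environ parameter are invented for this example, and Python's pwd.getpwuid stands in for getpwuid_r.

```python
import os
import pwd


def home_dir(environ=os.environ):
    """Return the home directory, or None if it cannot be determined.

    Assumed lookup order (a model of the behaviour discussed above):
    1. a non-empty HOME variable, used verbatim;
    2. the passwd database entry for the current uid.
    """
    home = environ.get("HOME")
    if home:  # an unset or empty HOME falls through to the passwd lookup
        return home
    try:
        return pwd.getpwuid(os.getuid()).pw_dir or None
    except KeyError:
        # The uid has no entry that the C library can find.
        return None
```

Under this model the function returns None only when both sources fail, which is the situation the "Failed to expand home dir" message implies.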

The strace log explicitly shows HOME=/home/jtilahun in the first two execve calls, but not in the third one, which is the one that fails. That last one is just (env vars are the third parameter):

[pid 90871] execve("/home/jtilahun/bin/pants", ["/home/jtilahun/bin/pants"], ["PANTS_BOOTSTRAP_VERSION=report"]) = 0

This aligns with the env={...} parameter in record_scie_pants_info.py, and suggests a fix: have that call inherit os.environ too, i.e. env={**os.environ, "PANTS_BOOTSTRAP_VERSION": "report"}, so that HOME is set.
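
The difference between replacing and extending the environment can be sketched as follows. This is a standalone illustration, not the code in record_scie_pants_info.py: the child here is a Python one-liner that prints HOME rather than the pants launcher, and CHILD and child_home are names invented for the example.

```python
import os
import subprocess
import sys

# Child process that reports whether HOME is visible in its environment.
CHILD = [sys.executable, "-c", "import os; print(os.environ.get('HOME', '<unset>'))"]


def child_home(env):
    """Run the child with the given environment and return what it sees for HOME."""
    result = subprocess.run(CHILD, env=env, check=True, capture_output=True, text=True)
    return result.stdout.strip()


# Passing a fresh dict replaces the whole environment, so HOME is dropped.
replaced = child_home({"PANTS_BOOTSTRAP_VERSION": "report"})

# Merging with os.environ keeps everything the parent has and adds one variable.
merged = child_home({**os.environ, "PANTS_BOOTSTRAP_VERSION": "report"})
```

With the first form the child has to find the home directory some other way; with the second it sees the same HOME as the parent.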

@jtilahun do you feel like submitting a pull request with that change?


It's a bit weird to me that this is the first observation of this failure, with SCIE_BOOT=update pants working on other systems (e.g. my mac). My theory is that getpwuid_r usually works (so things have been working fine without HOME set), but @jtilahun's user account is configured in a way that doesn't work as smoothly with getpwuid_r? I don't personally know enough about Linux user management to know where to start there, though!


@engnatha I think that's a separate issue, which I filed as https://github.com/pantsbuild/scie-pants/issues/266. Thanks for flagging.

jtilahun commented 1 year ago

Hmm, there's something going on that I haven't grasped quite yet.

I tried isolating this to a minimum reproducible example of dirs::home_dir failing. Here's what I have:

src/main.rs

fn main() {
    match dirs::home_dir() {
        Some(path) => println!("Your home directory, probably: {}", path.display()),
        None => println!("Impossible to get your home dir!"),
    }
}

Cargo.toml

[package]
name = "monorepo"
version = "0.1.0"
edition = "2021"

[[bin]]
edition = "2021"
name = "main"
path = "src/main.rs"

[dependencies]
dirs = "4.0"

I built the binary with cargo build --bin main. Manually tinkering with $HOME, I've convinced myself that in the absence of $HOME, dirs::home_dir goes somewhere else to find my home directory and does so successfully:

jtilahun@JTN86G3:~/devel/monorepo/target/debug$ ./main
Your home directory, probably: /home/jtilahun
jtilahun@JTN86G3:~/devel/monorepo/target/debug$ HOME="" ./main
Your home directory, probably: /home/jtilahun
jtilahun@JTN86G3:~/devel/monorepo/target/debug$ HOME=" " ./main
Your home directory, probably:  
jtilahun@JTN86G3:~/devel/monorepo/target/debug$ HOME="not_a_real_home_directory" ./main
Your home directory, probably: not_a_real_home_directory

I don't feel like I understand what's happening. I don't want to submit a pull request until I feel like I have a better understanding of what's happening.

huonw commented 1 year ago

Yes, I agree with investigating more given my theory doesn't seem to hold. Thanks for checking!

What happens if you run it without any env vars at all: env -i ./main?

jtilahun commented 1 year ago

If I run it without any env vars at all, it's still able to find my home directory:

jtilahun@JTN86G3:~/devel/monorepo/target/debug$ env -i ./main
Your home directory, probably: /home/jtilahun

huonw commented 1 year ago

Hm, I note that you've set dirs = "4.0" there, but scie-pants uses 5.0.1. It doesn't look like there are significant changes between the versions, but there's a chance that might be the difference... could you try with a newer dirs and dirs-sys?

https://github.com/pantsbuild/scie-pants/blob/4df586c25f8698a1734dedf0c8351af249c1f2d3/Cargo.lock#L184-L195

jtilahun commented 1 year ago

I tried with a newer dirs and dirs-sys, but no difference.

jtilahun@JTN86G3:~/devel/monorepo/target/debug$ env -i ./main
Your home directory, probably: /home/jtilahun

Here's my Cargo.toml file now:

[package]
name = "monorepo"
version = "0.1.0"
edition = "2021"

[[bin]]
edition = "2021"
name = "main"
path = "src/main.rs"

[dependencies]
dirs = "5.0"

Note that scie-pants sets "dirs = 5.0" in Cargo.toml: https://github.com/pantsbuild/scie-pants/blob/4df586c25f8698a1734dedf0c8351af249c1f2d3/Cargo.toml#L29

Here's my Cargo.lock file for sanity checking: Cargo.lock Note that my example package uses "5.0.1".

jtilahun commented 1 year ago

I wasn't sure where ~/.nce comes from, given that no such file or directory exists for me:

jtilahun@JTN86G3:~$ ls ~/.nce
ls: cannot access '/home/jtilahun/.nce': No such file or directory

Searching the repo, I found the one result here: https://github.com/a-scie/jump/blob/71d2a9d9f7f197cf185fd48426e46ee026fb4587/jump/src/context.rs#L241. Reading the surrounding code, it looks as if it's trying to set up a context of some sort. To get the base directory, it first checks SCIE_BASE, followed by a couple of other places. If it still can't find a directory, then it defaults to ~/.nce for whatever reason.
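
The fallback described above can be sketched roughly like this. It is a simplified model written for illustration and is not a translation of context.rs: the intermediate locations are deliberately left out, resolve_base, expand_home and HomeDirError are hypothetical names, and whether a SCIE_BASE value itself gets tilde-expanded is not modelled here.

```python
DEFAULT_BASE = "~/.nce"  # the default mentioned above


class HomeDirError(Exception):
    """Raised when a leading ~ cannot be replaced with a home directory."""


def expand_home(path, home):
    """Replace a leading ~ with home; fail if home is unknown."""
    if path == "~" or path.startswith("~/"):
        if not home:
            raise HomeDirError(f"Failed to expand home dir in path {path}")
        return home + path[1:]
    return path


def resolve_base(environ, home):
    """Pick the base directory: SCIE_BASE if set, else the expanded default.

    Assumption: the other candidate locations checked between these two
    steps are omitted from this sketch.
    """
    base = environ.get("SCIE_BASE")
    if base:
        return base  # used as given in this sketch
    return expand_home(DEFAULT_BASE, home)
```

In this model the error can only appear when the default is reached and no home directory is known, which would fit the observation that setting SCIE_BASE explicitly changes the behaviour.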

So I wondered what would happen if I were to set SCIE_BASE to "~/.nce" on my own. Surprisingly, doing so results in different behavior. It creates a directory at the path "~/.nce" on my behalf and also appears to download some archive. After a few seconds, it finally errors out. Screen recording attached.

https://github.com/pantsbuild/scie-pants/assets/26139374/e727e0ac-c1e2-4e78-b246-e62e473b4a26

So I'm thinking that there's some funny business going on with the directory handling logic. I still haven't pinpointed exactly what it is, but something smells fishy.

cognifloyd commented 7 months ago

I just ran into this issue on an Ubuntu laptop. The relevant bit in the strace log shows

[pid 3062956] execve("/home/jafloyd/.local/bin/pants", ["/home/jafloyd/.local/bin/pants"], ["PANTS_BOOTSTRAP_VERSION=report"]) = 0
Error: Failed to expand home dir in path ~/.nce 

My user account comes from active directory via sssd on the laptop, so it is not present in /etc/passwd and similar files. I also have some sss_overrides configured so that my uid/gid/home_dir and other user settings are sane (not a uid in the billions, and a much more concise home directory).
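
For anyone wanting to check whether their own account is in the same situation, here is a small diagnostic sketch, assuming a Unix system. It compares a plain read of /etc/passwd with Python's pwd.getpwuid, which asks the C library and so consults whatever sources the system is configured to use. The helper names passwd_file_has_uid and resolvable_via_pwd are invented for this example.

```python
import os
import pwd


def passwd_file_has_uid(uid, passwd_path="/etc/passwd"):
    """Return True if the flat passwd file itself has an entry for uid."""
    try:
        with open(passwd_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                fields = line.rstrip("\n").split(":")
                # Field layout: name:password:uid:gid:gecos:home:shell
                if len(fields) >= 3 and fields[2] == str(uid):
                    return True
    except OSError:
        return False
    return False


def resolvable_via_pwd(uid):
    """Return True if the C library's passwd lookup finds an entry for uid."""
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    uid = os.getuid()
    print("in /etc/passwd:", passwd_file_has_uid(uid))
    print("resolvable via pwd.getpwuid:", resolvable_via_pwd(uid))
```

An account that is resolvable through the library lookup but missing from the file comes from some other source, as described for the sssd setup here.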

I can reproduce the SCIE_BOOT=update error more simply by doing this (I'm using bash as my shell here):

$ unset HOME
$ PANTS_BOOTSTRAP_VERSION=report pants
Error: Failed to expand home dir in path ~/.nce

Isolates your Pants from the elements.

Please select from the following boot commands:

<default> (when SCIE_BOOT is not set in the environment)  Detects the current Pants installation and launches it.
bootstrap-tools                                           Introspection tools for the Pants bootstrap process.
update                                                    Update scie-pants.

You can select a boot command by setting the SCIE_BOOT environment variable.

So, @jtilahun, you tested your mini rust program with HOME="" and HOME=" ", but did you try running your mini program with HOME unset?

edit: Oh. I see you used env -i to try that. I get the same failure if I do that. I also get it for any other use of the scie-pants binary (I just manually downloaded/updated to 0.11.0).

$ env -i PANTS_BOOTSTRAP_VERSION=report ~/.local/bin/pants
Error: Failed to expand home dir in path ~/.nce
[snip]

$ env -i ~/.local/bin/pants version
Error: Failed to expand home dir in path ~/.nce
[snip]