pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0

`pantsd` dies when I change a file on Linux but not Mac #20466

Closed yjabri closed 5 months ago

yjabri commented 6 months ago

Describe the bug When editing a file in a repository, pantsd crashes on Linux but not when using the same repository on a Mac

Pants version v2.18.1

OS Are you encountering the bug on MacOS, Linux, or both? Both

Additional info I posted the issue on Slack. When running

pants -ldebug run archipelago/src/archipelago/__main__.py

On Linux, I'll make a change to a random file, e.g. capstan/src/capstan/pubsub_client.py, and see

01:15:55.75 [DEBUG] notify invalidating {"capstan/src/capstan/pubsub_client.py"} because of Modify(Data(Any))
01:15:55.75 [INFO] notify invalidation: cleared 1 and dirtied 2 nodes for: {"capstan/src/capstan/pubsub_client.py"}
01:15:55.75 [DEBUG] notify invalidating {"capstan/src/capstan/pubsub_client.py"} because of Modify(Data(Any))
01:15:55.75 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"capstan/src/capstan/pubsub_client.py"}
01:15:55.75 [DEBUG] notify invalidating {"capstan/src/capstan/pubsub_client.py"} because of Modify(Data(Any))
01:15:55.75 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"capstan/src/capstan/pubsub_client.py"}
01:15:55.75 [DEBUG] notify invalidating {"capstan/src/capstan/pubsub_client.py"} because of Modify(Data(Any))
01:15:55.75 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"capstan/src/capstan/pubsub_client.py"}
01:15:55.75 [DEBUG] Dependency DigestFile(capstan/src/capstan/pubsub_client.py) of Some("Snapshot(!*.pyc, !__pycache__/, .gitignore, archipelago/src, archipelago/src/**, capstan/src, capstan/src/**, [more directories], pants.toml, pants.toml/**, [more directories])") changed.
01:15:55.75 [DEBUG] notify invalidating {"capstan/src/capstan", "capstan/src/capstan/pubsub_client.py"} because of Access(Close(Write))
01:15:55.75 [INFO] notify invalidation: cleared 2 and dirtied 199 nodes for: {"capstan/src/capstan", "capstan/src/capstan/pubsub_client.py"}
01:15:55.75 [DEBUG] notify invalidating {"capstan/src/capstan/pubsub_client.py", "capstan/src/capstan"} because of Access(Close(Write))
01:15:55.75 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"capstan/src/capstan/pubsub_client.py", "capstan/src/capstan"}
01:15:55.76 [DEBUG] Dependency Snapshot(!*.pyc, !__pycache__/, .gitignore, archipelago/src, archipelago/src/**, capstan/src, capstan/src/**, [more-directories], pants.toml, pants.toml/**, [more-directories]) of Some("Snapshot") changed.
01:15:55.76 [DEBUG] computed 1 nodes in 0.195103 seconds. there are 3150 total nodes.
01:15:55.76 [ERROR] saw filesystem changes covered by invalidation globs: SnapshotDiff(our_unique_files=(), our_unique_dirs=(), their_unique_files=(), their_unique_dirs=(), changed_files=('capstan/src/capstan/pubsub_client.py',)). terminating the daemon.
01:15:56.43 [ERROR] service failure for <pants.pantsd.service.scheduler_service.SchedulerService object at 0x7fa27716e310>.

On Mac I see

17:19:03.07 [DEBUG] notify invalidating {"capstan/src/capstan/pubsub_client.py"} because of Modify(Data(Content))
17:19:03.07 [DEBUG] notify invalidating {"capstan/src/capstan/pubsub_client.py"} because of Modify(Data(Content))
17:19:03.07 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"capstan/src/capstan/pubsub_client.py"}
17:19:03.07 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"capstan/src/capstan/pubsub_client.py"}

TLDR The shebang in ../nce/.../bin/pants is too long (#!/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin/python3.9 -sE), so the -sE gets cut off. Without -sE, the PYTHONPATH=... that scie-pants reads from my .env is honored by Python, which adds it to sys.path and therefore to the invalidation_globs.

jsirois commented 6 months ago

@yjabri what's your configured pythonpath, pantsd_invalidation_globs and pants config files (if any)?: https://github.com/pantsbuild/pants/blob/b47af8a60edeb95cf4f81814ef90e27dd7380f61/src/python/pants/option/global_options.py#L2116-L2123

These answers go towards clearing up the Linux logs / situation. I have no explanation yet for the Mac divergence, but it appears you must have a pretty crazy pythonpath configuration that claims ~all your repo source code is composed of Pants plugins (in which case restarting on edit of that plugin code is correct).

yjabri commented 6 months ago

Hi @jsirois!

I don't have any of those set in pants.toml. But I do have a .env file at the root of my directory with a PYTHONPATH containing all of the source roots as described in setting-up-an-ide.

I went ahead and emptied the file and, what do you know, it didn't crash!

07:46:07.37 [DEBUG] notify invalidating {"archipelago/src/archipelago/__main__.py"} because of Modify(Data(Any))
07:46:07.37 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"archipelago/src/archipelago/__main__.py"}
07:46:07.37 [DEBUG] notify invalidating {"archipelago/src/archipelago/__main__.py"} because of Modify(Data(Any))
07:46:07.37 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"archipelago/src/archipelago/__main__.py"}
07:46:07.37 [DEBUG] notify invalidating {"archipelago/src/archipelago/__main__.py"} because of Modify(Data(Any))
07:46:07.37 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"archipelago/src/archipelago/__main__.py"}
07:46:07.37 [DEBUG] notify invalidating {"archipelago/src/archipelago/__main__.py"} because of Modify(Data(Any))
07:46:07.37 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"archipelago/src/archipelago/__main__.py"}
07:46:07.37 [DEBUG] notify invalidating {"archipelago/src/archipelago/__main__.py", "archipelago/src/archipelago"} because of Access(Close(Write))
07:46:07.37 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"archipelago/src/archipelago/__main__.py", "archipelago/src/archipelago"}
07:46:07.37 [DEBUG] notify invalidating {"archipelago/src/archipelago/__main__.py", "archipelago/src/archipelago"} because of Access(Close(Write))
07:46:07.37 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"archipelago/src/archipelago/__main__.py", "archipelago/src/archipelago"}

I also changed the file I mentioned in the description of this issue and validated pantsd doesn't error out. Just to double check that nothing else changed, I ran rm -fr .pids/ .pants.d/, undid the change in .env and reproduced the error.

I see that scie-pants has support for .env files, which may explain why cloning the Pants GitHub repo, checking out v2.18.1 and running the above command with ../pants/pants didn't reproduce the error.

Unfortunately, I'm not sure how to proceed here. I need that .env for the Test Explorer in VS code. Furthermore, some folks on my team prefer to use the default-python venv exported by pants and something like export $(cat .env | xargs).

jsirois commented 6 months ago

> Unfortunately, I'm not sure how to proceed here. I need that .env for the Test Explorer in VS code. Furthermore, some folks on my team prefer to use the default-python venv exported by pants and something like export $(cat .env | xargs).

I'm not sure either. I don't use Pants, I just searched the log message in the codebase to root cause.

yjabri commented 5 months ago

I ended up trying to reproduce this on another Linux machine but was unable to. I can simplify the setup to a bare-bones Pants project with pants.toml

[GLOBAL]
pants_version = "2.18.1"

and a .env file with PYTHONPATH="project/src".

I added some logging in compute_pantsd_invalidation_globs and observed

sys.path = [..., '/home/yjabri/projects/dummy/project/src', ... ]
bootstrap_options.pythonpath = []
bootstrap_options.pythonpath: ['/home/yjabri/projects/dummy/pants.toml']

For whatever reason /home/yjabri/projects/dummy/project/src is being added to my sys.path.
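To picture why that extra sys.path entry matters: judging by the Snapshot(...) entries in the Linux log above, each watched path appears as the path itself plus a path/** glob. The sketch below is a simplified model of that behavior, not the actual body of compute_pantsd_invalidation_globs. The function name sketch_invalidation_globs and its filtering rules are assumptions made for illustration, and it assumes POSIX-style paths.

```python
import os


def sketch_invalidation_globs(buildroot, candidate_paths):
    """Simplified model: turn paths inside the build root into watch globs.

    Assumption (not taken from the Pants source): absolute paths outside the
    build root are ignored, and every remaining path contributes both itself
    and a recursive "/**" glob.
    """
    globs = []
    for path in candidate_paths:
        if os.path.isabs(path):
            rel = os.path.relpath(path, buildroot)
            # Skip the build root itself and anything outside of it.
            if rel in (os.curdir, os.pardir) or rel.startswith(os.pardir + os.sep):
                continue
        else:
            rel = path
        for glob in (rel, rel + "/**"):
            if glob not in globs:
                globs.append(glob)
    return globs


if __name__ == "__main__":
    buildroot = "/home/yjabri/projects/dummy"
    sys_path_like = [
        "/usr/lib/python3.9",  # outside the build root: ignored
        "/home/yjabri/projects/dummy/project/src",  # leaked in via PYTHONPATH
        "/home/yjabri/projects/dummy/pants.toml",
    ]
    print(sketch_invalidation_globs(buildroot, sys_path_like))
```

Under this model, any source root that leaks into sys.path becomes a watched glob, so editing a file beneath it would look like a change to plugin code.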

If I add a print(sys.path) to /home/yjabri/.cache/nce/29319df9a6...8b85c015d9/bindings/venvs/2.18.1/bin/pants

import re
import sys
print(sys.path)
from pants.bin.pants_loader import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

I see the path print twice, each containing /home/yjabri/projects/dummy/project/src. If I repeat the same steps on Mac I don't see project/src.

yjabri commented 5 months ago

I should add if I run /home/yjabri/.cache/nce/29319df9a6...8b85c015d9/bindings/venvs/2.18.1/bin/python -c 'import sys; print(sys.path)' I don't see an entry with project/src.

I think this points to scie-pants modifying sys.path in an unexpected way?

Since this machine is running Debian 10, I tried to reproduce the example in a container but no luck.

yjabri commented 5 months ago

Then again, PYTHONPATH=foobar python -c 'import sys; print(sys.path)' will include a path with foobar and scie-pants says it reads .env. It seems like this ought to be the default behavior yet it only happens on specific machines.

yjabri commented 5 months ago

The only way I've been able to reliably reproduce this is to spin up a google compute engine instance with projects/debian-cloud/global/images/family/debian-10. Unfortunately a Debian 10 docker image doesn't reproduce it.

yjabri commented 5 months ago

Observation -

With print(sys.path) in .../bindings/venvs/2.18.1/bin/pants after import sys if I run

SCIE=/home/yjabri/.local/bin/pants SCIE_PANTS_VERSION=0.10.6 PYTHONPATH=foo /home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin/pants --python-repos-find-links=-[] --no-pantsd

I get

['/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin', '/home/yjabri/dummy/foo', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python39.zip', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/lib-dynload', '/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/lib/python3.9/site-packages']

which contains the problematic '/home/yjabri/dummy/foo' path.

If I forgo the shebang and manually run (note the -sE)

SCIE=/home/yjabri/.local/bin/pants SCIE_PANTS_VERSION=0.10.6 PYTHONPATH=foo /home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin/python3.9 -sE /home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin/pants --python-repos-find-links=-[] --no-pantsd

I get

['/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python39.zip', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/lib-dynload', '/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/lib/python3.9/site-packages']

without /home/yjabri/dummy/foo.
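The difference between the two runs can be checked in isolation, without Pants. The following is a minimal sketch that starts a child interpreter with PYTHONPATH set, once without flags and once with -sE, and reports whether the entry reaches sys.path. The helper name child_sys_path is made up for this example; it relies only on the documented behavior that -E makes Python ignore PYTHON* environment variables.

```python
import json
import os
import subprocess
import sys


def child_sys_path(interpreter_flags, pythonpath):
    """Return sys.path as seen by a child interpreter started with PYTHONPATH set."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    result = subprocess.run(
        [
            sys.executable,
            *interpreter_flags,
            "-c",
            "import json, sys; print(json.dumps(sys.path))",
        ],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    marker = os.path.abspath("foo")
    print("without flags:", marker in child_sys_path([], marker))
    print("with -sE:     ", marker in child_sys_path(["-sE"], marker))
```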

In this case the shebang #!/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin/python3.9 -sE is 129 characters.

This would explain why I couldn't reproduce this on my Mac or on another Linux machine running via WSL. I checked the Linux environment on WSL, and the shebang there is 123 characters. On my Mac the maximum shebang length is 512 (FWIW the shebang there is 139 characters).
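A quick way to compare machines is to measure the shebang line of the generated script directly. This is a sketch under one loud assumption: the default limit of 127 bytes is not stated anywhere in this thread (which only shows that 123 characters worked and 129 did not); it is the value commonly cited for older Linux kernels, and it should be adjusted per system. The function names are hypothetical.

```python
import sys

ASSUMED_MAX_SHEBANG_BYTES = 127  # assumption; not stated in the thread


def shebang_line(script_path):
    """Return the first line of the script if it is a shebang, else None."""
    with open(script_path, "rb") as f:
        first = f.readline().rstrip(b"\r\n")
    return first if first.startswith(b"#!") else None


def shebang_may_be_truncated(script_path, limit=ASSUMED_MAX_SHEBANG_BYTES):
    """True if the shebang is longer than the assumed kernel limit."""
    line = shebang_line(script_path)
    return line is not None and len(line) > limit


if __name__ == "__main__":
    # Usage: python check_shebang.py path/to/bin/pants [more scripts...]
    for path in sys.argv[1:]:
        line = shebang_line(path)
        length = 0 if line is None else len(line)
        verdict = "truncation likely" if shebang_may_be_truncated(path) else "ok"
        print(f"{path}: {length} bytes, {verdict}")
```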

yjabri commented 5 months ago

If I ln -s /home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/ my_pants

and change the script to

#!/home/yjabri/my_pants/bin/python3.9 -sE
# -*- coding: utf-8 -*-
import re
import sys
print(sys.path)
from pants.bin.pants_loader import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

then I don't get the problematic /home/yjabri/dummy/foo

['/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python39.zip', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/lib-dynload', '/home/yjabri/my_pants/lib/python3.9/site-packages']
['/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin', '/home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python39.zip', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9', '/home/yjabri/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/lib-dynload', '/home/yjabri/my_pants/lib/python3.9/site-packages']

yjabri commented 5 months ago

I was under the impression that including

[pex-cli]
version = "v2.1.159"
known_versions =[
    'v2.1.159|macos_arm64|83c3090938b4d276703864c34ba50bcb3616db0663c54b56dd0521a668d9555f|3671772',
    'v2.1.159|macos_x86_64|83c3090938b4d276703864c34ba50bcb3616db0663c54b56dd0521a668d9555f|3671772',
    'v2.1.159|linux_x86_64|83c3090938b4d276703864c34ba50bcb3616db0663c54b56dd0521a668d9555f|3671772',
    'v2.1.159|linux_arm64|83c3090938b4d276703864c34ba50bcb3616db0663c54b56dd0521a668d9555f|3671772',
]

would somehow pull in the change from this Pex PR:

https://github.com/pantsbuild/pex/pull/2295/files#diff-b1297a55702e854b6233796710a712f1f3fd47905ecc1fabd9f03c63ac980b43R1473-R1487

and that the generated bin/pants would then include the shebang limit fix. Unfortunately that hasn't been the case. FWIW if I manually update the file to

#!/bin/sh
# N.B.: This python script executes via a /bin/sh re-exec as a hack to work around a
# potential maximum shebang length of {max_shebang_length} bytes on this system which
# the python interpreter `exec`ed below would violate.
''''exec /home/yjabri/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.1/bin/python3.9 -sE "$0" "$@"
'''
# -*- coding: utf-8 -*-
import re
import sys
print(sys.path)
from pants.bin.pants_loader import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

it doesn't include the /home/yjabri/dummy/foo path.
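The manual edit above follows a simple rule: use a direct shebang when it fits, and fall back to the /bin/sh re-exec header when it does not. Here is a minimal sketch of that rule. It is not Pex's implementation; the function name script_header and the 127-byte default are assumptions, and it assumes the interpreter path contains no spaces or shell metacharacters.

```python
ASSUMED_MAX_SHEBANG_BYTES = 127  # assumption; the real limit depends on the OS


def script_header(interpreter, flags="-sE", limit=ASSUMED_MAX_SHEBANG_BYTES):
    """Return the header text for a Python script run by `interpreter`."""
    command = f"{interpreter} {flags}".strip()
    direct = "#!" + command
    if len(direct.encode("utf-8")) <= limit:
        return direct + "\n"
    lines = [
        "#!/bin/sh",
        # sh reads '''' as two empty strings glued to `exec`; Python reads
        # ''' as the start of a string literal that the next line closes.
        "''''exec " + command + ' "$0" "$@"',
        "'''",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    long_path = (
        "/home/yjabri/.cache/nce/"
        "29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9"
        "/bindings/venvs/2.18.1/bin/python3.9"
    )
    print(script_header("/home/yjabri/my_pants/bin/python3.9"), end="")
    print(script_header(long_path), end="")
```

Because the interpreter flags are passed explicitly on the exec line, they no longer depend on how much of the first line the kernel reads.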

yjabri commented 5 months ago

I spent some time going through the scie-pants repo and familiarizing myself with the Pants bootstrap process. This bug has nothing to do with the OS, scie-pants, or .env files. I was definitely grasping at straws lol.

I believe the root cause is that the released Pants Pex that gets bootstrapped uses an older version of Pex. I unzipped the x86 Linux Pex for Pants from https://github.com/pantsbuild/pants/releases/tag/release_2.19.0. The file PEX-INFO contains {"pex_version": "2.1.148"}. (I know the ticket mentions that I'm using 2.18.1 and I referenced 2.19, but I'll happily upgrade.)

If it was 2.1.154 or later the above shebang with the ''''exec trick ought to be included.
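The same check can be scripted instead of unzipping by hand. This sketch reads PEX-INFO from the archive and compares pex_version against the 2.1.154 threshold mentioned above. The function names are hypothetical, and it assumes the version is a plain dotted-integer string like the one quoted.

```python
import json
import zipfile

SHEBANG_FIX_VERSION = (2, 1, 154)  # threshold taken from the comment above


def read_pex_version(pex_path):
    """Read pex_version from the PEX-INFO file at the root of a PEX zip."""
    with zipfile.ZipFile(pex_path) as zf:
        with zf.open("PEX-INFO") as f:
            info = json.load(f)
    return info["pex_version"]


def has_shebang_fix(pex_version):
    """Compare a dotted version string such as "2.1.148" to the threshold."""
    parts = tuple(int(p) for p in pex_version.split(".")[:3])
    return parts >= SHEBANG_FIX_VERSION


if __name__ == "__main__":
    import sys

    # Usage: python check_pex.py path/to/pants.pex
    version = read_pex_version(sys.argv[1])
    print(version, "has fix" if has_shebang_fix(version) else "predates fix")
```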

I'm going to close this issue and ask about why 2.1.148 is being used on the Pants Slack channel.