pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0

Remote-apis testing: pants is failing since 2.16 #20731

Open jjardon opened 3 months ago

jjardon commented 3 months ago

Describe the bug

Since 2.16 (commit 27fc9ee7761e61f3c5c9b502d612df5f1f13e29b in https://github.com/pantsbuild/example-python), pants appears to behave incorrectly when used for remote execution.

See https://gitlab.com/remote-apis-testing/remote-apis-testing/-/merge_requests/362

Pants version

The issue is present from 2.16 through the current release, 2.19 (commit f37c500e4f4e0c67e29aa9434b1b414f333bdd79 in https://github.com/pantsbuild/example-python).

OS

Linux

Additional info

We have upgraded to the latest release (2.19) in https://remote-apis-testing.gitlab.io/remote-apis-testing/ to show the current status for now. Please let us know if this is a known issue; we will upgrade as soon as a new tag is available.

huonw commented 2 months ago

Thanks for flagging this.

Is there a summary of how we can run the remote-apis-testing tests locally, to reproduce this? I see that https://gitlab.com/remote-apis-testing/remote-apis-testing/-/blob/master/CONTRIBUTING.md seems to be focused on augmenting the CI pipeline, although peeking at that pipeline does suggest that maybe docker-compose/run.sh is what we should be starting with?

sdclarke commented 2 months ago


To run a test with pants locally, cd into docker-compose and run ./run.sh -g -c pants.yml -s <server>.yml, where <server> is one of buildbarn, buildfarm or buildgrid. The -g option generates the docker compose YAML files, so it is only needed on the first run. You can edit the generated files if necessary and then rerun the same command without -g to test your changes.
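For repeated local runs, that workflow could be wrapped in a small shell function. This is only a sketch, assuming it is run from the docker-compose directory of a remote-apis-testing checkout; the function name run_pants_test and its --generate argument are hypothetical, and only the run.sh flags and the server names come from the comment above.

```shell
# Sketch: run the remote-apis-testing pants job against one server.
# Assumes the current directory is docker-compose/ in a remote-apis-testing checkout.
# Usage: run_pants_test <buildbarn|buildfarm|buildgrid> [--generate]
run_pants_test() {
  local server="${1:-}"
  case "$server" in
    buildbarn|buildfarm|buildgrid) ;;
    *) echo "unknown server: $server" >&2; return 2 ;;
  esac
  if [ "${2:-}" = "--generate" ]; then
    # -g generates the docker compose YAML files; only needed on the first run.
    ./run.sh -g -c pants.yml -s "$server.yml"
  else
    # Reuse the previously generated (and possibly hand-edited) YAML files.
    ./run.sh -c pants.yml -s "$server.yml"
  fi
}
```

Typical use would be run_pants_test buildbarn --generate once, then run_pants_test buildbarn after editing the generated files.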

I hope this helps!

seifertm commented 2 months ago

The logs of one of the test runs report that the RE server fails to find the Python3.9 executable in /root/.cache/nce/60b51…8fda/….

I see a similar error when testing locally with Nativelink as the underlying REAPI server. It looks like Pants submits an absolute path to the client's home directory to the remote execution server. The server is unable to find that path locally, because it runs on a different machine. The log below shows server output in my test. Pants is run as user michael, but there is no user michael in the executor container.

nativelink_executor-1   |   2024-05-03T04:41:00.194254Z ERROR nativelink_worker::local_worker: Error executing action, err: Error { code: NotFound, messages: ["No such file or directory (os error 2)", "Could not execute command [\"/home/michael/.cache/nce/fa6ec1ff473e58cf7dff9577ae94c2bde6bf1c7a837c75b928b414c0195eb80e/bindings/venvs/2.20.0/bin/python3.9\", \"./pex\", \"--tmpdir\", \".tmp\", \"--no-emit-warnings\", \"--pip-version\", \"23.1.2\", \"--python-path\", \"\", \"--output-file\", \"local_dists.pex\", \"--intransitive\", \"--interpreter-constraint\", \"CPython==3.11.*\", \"--sources-directory=source_files\", \"--no-pypi\", \"--index=https://pypi.org/simple/\", \"--manylinux\", \"manylinux2014\", \"--resolver-version\", \"pip-2020-resolver\", \"--layout\", \"zipapp\"]"] }

I assume something similar is happening in the remote-apis-testing repository, except that the mismatch between the contents of /root/.cache on the client and on the executor is less apparent, because both the local and remote environments presumably run as root.
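One way to spot this from the server side is to extract the program path (argv[0]) from executor log lines like the one above and then check whether that path exists inside the executor container. A hypothetical helper, assuming NativeLink formats the failing command exactly as in the log excerpt above (a bracketed list of backslash-escaped, quoted arguments):

```shell
# Hypothetical helper: read executor log lines on stdin and print argv[0] of each
# "Could not execute command" error, e.g. the client-side python3.9 path above.
failed_program_paths() {
  # The command is logged as [\"/path/to/program\", \"arg\", ...]; capture the
  # text between the first \"...\" pair after the opening bracket.
  sed -n 's/.*Could not execute command \[\\"\([^\\]*\)\\".*/\1/p'
}
```

For example, docker compose logs | failed_program_paths | sort -u, run from nativelink/deployment-examples/docker-compose, should list the interpreter paths the executor could not find.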

huonw commented 2 months ago

Ah okay, that's a handy smoking gun. Thank you!

It looks like the process invocation is referencing the absolute path to the ~/.cache/nce/.../python3.9 binary that scie-pants provides (the "bootstrap python"). I thus have a suspicion that this might've been caused by #18433 (cherry-picked back to 2.16 in #18495) which switched us to running PEX with that Python interpreter, rather than one more "normally" managed.

@thejcannon (as author) @stuhood (as reviewer): do you have any insight into how we might fix this remote execution issue?

thejcannon commented 2 months ago

get_python_for_scripts should be computing the path of the unpacked digest in the remote environment (not the local path). Either that logic is wrong, or under remote execution that code is incorrectly returning the local path.

seifertm commented 2 months ago

Thanks for the pointer! The rule you mentioned clearly distinguishes between remote and local environments: https://github.com/pantsbuild/pants/blob/a40f2fdd52542274b716ad7829b597303e030ff6/src/python/pants/core/util_rules/adhoc_binaries.py#L50-L57

When get_python_for_scripts requests _PythonBuildStandaloneBinary in line 55, I expect the engine to run the download_python_binary rule: https://github.com/pantsbuild/pants/blob/a40f2fdd52542274b716ad7829b597303e030ff6/src/python/pants/core/util_rules/adhoc_binaries.py#L60-L68

The Python binary download should emit a log message, but running pants with -ltrace doesn't seem to emit it. So it looks like either the EnvironmentTarget passed to get_python_for_scripts or the corresponding if clause has an issue.

Here's a reproducer. Start an REAPI server in one terminal:

git clone https://github.com/TraceMachina/nativelink.git
cd nativelink/deployment-examples/docker-compose
docker compose up -d --build
docker compose logs -f

Start Pants in another terminal:

cat << EOF >pants.toml
[GLOBAL]
pants_version = "2.20.0"
backend_packages = [
    "pants.backend.python",
]
remote_execution = true
remote_store_address = "grpc://127.0.0.1:50051"
remote_execution_address = "grpc://127.0.0.1:50052"
remote_instance_name = "main"
process_execution_remote_parallelism = 1

[python]
interpreter_constraints = ["==3.11.*"]
EOF

cat << EOF >app.py
print("hello")
EOF

cat << EOF >BUILD
python_sources()
EOF

pants -ltrace --no-pantsd --no-local-cache run app.py
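To confirm that the failure is specific to remote execution, the same goal could be run with remote execution disabled and then enabled. A sketch, assuming the pants.toml, app.py and BUILD files above are in the current directory; --no-remote-execution and --remote-execution are the command-line forms of the remote_execution option, and compare_local_and_remote is a hypothetical name.

```shell
# Sketch: run the reproducer locally and remotely and report which one fails.
# Assumes pants.toml, app.py and BUILD from above are in the current directory.
compare_local_and_remote() {
  local mode
  for mode in no-remote-execution remote-execution; do
    # --[no-]remote-execution overrides `remote_execution = true` in pants.toml.
    if pants --no-pantsd --no-local-cache "--$mode" run app.py >/dev/null 2>&1; then
      echo "$mode: ok"
    else
      echo "$mode: failed"
    fi
  done
}
```

Output is discarded here to keep the summary short; drop the redirection (or add -ltrace) when the details of the failing remote run are needed.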