pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.35k stars 640 forks source link

`build_runtime_package_dependencies` might not be deterministic #14843

Closed stuhood closed 2 years ago

stuhood commented 2 years ago

I haven't investigated deeply yet, but it looks like pants.core.goals.test.build_runtime_package_dependencies might not be deterministic.

While inspecting traces of Pants' own CI, I noticed that pep561_integration_test.py and its inputs tend to (always?) miss the remote cache, and are the long pole in CI:

pep561-integration

It's not clear how the non-determinism gets kicked off, but it's unlikely to be a process (at least initially), because hitting the cache for a non-deterministic process is still deterministic. So most likely either a @rule or uncacheable process.

stuhood commented 2 years ago

It's possible that this would be a good testbed for #14195.

stuhood commented 2 years ago

Noticed this again tonight (without it, pantsbuild/pants CI would complete about 1 minute faster on a warm cache).

It reproduces locally if you restart pantsd between attempts to run:

./pants test tests/python/pants_test/integration/pep561_integration_test.py

I think that rather than build_runtime_package_dependencies, this is likely to be the code below it: either src/python/pants/backend/python/util_rules/dists.py or src/python/pants/backend/python/goals/setup_py.py. In the absence of #14195, the best bet is probably to run in -ldebug and compare the input/output digests from the relevant processes.

stuhood commented 2 years ago

Running with -ldebug, and grepping out and diffing the spawning local process lines, it looks like the culprit is that:

../build_backend.pex_pex_shim.sh backend_shim.py

... has a different input digest in each run. That input was probably the output of some other process (or @rule) which was non-deterministic (we don't log process results currently: maybe we should).

jsirois commented 2 years ago

I can't make sense of inputs to that process being the culprit, but the outputs are almost certainly non-deterministic without extra effort. The sdists and wheels built by that process are being built by known and unknown 3rd parties (setuptools+wheel, poetry, flit, ...). They may respect SOURCE_DATE_EPOCH, but we should probably be setting it regardless just in case here: https://github.com/pantsbuild/pants/blob/e7c6095829304bd180e922fffeac3f8f0e4650f8/src/python/pants/backend/python/util_rules/dists.py#L212-L215

stuhood commented 2 years ago

Yea, let's definitely do SOURCE_DATE_EPOCH: thanks.

The reason inputs are relevant here though is that even if the backend_shim.py process is non-deterministic, if we hit the cache for it like we should in this case, the non-determinism would be papered over, and it wouldn't need to re-run. So we have multiple culprits.