Closed stuhood closed 2 years ago
It's possible that this would be a good testbed for #14195.
Noticed this again tonight (without it, pantsbuild/pants
CI would complete about 1 minute faster on a warm cache).
It reproduces locally if you restart pantsd
between attempts to run:
./pants test tests/python/pants_test/integration/pep561_integration_test.py
I think that rather than build_runtime_package_dependencies
, this is likely to be the code below it: either src/python/pants/backend/python/util_rules/dists.py
or src/python/pants/backend/python/goals/setup_py.py
. In the absence of #14195, the best bet is probably to run in -ldebug
and compare the input/output digests from the relevant processes.
Running with -ldebug
, and grepping out and diffing the spawning local process
lines, it looks like the culprit is that:
../build_backend.pex_pex_shim.sh backend_shim.py
... has a different input digest in each run. That input was probably the output of some other process (or @rule
) which was non-deterministic (we don't log process results currently: maybe we should).
I can't make sense of inputs to that process being the culprit, but the outputs are almost certainly non-deterministic without extra effort. The sdists and wheels built by that process are being built by known and unknown 3rd parties (setuptools+wheel, poetry, flit, ...). They may respect SOURCE_DATE_EPOCH
, but we should probably be setting it regardless just in case here: https://github.com/pantsbuild/pants/blob/e7c6095829304bd180e922fffeac3f8f0e4650f8/src/python/pants/backend/python/util_rules/dists.py#L212-L215
Yea, let's definitely do SOURCE_DATE_EPOCH
: thanks.
The reason inputs are relevant here though is that even if the backend_shim.py
process is non-deterministic, if we hit the cache for it like we should in this case, the non-determinism would be papered over, and it wouldn't need to re-run. So we have multiple culprits.
I haven't investigated deeply yet, but it looks like
pants.core.goals.test.build_runtime_package_dependencies
might not be deterministic.While inspecting traces of Pants' own CI, I noticed that
pep561_integration_test.py
and its inputs tend to (always?) miss the remote cache, and are the long pole in CI:It's not clear how the non-determinism gets kicked off, but it's unlikely to be a process (at least initially), because hitting the cache for a non-deterministic process is still deterministic. So most likely either a
@rule
or uncacheable process.