Open stuhood opened 3 years ago
Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?: https://github.com/pantsbuild/pants/blob/61193e1bf5c8d9d3b77519626938906a70d5c098/src/python/pants/backend/python/util_rules/pex.py#L679-L682
Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?: ...
This ticket is intended to cover cases where multiple desktop machines are using the same platform: I'll clarify that.
Another cause for cache-misses we have theorized to cause issues are API keys. Some of our tests run against 3rd party systems and authenticated via an API key. With each developer having a unique key we believe this would effectively nullify any benefits of remote caching.
Given the design of the system, the API key is irrelevant and ignoring any security concerns a singular key could be used by all developers. Or hash(api_key)
could safely return a constant.
I did a little bit of thinking about this before we decided to start #13682, so will braindump some of that for now.
I believe that a path forward to allow for "adjusting" the fingerprint that is included in a Process
for certain environment variables and absolute file args is essentially to reify them into types which:
PATH
env var containing your HOME
directory would not include the HOME portion in the Process
' digest, but would use the entire value at execution time.PATH
entry might be constructed by the @rule
implementer by listing which (sub)processes they thought that a Process
would use. Fingerprinting would then collect versions of those processes and apply some rounding to attempt to match.The hope would be that these types would be composable, such that they didn't add much complexity to constructing a Process
.
I enabled remote caching when upgrading to 2.18, and am seeing similar issues -- but even on the same machine by enabling our pants.ci.toml
.
You can see here; that even the fingerprints match, but we get a cache miss when running with only base config:
# WITH pants.ci.toml
11:03:10.24 [DEBUG] remote cache hit for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 }
# WITHOUT pants.ci.toml
11:49:55.05 [DEBUG] remote cache miss for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 }
This is a pex_binary
that we build that only contains two files and two dependencies. We only write the cache from CI, but I assume if I did write locally it'd at least hit it that, but I don't want all users to write to cache to avoid cache pollution. The only thing that could possibly affect this from our config is that we enable pyenv
in pants.ci.toml
-- we do override the default resolve, but I've made sure to include that on the command line.
I've not dug more into it yet, but I'll diff the actual process invocations when I have time and see if I can prove that it's the Python interpreter location that matters.
For completeness; this is the pants.ci.toml
we use:
[GLOBAL]
colors = true
print_stacktrace = true
plugins.add = [
"hdrhistogram",
]
backend_packages.add = [
"pants.backend.python.providers.experimental.pyenv",
]
remote_cache_write = true
[stats]
log = true
[test]
use_coverage = true
[coverage-py]
report = ["json"]
global_report = true
[pytest]
args = ["-vv", "--no-header", "--benchmark-disable"]
[python]
default_resolve = "cpu"
[oci]
rootless = false
uid_map = ["0:0:65536"]
gid_map = ["0:0:65536"]
[pyenv-python-provider]
installation_extra_env_vars = [
"PYTHON_CONFIGURE_OPTS=--with-lto=thin",
"PYTHON_CFLAGS=-march=native -mtune=native",
]
When a remote cache is configured, desktop usage (even among identical platforms) gets lower cache hit rates than CI usage. This is primarily due to inconsistent PATH/env entries between differently configured boxes.
We should:
StreamingWorkunitHandler
to compare cache lookups across different desktop machines)~ (done in #12469)Some of the differences identified for a series of runs across multiple users:
--python-path