Open jsirois opened 2 years ago
The more interesting thing here @stuhood is that a data dependency (on a binary) has an implicit data dependency on the env vars (or at least 1 of them, LD_LIBRARY_PATH
) exposed to the process that found the binary. Without that LD_LIBRARY_PATH
satisfied, the downstream recipient of the found Python binary cannot use it. As things stand, humans carefully or un-knowingly arrange for the implicit dependency to be satisfied unlike most other things engine. Not sure if there is anything clever we can do here. If we engineered a mechanism to propagate env vars in a growing set down to dependent subprocesses, that would be too broad a brush in general although it would prevent this sort of issue in general. Perhaps we're stuck with being careful.
The more interesting thing here @stuhood is that a data dependency (on a binary) has an implicit data dependency on the env vars (or at least 1 of them,
LD_LIBRARY_PATH
) exposed to the process that found the binary. Without thatLD_LIBRARY_PATH
satisfied, the downstream recipient of the found Python binary cannot use it. As things stand, humans carefully or un-knowingly arrange for the implicit dependency to be satisfied unlike most other things engine. Not sure if there is anything clever we can do here.
Yea, this is tricky. It's definitely not implicit that if Process 1 has consumed something from an environment variable, that Process 2 downstream will also need to consume it.
This results in situations where other Pants rules can use a given Python interpreter but the dependency inference rules cannot when those other rules have the benefit of
--subprocess-environment-env-vars='+["LD_LIBRARY_PATH"]'
- for example.
The tack that I have been trying to encourage for the "options which declare environment variables which need to be leaked" pattern is to attempt to make them use-case or tool specific: "all subprocesses" is just too broad.
This ticket represents a really good example of that I think: in this case, the tool that requires a leaked environment variable is the Python interpreter. So as a strawimpl, you could imagine having --python-interpreter-env-vars
, which would be used in any situation where we expected to be able to use (a particular...?) Python interpreter.
And this relates to #12203 as well. Similar to a PATH
entry, the filesystem content of the value behind the env-var value is relevant. So the LD_LIBRARY_PATH
variable would either need to be:
LD_LIBRARY_PATH
in a "particular library version-aware way"... essentially a declaration that no other data behind that env var is relevant)The original problem (with python import parsing) might be obsoleted by the new Rust-based parser https://github.com/pantsbuild/pants/pull/18854, once that fully replaces the Python-based one. Once that's the case, the docs in #18900 referencing this issue could probably be removed too.
That does nothing for the broader problem of propagating env var dependencies, of course.
Would it perhaps make sense to also capture which env vars to leak in the BinaryPath
result?
https://github.com/pantsbuild/pants/blob/a4436e3b427c91c6981a220a20b565b5598744e6/src/python/pants/core/util_rules/system_binaries.py#L46-L48
If the request indicates which env vars are potentially relevant, those could be leaked into the result if defined. Then a call site requesting a search for a Python binary could include LD_LIBRARY_PATH
to be included in the resulting BinaryPath
which may then be used in subsequent process requests for that binary. That way it's configurable per tool which env vars to plumb through for..
I think, for this case, what the root problem is is that plugin authors can ask for a Python executable, and in most (every?) case they shouldn't be using it. With that in mind maybe we make it harder/impossible to do so?
FYI I just fixed this for Django dep inference as well: https://github.com/pantsbuild/pants/pull/19008
The code has never used a
PexProcess
orVenvPexProcess
or manually plumbedSubprocessEnvironment
or else its own bespoke env var leaking option: https://github.com/pantsbuild/pants/blob/37bf6071b9dea6074176cbc67e97cdb70fccb71d/src/python/pants/backend/python/dependency_inference/parse_python_dependencies.py#L88-L106This results in situations where other Pants rules can use a given Python interpreter but the dependency inference rules cannot when those other rules have the benefit of
--subprocess-environment-env-vars='+["LD_LIBRARY_PATH"]'
- for example. This particular example comes up when trying to use GH actions provided Pythons where they are not statically linked Python and use a .so and that .so is not in the default linker cache / linker search path.