pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.26k stars 626 forks source link

native options parser mismatch with quoted strings in env vars. #21343

Closed jcrobak closed 2 weeks ago

jcrobak commented 3 weeks ago

Describe the bug

We're using an env var to pass along a value to the ruff CLI. With pants 2.22.0.rc2, I'm seeing a mismatch like:

[WARN] Value mismatch for the option `args` in scope [ruff]:
Legacy value: ['--quiet', '--config', 'lint.isort.known-first-party=["foo","bar"]'] of type <class 'list'>, from source CONFIG
Native value: ['--quiet', '--config', 'lint.isort.known-first-party=[\\"foo\\",\\"bar\\"]'] of type <class 'list'>, from source CONFIG

Here's a minimal repro:

cat << EOF > pants.toml
[GLOBAL]
pants_version="2.22.0.rc2"

backend_packages = [
  "pants.backend.python",
  "pants.backend.experimental.python.lint.ruff.check",
]

[python]
interpreter_constraints = ["CPython==3.12.*"]

[ruff]
args = ["--quiet", "--config", """lint.isort.known-first-party=%(env.FIRST_PARTY)s"""]
EOF

cat << EOF > .pants.bootstrap
# there should be 3 slashes before each quote, so we need to do extra escaping for the heredoc
FIRST_PARTY='[\\\\\"foo\\\\\",\\\\\"bar\\\\\"]'
export FIRST_PARTY
EOF

cat << EOF > main.py
print("hello world")
EOF

cat << EOF > BUILD
python_sources()
EOF

# run pants lint - shows the Value mismatch warning
pants lint ::

Note, this is traced back to some unintuitive behavior in shlex, ref:

$ python
Python 3.12.5 (main, Aug  6 2024, 19:08:49) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> shlex.split("""lint.isort.known-first-party=[\'foo\',\'bar\']""")
['lint.isort.known-first-party=[foo,bar]']
>>> shlex.split("""lint.isort.known-first-party=[\\\'foo\\\',\\\'bar\\\']""")
["lint.isort.known-first-party=['foo','bar']"]

... and I believe the new behavior is more ergonomic.

Pants version 2.22.0.rc2

OS MacOS

Additional info n/a

benjyw commented 3 weeks ago

Thanks for the detailed report and repro! Technically we should reproduce the old, less ergonomic, behavior, but I am loath to do so because obviously it's ugly. Will think on this.

benjyw commented 3 weeks ago

Looks like this isn't about the env vars per se, but about interpolation. The same issue happens with this in pants.toml:

[GLOBAL]
pants_version="2.22.0.rc2"

[DEFAULT]
FIRST_PARTY="[\\\"foo\\\",\\\"bar\\\"]"

backend_packages = [
  "pants.backend.python",
  "pants.backend.experimental.python.lint.ruff.check",
]

[python]
interpreter_constraints = ["CPython==3.12.*"]

[ruff]
args = ["--quiet", "--config", """lint.isort.known-first-party=%(FIRST_PARTY)s"""]
benjyw commented 3 weeks ago

Hmm, actually in that case it's the native value looks wrong. Will dig deeper.

benjyw commented 3 weeks ago

That was just a printing artifact I think. Debugging this is annoying because of all the different ways that Python and Rust can each display a literal backslash in debug output...

benjyw commented 3 weeks ago

I can make something similar happen without env vars or config or interpolation or shlex:

$ pants   --tag='\\\"'
...
- Value mismatch for the option `tag`:
    Legacy value: ['\\\\\\"'] of type <class 'list'>, from source FLAG
    Native value: ['\\"'] of type <class 'list'>, from source FLAG
...

The commonality seems to be "list of string" typed options. This doesn't happen with a string-typed option.

UPDATE: This is a separate (but similar) issue, which I will address separately.

benjyw commented 2 weeks ago

OK, so in the end this boils down to being because of this eval, which is invoked to parse the stringified list read from config.

This use of eval is an implementation detail of the legacy parser that did not take quoting and other such effects into account. And as @jcrobak has noted, the native parser behavior seems more correct. We almost certainly don't want to emulate this bad behavior in the native parser. For one thing, it's inconsistent (if you try to use the same value as an "implicit append" the eval won't happen). And in any case we can't properly emulate eval in pure Rust (although we could specifically handle quotes I suppose).

So I think the right solution is to detect this specific case, and augment the warning with more details, suggesting that when switching from 2.22.x to 2.23.x you will need to remove the superfluous quotes. That means that the transition is slightly breaking, but hopefully this is not too common a case: it requires interpolation from an env var into a config list.

Does this seem reasonable?

benjyw commented 2 weeks ago

Closed in #21352